Re: [PATCH] MAINTAINERS: powerpc: Remove Aneesh

2024-04-29 Thread Aneesh Kumar K . V
Michael Ellerman  writes:

> Aneesh is stepping down from powerpc maintenance.
>
> Signed-off-by: Michael Ellerman 

Acked-by: Aneesh Kumar K.V (Arm) 

> ---
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7c121493f43d..93af33ee8541 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12478,7 +12478,6 @@ LINUX FOR POWERPC (32-BIT AND 64-BIT)
>  M:   Michael Ellerman 
>  R:   Nicholas Piggin 
>  R:   Christophe Leroy 
> -R:   Aneesh Kumar K.V 
>  R:   Naveen N. Rao 
>  L:   linuxppc-dev@lists.ozlabs.org
>  S:   Supported
> -- 
> 2.44.0


Re: [PATCH] MAINTAINERS: MMU GATHER: Update Aneesh's address

2024-04-29 Thread Aneesh Kumar K . V
Michael Ellerman  writes:

> Aneesh's IBM address no longer works, switch to his preferred kernel.org
> address.
>
> Signed-off-by: Michael Ellerman 
 
Acked-by: Aneesh Kumar K.V (Arm) 
 
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 93af33ee8541..f096c9fff5b3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14898,7 +14898,7 @@ F:drivers/phy/marvell/phy-pxa-usb.c
>  
>  MMU GATHER AND TLB INVALIDATION
>  M:   Will Deacon 
> -M:   "Aneesh Kumar K.V" 
> +M:   "Aneesh Kumar K.V" 
>  M:   Andrew Morton 
>  M:   Nick Piggin 
>  M:   Peter Zijlstra 
> -- 
> 2.44.0


[PATCH] MAINTAINERS: MMU GATHER: Update Aneesh's address

2024-04-29 Thread Michael Ellerman
Aneesh's IBM address no longer works, switch to his preferred kernel.org
address.

Signed-off-by: Michael Ellerman 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 93af33ee8541..f096c9fff5b3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14898,7 +14898,7 @@ F:  drivers/phy/marvell/phy-pxa-usb.c
 
 MMU GATHER AND TLB INVALIDATION
 M: Will Deacon 
-M: "Aneesh Kumar K.V" 
+M: "Aneesh Kumar K.V" 
 M: Andrew Morton 
 M: Nick Piggin 
 M: Peter Zijlstra 
-- 
2.44.0



[PATCH] MAINTAINERS: powerpc: Remove Aneesh

2024-04-29 Thread Michael Ellerman
Aneesh is stepping down from powerpc maintenance.

Signed-off-by: Michael Ellerman 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c121493f43d..93af33ee8541 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12478,7 +12478,6 @@ LINUX FOR POWERPC (32-BIT AND 64-BIT)
 M: Michael Ellerman 
 R: Nicholas Piggin 
 R: Christophe Leroy 
-R: Aneesh Kumar K.V 
 R: Naveen N. Rao 
 L: linuxppc-dev@lists.ozlabs.org
 S: Supported
-- 
2.44.0



Re: [PATCH v2 0/2] powerpc/pseries: Fixes for lparstat boot reports

2024-04-29 Thread Michael Ellerman
Shrikanth Hegde  writes:
> On 4/12/24 2:50 PM, Shrikanth Hegde wrote:
>> Currently lparstat reports which shows since LPAR boot are wrong for
>> some fields. There is a need for storing the PIC(Pool Idle Count) at
>> boot for accurate reporting. PATCH 1 Does that.
>> 
>> While there, it was noticed that hcall return value is long and both
>> h_get_ppp and h_get_mpp could set the uninitialized values if the hcall
>> fails. PATCH 2 does that.
>> 
>> v1 -> v2:
>> - Nathan pointed out the issues surrounding the h_pic call. Addressed
>> those.
>> - Added a pr_debug if h_pic fails during lparcfg_init
>> - If h_pic fails while reading lparcfg, related files are not exported.
>> - Added failure checks for h_get_mpp, h_get_ppp calls as well.
>> 
>> v1: 
>> https://lore.kernel.org/all/20240405101340.149171-1-sshe...@linux.ibm.com/
>> 
>> Shrikanth Hegde (2):
>>   powerpc/pseries: Add pool idle time at LPAR boot
>>   powerpc/pseries: Add fail related checks for h_get_mpp and h_get_ppp
>> 
>>  arch/powerpc/include/asm/hvcall.h|  2 +-
>>  arch/powerpc/platforms/pseries/lpar.c|  6 ++--
>>  arch/powerpc/platforms/pseries/lparcfg.c | 45 +---
>>  3 files changed, 37 insertions(+), 16 deletions(-)
>> 
>> --
>> 2.39.3
>> 
>
> Ping. 
>
> Michael, Nathan, Naveen
> Any comments on these patches?

Looks fine. I have it in next-test, will probably go into next tomorrow.

cheers


Re: WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c at boot when netconsole is enabled (kernel v6.9-rc5, v6.8.7, sungem, PowerMac G4 DP)

2024-04-29 Thread Jakub Kicinski
On Sun, 28 Apr 2024 12:53:06 +0200 Erhard Furtner wrote:
> With netconsole enabled I get this "WARNING: CPU: 1 PID: 1 at
> net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c" and "WARNING:
> CPU: 1 PID: 1 at kernel/locking/irqflag-debug.c:10
> warn_bogus_irq_restore+0x30/0x44" at boot on my PowerMac G4 DP.
> Happens more often than not (6-7 out of 10 times booting):

Could you try with LOCKDEP enabled?
I wonder if irqs_disabled() behaves differently than we expect.


Re: [PATCH v2] kprobe/ftrace: bail out if ftrace was killed

2024-04-29 Thread Steven Rostedt
On Mon, 29 Apr 2024 10:47:18 -0700
Stephen Brennan  wrote:

> If an error happens in ftrace, ftrace_kill() will prevent disarming
> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> freed, yet the kprobes will still be active, and when triggered, they
> will use the freed memory, likely resulting in a page fault and panic.
> 
> This behavior can be reproduced quite easily, by creating a kprobe and
> then triggering a ftrace_kill(). For simplicity, we can simulate an
> ftrace error with a kernel module like [1]:
> 
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
> 
>   sudo perf probe --add commit_creds
>   sudo perf trace -e probe:commit_creds
>   # In another terminal
>   make
>   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>   # Back to perf terminal
>   # ctrl-c
>   sudo perf probe --del commit_creds
> 
> After a short period, a page fault and panic would occur as the kprobe
> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> is supposed to be used only in extreme circumstances, it is invoked in
> FTRACE_WARN_ON() and so there are many places where an unexpected bug
> could be triggered, yet the system may continue operating, possibly
> without the administrator noticing. If ftrace_kill() does not panic the
> system, then we should do everything we can to continue operating,
> rather than leave a ticking time bomb.
> 
> Signed-off-by: Stephen Brennan 
> ---
> Difference from v1: removed both existing declarations of ftrace_is_dead()
> from kernel/trace/trace.h.
> 
>  arch/csky/kernel/probes/ftrace.c | 3 +++
>  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
>  arch/parisc/kernel/ftrace.c  | 3 +++
>  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
>  arch/riscv/kernel/probes/ftrace.c| 3 +++
>  arch/s390/kernel/ftrace.c| 3 +++
>  arch/x86/kernel/kprobes/ftrace.c | 3 +++
>  include/linux/ftrace.h   | 2 ++
>  kernel/trace/trace.h | 2 --
>  9 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/csky/kernel/probes/ftrace.c 
> b/arch/csky/kernel/probes/ftrace.c
> index 834cffcfbce3..3931bf9f707b 100644
> --- a/arch/csky/kernel/probes/ftrace.c
> +++ b/arch/csky/kernel/probes/ftrace.c
> @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
> parent_ip,
>   struct kprobe_ctlblk *kcb;
>   struct pt_regs *regs;
>  
> + if (unlikely(ftrace_is_dead()))
> + return;
> +
>   bit = ftrace_test_recursion_trylock(ip, parent_ip);
>   if (bit < 0)
>   return;
> diff --git a/arch/loongarch/kernel/ftrace_dyn.c 
> b/arch/loongarch/kernel/ftrace_dyn.c
> index 73858c9029cc..82c952cb5be0 100644
> --- a/arch/loongarch/kernel/ftrace_dyn.c
> +++ b/arch/loongarch/kernel/ftrace_dyn.c
> @@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned 
> long parent_ip,
>   struct kprobe *p;
>   struct kprobe_ctlblk *kcb;
>  
> + if (unlikely(ftrace_is_dead()))
> + return;
> +
>   bit = ftrace_test_recursion_trylock(ip, parent_ip);
>   if (bit < 0)
>   return;
> diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
> index 621a4b386ae4..3660834f54c3 100644
> --- a/arch/parisc/kernel/ftrace.c
> +++ b/arch/parisc/kernel/ftrace.c
> @@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned 
> long parent_ip,
>   struct kprobe *p;
>   int bit;
>  
> + if (unlikely(ftrace_is_dead()))
> + return;
> +
>   bit = ftrace_test_recursion_trylock(ip, parent_ip);
>   if (bit < 0)
>   return;
> diff --git a/arch/powerpc/kernel/kprobes-ftrace.c 
> b/arch/powerpc/kernel/kprobes-ftrace.c
> index 072ebe7f290b..85eb55aa1457 100644
> --- a/arch/powerpc/kernel/kprobes-ftrace.c
> +++ b/arch/powerpc/kernel/kprobes-ftrace.c
> @@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
> parent_nip,
>   struct pt_regs *regs;
>   int bit;
>  
> + if (unlikely(ftrace_is_dead()))
> + return;
> +
>   bit = ftrace_test_recursion_trylock(nip, parent_nip);
>   if (bit < 0)
>   return;
> diff --git a/arch/riscv/kernel/probes/ftrace.c 
> b/arch/riscv/kernel/probes/ftrace.c
> index 7142ec42e889..8814fbe4c888 100644
> --- a/arch/riscv/kernel/probes/ftrace.c
> +++ b/arch/riscv/kernel/probes/ftrace.c
> @@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
> parent_ip,
>   struct kprobe_ctlblk *kcb;
>   int bit;
>  
> + if (unlikely(ftrace_is_dead()))
> + return;
> +
>   bit = ftrace_test_recursion_trylock(ip, parent_ip);
>   if (bit < 0)
>   return;
> diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
> index c46381ea04ec..ccbe8ccf945b 100644
> --- a/arch/s390/kernel/ftrace.c
> +++ b/arch/s390/kernel/ftrace.c
> @@ -296,6 +296,9 @@ void kprobe_ftrace_handler(unsigned long 

Re: [PATCH] kprobe/ftrace: bail out if ftrace was killed

2024-04-29 Thread Stephen Brennan
Masami Hiramatsu (Google)  writes:
> Hi Stephen,
>
> On Fri, 26 Apr 2024 15:58:34 -0700
> Stephen Brennan  wrote:
>
>> If an error happens in ftrace, ftrace_kill() will prevent disarming
>> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> freed, yet the kprobes will still be active, and when triggered, they
>> will use the freed memory, likely resulting in a page fault and panic.
>
> Hmm, indeed.
>
>> 
>> This behavior can be reproduced quite easily, by creating a kprobe and
>> then triggering a ftrace_kill(). For simplicity, we can simulate an
>> ftrace error with a kernel module like [1]:
>> 
>> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>> 
>>   sudo perf probe --add commit_creds
>>   sudo perf trace -e probe:commit_creds
>>   # In another terminal
>>   make
>>   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>>   # Back to perf terminal
>>   # ctrl-c
>>   sudo perf probe --del commit_creds
>> 
>> After a short period, a page fault and panic would occur as the kprobe
>> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> is supposed to be used only in extreme circumstances, it is invoked in
>> FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> could be triggered, yet the system may continue operating, possibly
>> without the administrator noticing. If ftrace_kill() does not panic the
>> system, then we should do everything we can to continue operating,
>> rather than leave a ticking time bomb.
>
> OK, the patch looks good to me.
>
> Acked-by: Masami Hiramatsu (Google) 
>
> Thanks!

Hi Masami,

Thank you! Sadly I took a second look at the patch and noticed I forgot
to remove the existing declarations of ftrace_is_dead() from
kernel/trace/trace.h. I've sent v2 in reply to v1 in order to correct
that. I'm sorry for the churn.

Thanks,
Stephen

>> 
>> Signed-off-by: Stephen Brennan 
>> ---
>> 
>> Apologies for the wide net cast here. I recognize that a change like this
>> may need to be split up and go through arch-specific trees. I hoped to get
>> feedback on the patch itself. If it's satisfactory and the architecture
>> maintainers prefer it split out, I'm glad to do it. Thanks!
>> 
>>  arch/csky/kernel/probes/ftrace.c | 3 +++
>>  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
>>  arch/parisc/kernel/ftrace.c  | 3 +++
>>  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
>>  arch/riscv/kernel/probes/ftrace.c| 3 +++
>>  arch/s390/kernel/ftrace.c| 3 +++
>>  arch/x86/kernel/kprobes/ftrace.c | 3 +++
>>  include/linux/ftrace.h   | 2 ++
>>  8 files changed, 23 insertions(+)
>> 
>> diff --git a/arch/csky/kernel/probes/ftrace.c 
>> b/arch/csky/kernel/probes/ftrace.c
>> index 834cffcfbce3..3931bf9f707b 100644
>> --- a/arch/csky/kernel/probes/ftrace.c
>> +++ b/arch/csky/kernel/probes/ftrace.c
>> @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
>> parent_ip,
>>  struct kprobe_ctlblk *kcb;
>>  struct pt_regs *regs;
>>  
>> +if (unlikely(ftrace_is_dead()))
>> +return;
>> +
>>  bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  if (bit < 0)
>>  return;
>> diff --git a/arch/loongarch/kernel/ftrace_dyn.c 
>> b/arch/loongarch/kernel/ftrace_dyn.c
>> index 73858c9029cc..82c952cb5be0 100644
>> --- a/arch/loongarch/kernel/ftrace_dyn.c
>> +++ b/arch/loongarch/kernel/ftrace_dyn.c
>> @@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned 
>> long parent_ip,
>>  struct kprobe *p;
>>  struct kprobe_ctlblk *kcb;
>>  
>> +if (unlikely(ftrace_is_dead()))
>> +return;
>> +
>>  bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  if (bit < 0)
>>  return;
>> diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
>> index 621a4b386ae4..3660834f54c3 100644
>> --- a/arch/parisc/kernel/ftrace.c
>> +++ b/arch/parisc/kernel/ftrace.c
>> @@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned 
>> long parent_ip,
>>  struct kprobe *p;
>>  int bit;
>>  
>> +if (unlikely(ftrace_is_dead()))
>> +return;
>> +
>>  bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  if (bit < 0)
>>  return;
>> diff --git a/arch/powerpc/kernel/kprobes-ftrace.c 
>> b/arch/powerpc/kernel/kprobes-ftrace.c
>> index 072ebe7f290b..85eb55aa1457 100644
>> --- a/arch/powerpc/kernel/kprobes-ftrace.c
>> +++ b/arch/powerpc/kernel/kprobes-ftrace.c
>> @@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned 
>> long parent_nip,
>>  struct pt_regs *regs;
>>  int bit;
>>  
>> +if (unlikely(ftrace_is_dead()))
>> +return;
>> +
>>  bit = ftrace_test_recursion_trylock(nip, parent_nip);
>>  if (bit < 0)
>>  return;
>> diff --git a/arch/riscv/kernel/probes/ftrace.c 
>> b/arch/riscv/kernel/probes/ftrace.c
>> index 7142ec42e889..8814fbe4c888 100644
>> --- 

[PATCH v2] kprobe/ftrace: bail out if ftrace was killed

2024-04-29 Thread Stephen Brennan
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.

This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:

[1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer

  sudo perf probe --add commit_creds
  sudo perf trace -e probe:commit_creds
  # In another terminal
  make
  sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
  # Back to perf terminal
  # ctrl-c
  sudo perf probe --del commit_creds

After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.

Signed-off-by: Stephen Brennan 
---
Difference from v1: removed both existing declarations of ftrace_is_dead()
from kernel/trace/trace.h.

 arch/csky/kernel/probes/ftrace.c | 3 +++
 arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
 arch/parisc/kernel/ftrace.c  | 3 +++
 arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
 arch/riscv/kernel/probes/ftrace.c| 3 +++
 arch/s390/kernel/ftrace.c| 3 +++
 arch/x86/kernel/kprobes/ftrace.c | 3 +++
 include/linux/ftrace.h   | 2 ++
 kernel/trace/trace.h | 2 --
 9 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 834cffcfbce3..3931bf9f707b 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
struct kprobe_ctlblk *kcb;
struct pt_regs *regs;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/loongarch/kernel/ftrace_dyn.c 
b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..82c952cb5be0 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
struct kprobe *p;
struct kprobe_ctlblk *kcb;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 621a4b386ae4..3660834f54c3 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
struct kprobe *p;
int bit;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c 
b/arch/powerpc/kernel/kprobes-ftrace.c
index 072ebe7f290b..85eb55aa1457 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
parent_nip,
struct pt_regs *regs;
int bit;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(nip, parent_nip);
if (bit < 0)
return;
diff --git a/arch/riscv/kernel/probes/ftrace.c 
b/arch/riscv/kernel/probes/ftrace.c
index 7142ec42e889..8814fbe4c888 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
struct kprobe_ctlblk *kcb;
int bit;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..ccbe8ccf945b 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -296,6 +296,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
struct kprobe *p;
int bit;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;

Re: [PATCH][next] crypto/nx: Avoid potential -Wflex-array-member-not-at-end warning

2024-04-29 Thread Kees Cook
On Mon, Mar 25, 2024 at 03:01:56PM -0600, Gustavo A. R. Silva wrote:
> -Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
> ready to enable it globally.
> 
> Use the `__struct_group()` helper to separate the flexible array
> from the rest of the members in flexible `struct nx842_crypto_header`,
> through tagged `struct nx842_crypto_header_hdr`, and avoid embedding
> the flexible-array member in the middle of `struct nx842_crypto_ctx`.
> 
> Also, use `container_of()` whenever we need to retrieve a pointer to
> the flexible structure.
> 
> This code was detected with the help of Coccinelle, and audited and
> modified manually.
> 
> Link: https://github.com/KSPP/linux/issues/202
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/crypto/nx/nx-842.c |  6 --
>  drivers/crypto/nx/nx-842.h | 11 +++
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
> index 2ab90ec10e61..82214cde2bcd 100644
> --- a/drivers/crypto/nx/nx-842.c
> +++ b/drivers/crypto/nx/nx-842.c
> @@ -251,7 +251,9 @@ int nx842_crypto_compress(struct crypto_tfm *tfm,
> u8 *dst, unsigned int *dlen)
>  {
>   struct nx842_crypto_ctx *ctx = crypto_tfm_ctx(tfm);
> - struct nx842_crypto_header *hdr = >header;
> + struct nx842_crypto_header *hdr =
> + container_of(>header,
> +  struct nx842_crypto_header, hdr);
>   struct nx842_crypto_param p;
>   struct nx842_constraints c = *ctx->driver->constraints;
>   unsigned int groups, hdrsize, h;
> @@ -490,7 +492,7 @@ int nx842_crypto_decompress(struct crypto_tfm *tfm,
>   }
>  
>   memcpy(>header, src, hdr_len);
> - hdr = >header;
> + hdr = container_of(>header, struct nx842_crypto_header, hdr);
>  
>   for (n = 0; n < hdr->groups; n++) {
>   /* ignore applies to last group */
> diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
> index 7590bfb24d79..1f42c83d2683 100644
> --- a/drivers/crypto/nx/nx-842.h
> +++ b/drivers/crypto/nx/nx-842.h
> @@ -157,9 +157,12 @@ struct nx842_crypto_header_group {
>  } __packed;
>  
>  struct nx842_crypto_header {
> - __be16 magic;   /* NX842_CRYPTO_MAGIC */
> - __be16 ignore;  /* decompressed end bytes to ignore */
> - u8 groups;  /* total groups in this header */
> + /* New members must be added within the __struct_group() macro below. */
> + __struct_group(nx842_crypto_header_hdr, hdr, __packed,
> + __be16 magic;   /* NX842_CRYPTO_MAGIC */
> + __be16 ignore;  /* decompressed end bytes to ignore */
> + u8 groups;  /* total groups in this header */
> + );
>   struct nx842_crypto_header_group group[];
>  } __packed;
>  
> @@ -171,7 +174,7 @@ struct nx842_crypto_ctx {
>   u8 *wmem;
>   u8 *sbounce, *dbounce;
>  
> - struct nx842_crypto_header header;
> + struct nx842_crypto_header_hdr header;
>   struct nx842_crypto_header_group group[NX842_CRYPTO_GROUP_MAX];
>  
>   struct nx842_driver *driver;

Hmm. I think commit 03952d980153 ("crypto: nx - make platform drivers
directly register with crypto") incorrectly added "struct nx842_driver
*driver" to the end of struct nx842_crypto_ctx. I think it should be
before "header".

Then I see:

#define NX842_CRYPTO_HEADER_SIZE(g) \
(sizeof(struct nx842_crypto_header) +   \
 sizeof(struct nx842_crypto_header_group) * (g))

This is just struct_size(), really. And nothing uses:

#define NX842_CRYPTO_HEADER_MAX_SIZE\
NX842_CRYPTO_HEADER_SIZE(NX842_CRYPTO_GROUP_MAX)

And then looking for what uses struct nx842_crypto_ctx's "group" member,
I don't see anything except some sizeof()s:

drivers/crypto/nx/nx-common-powernv.c:1044: .cra_ctxsize = sizeof(struct 
nx842_crypto_ctx),
drivers/crypto/nx/nx-common-pseries.c:1021: .cra_ctxsize = sizeof(struct 
nx842_crypto_ctx),

This is just a maximally sized ctx (as if the group count were
NX842_CRYPTO_GROUP_MAX), which we could use struct_size for again:

 .cra_ctxsize = struct_size_t(struct nx842_crypto_ctx, header.group,
  NX842_CRYPTO_GROUP_MAX),

So then "group" can be entirely removed from struct nx842_crypto_ctx.

The result means we can also add __counted_by:


diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
index 2ab90ec10e61..144972fe2e6f 100644
--- a/drivers/crypto/nx/nx-842.c
+++ b/drivers/crypto/nx/nx-842.c
@@ -62,10 +62,7 @@
  */
 #define NX842_CRYPTO_MAGIC (0xf842)
 #define NX842_CRYPTO_HEADER_SIZE(g)\
-   (sizeof(struct nx842_crypto_header) +   \
-sizeof(struct nx842_crypto_header_group) * (g))
-#define NX842_CRYPTO_HEADER_MAX_SIZE   \
-   

Re: [PATCH v7 00/16] mm: jit/text allocator

2024-04-29 Thread Luis Chamberlain
On Mon, Apr 29, 2024 at 03:16:04PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> The patches are also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v7
> 
> v7 changes:
> * define MODULE_{VADDR,END} for riscv32 to fix the build and avoid
>   #ifdefs in a function body
> * add Acks, thanks everybody

Thanks, I've pushed this to modules-next for further exposure / testing.
Given the status of testing so far with prior revisions, in that only a
few issues were found and that those were fixed, and the status of
reviews, this just might be ripe for v6.10.

  Luis


Re: [PATCH 2/3] selftest/powerpc: Add flags.mk to support pmu buildable

2024-04-29 Thread Michael Ellerman
Madhavan Srinivasan  writes:
> When running `make -C powerpc/pmu run_tests` from top level selftests
> directory, currently this error is being reported
>
> make: Entering directory 
> '/home/maddy/linux/tools/testing/selftests/powerpc/pmu'
> Makefile:40: warning: overriding recipe for target 'emit_tests'
> ../../lib.mk:111: warning: ignoring old recipe for target 'emit_tests'
> gcc -m64count_instructions.c ../harness.c event.c lib.c ../utils.c loop.S 
>  -o /home/maddy/selftest_output//count_instructions
> In file included from count_instructions.c:13:
> event.h:12:10: fatal error: utils.h: No such file or directory
> 12 | #include "utils.h"
>   |  ^
> compilation terminated.
>
> This is due to missing of include path in CFLAGS. That is, CFLAGS and
> GIT_VERSION macros are defined in the powerpc/ folder Makefile which
> in this case not involved.
>
> To address the failure incase of executing specific sub-folder test directly,
> a new rule file has been addded by the patch called "flags.mk" under
> selftest/powerpc/ folder and is linked to all the Makefile of powerpc/pmu
> sub-folders.

This patch made my selftest build go from ~10s to ~50s !

I tracked it down to "git describe" being run hundreds of times.

> diff --git a/tools/testing/selftests/powerpc/flags.mk 
> b/tools/testing/selftests/powerpc/flags.mk
> new file mode 100644
> index ..28374f470126
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/flags.mk
> @@ -0,0 +1,12 @@
> +#This checks for any ENV variables and add those.
> +
> +#ifeq ($(GIT_VERSION),)
 
This isn't right, # is a comment in make syntax, so this line is just a
comment. It needs to be "ifeq".

> +GIT_VERSION = $(shell git describe --always --long --dirty || echo "unknown")
 
Using '=' here means Make re-runs the command every time the variable is
used. Previously that was OK because the variable was set once and then
exported. But now that it's a Make variable in each file it leads to
"git describe" being run a few hundred times.

I've squashed in those fixes, no need to send a v2.

cheers


Re: [PATCH 2/2] mm/selftests: Don't prefault in gup_longterm tests

2024-04-29 Thread Peter Xu
On Mon, Apr 29, 2024 at 03:26:22PM +0200, David Hildenbrand wrote:
> > The test patch here doesn't need to rush. David, how about you prepare a
> > better and verified patch and post it separately, making sure to cover all
> > the things we used to cover plus the unshare?  IIUC it used to be not
> > touched because of pte_write() always returns true with a write prefault.
> > 
> > Then we let patch 1 go through first, and drop this one?
> 
> Whatever you prefer!

Thanks!

Andrew, would you consider taking patch 1 but ignore this patch 2? Or do
you prefer me to resend?

-- 
Peter Xu



Re: [PATCH] kprobe/ftrace: bail out if ftrace was killed

2024-04-29 Thread Google
Hi Stephen,

On Fri, 26 Apr 2024 15:58:34 -0700
Stephen Brennan  wrote:

> If an error happens in ftrace, ftrace_kill() will prevent disarming
> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> freed, yet the kprobes will still be active, and when triggered, they
> will use the freed memory, likely resulting in a page fault and panic.

Hmm, indeed.

> 
> This behavior can be reproduced quite easily, by creating a kprobe and
> then triggering a ftrace_kill(). For simplicity, we can simulate an
> ftrace error with a kernel module like [1]:
> 
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
> 
>   sudo perf probe --add commit_creds
>   sudo perf trace -e probe:commit_creds
>   # In another terminal
>   make
>   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>   # Back to perf terminal
>   # ctrl-c
>   sudo perf probe --del commit_creds
> 
> After a short period, a page fault and panic would occur as the kprobe
> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> is supposed to be used only in extreme circumstances, it is invoked in
> FTRACE_WARN_ON() and so there are many places where an unexpected bug
> could be triggered, yet the system may continue operating, possibly
> without the administrator noticing. If ftrace_kill() does not panic the
> system, then we should do everything we can to continue operating,
> rather than leave a ticking time bomb.

OK, the patch looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thanks!

> 
> Signed-off-by: Stephen Brennan 
> ---
> 
> Apologies for the wide net cast here. I recognize that a change like this
> may need to be split up and go through arch-specific trees. I hoped to get
> feedback on the patch itself. If it's satisfactory and the architecture
> maintainers prefer it split out, I'm glad to do it. Thanks!
> 
>  arch/csky/kernel/probes/ftrace.c | 3 +++
>  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
>  arch/parisc/kernel/ftrace.c  | 3 +++
>  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
>  arch/riscv/kernel/probes/ftrace.c| 3 +++
>  arch/s390/kernel/ftrace.c| 3 +++
>  arch/x86/kernel/kprobes/ftrace.c | 3 +++
>  include/linux/ftrace.h   | 2 ++
>  8 files changed, 23 insertions(+)
> 
> diff --git a/arch/csky/kernel/probes/ftrace.c 
> b/arch/csky/kernel/probes/ftrace.c
> index 834cffcfbce3..3931bf9f707b 100644
> --- a/arch/csky/kernel/probes/ftrace.c
> +++ b/arch/csky/kernel/probes/ftrace.c
> @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
> parent_ip,
>   struct kprobe_ctlblk *kcb;
>   struct pt_regs *regs;
>  
> + if (unlikely(ftrace_is_dead()))
> + return;
> +
>   bit = ftrace_test_recursion_trylock(ip, parent_ip);
>   if (bit < 0)
>   return;
> diff --git a/arch/loongarch/kernel/ftrace_dyn.c 
> b/arch/loongarch/kernel/ftrace_dyn.c
> index 73858c9029cc..82c952cb5be0 100644
> --- a/arch/loongarch/kernel/ftrace_dyn.c
> +++ b/arch/loongarch/kernel/ftrace_dyn.c
> @@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned 
> long parent_ip,
>   struct kprobe *p;
>   struct kprobe_ctlblk *kcb;
>  
> + if (unlikely(ftrace_is_dead()))
> + return;
> +
>   bit = ftrace_test_recursion_trylock(ip, parent_ip);
>   if (bit < 0)
>   return;
> diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
> index 621a4b386ae4..3660834f54c3 100644
> --- a/arch/parisc/kernel/ftrace.c
> +++ b/arch/parisc/kernel/ftrace.c
> @@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned 
> long parent_ip,
>   struct kprobe *p;
>   int bit;
>  
> + if (unlikely(ftrace_is_dead()))
> + return;
> +
>   bit = ftrace_test_recursion_trylock(ip, parent_ip);
>   if (bit < 0)
>   return;
> diff --git a/arch/powerpc/kernel/kprobes-ftrace.c 
> b/arch/powerpc/kernel/kprobes-ftrace.c
> index 072ebe7f290b..85eb55aa1457 100644
> --- a/arch/powerpc/kernel/kprobes-ftrace.c
> +++ b/arch/powerpc/kernel/kprobes-ftrace.c
> @@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
> parent_nip,
>   struct pt_regs *regs;
>   int bit;
>  
> + if (unlikely(ftrace_is_dead()))
> + return;
> +
>   bit = ftrace_test_recursion_trylock(nip, parent_nip);
>   if (bit < 0)
>   return;
> diff --git a/arch/riscv/kernel/probes/ftrace.c 
> b/arch/riscv/kernel/probes/ftrace.c
> index 7142ec42e889..8814fbe4c888 100644
> --- a/arch/riscv/kernel/probes/ftrace.c
> +++ b/arch/riscv/kernel/probes/ftrace.c
> @@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
> parent_ip,
>   struct kprobe_ctlblk *kcb;
>   int bit;
>  
> + if (unlikely(ftrace_is_dead()))
> + return;
> +
>   bit = ftrace_test_recursion_trylock(ip, parent_ip);
>   if (bit < 0)
>   return;
> diff 

Re: [PATCH 2/2] mm/selftests: Don't prefault in gup_longterm tests

2024-04-29 Thread David Hildenbrand

On 29.04.24 15:10, Peter Xu wrote:

On Mon, Apr 29, 2024 at 09:28:15AM +0200, David Hildenbrand wrote:

On 28.04.24 21:01, Peter Xu wrote:

Prefault, especially with RW, makes the GUP test too easy, and may not yet
reach the core of the test.

For example, R/O longterm pins will just hit, pte_write()==true for
whatever cases, the unsharing logic won't be ever tested.

This patch remove the prefault.  This tortures more code paths at least to
cover the unshare care for R/O longterm pins, in which case the first R/O
GUP attempt will fault in the page R/O first, then the 2nd will go through
the unshare path, checking whether an unshare is needed.

Cc: David Hildenbrand 
Signed-off-by: Peter Xu 
---
   tools/testing/selftests/mm/gup_longterm.c | 12 +---
   1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/mm/gup_longterm.c 
b/tools/testing/selftests/mm/gup_longterm.c
index ad168d35b23b..488e32186246 100644
--- a/tools/testing/selftests/mm/gup_longterm.c
+++ b/tools/testing/selftests/mm/gup_longterm.c
@@ -119,10 +119,16 @@ static void do_test(int fd, size_t size, enum test_type 
type, bool shared)
}
/*
-* Fault in the page writable such that GUP-fast can eventually pin
-* it immediately.
+* Explicitly avoid pre-faulting in the page, this can help testing
+* more code paths.
+*
+* Take example of an upcoming R/O pin test, if we RW prefault the
+* page, such pin will directly skip R/O unsharing and the longterm
+* pin will success mostly always.  When not prefaulted, R/O
+* longterm pin will first fault in a RO page, then the 2nd round
+* it'll go via the unshare check.  Otherwise those paths aren't
+* covered.
 */

This will mean that GUP-fast never succeeds, which removes quite some testing
coverage for most other tests here.

Note that the main motivation of this test was to test gup_fast_folio_allowed(),
where we had issues with GUP-fast during development.


Ah I didn't notice that, as I thought that whitelists memfd ones.



Would the following also get the job done?

diff --git a/tools/testing/selftests/mm/gup_longterm.c 
b/tools/testing/selftests/mm/gup_longterm.c
index ad168d35b23b7..e917a7c58d571 100644
--- a/tools/testing/selftests/mm/gup_longterm.c
+++ b/tools/testing/selftests/mm/gup_longterm.c
@@ -92,7 +92,7 @@ static void do_test(int fd, size_t size, enum test_type type, 
bool shared)
  {
__fsword_t fs_type = get_fs_type(fd);
bool should_work;
-   char *mem;
+   char tmp, *mem;
int ret;
if (ftruncate(fd, size)) {
@@ -119,10 +119,19 @@ static void do_test(int fd, size_t size, enum test_type 
type, bool shared)
}
/*
-* Fault in the page writable such that GUP-fast can eventually pin
-* it immediately.
+* Fault in the page such that GUP-fast might be able to pin it
+* immediately. To cover more cases, don't fault in pages writable when
+* R/O pinning.
 */
-   memset(mem, 0, size);
+   switch (type) {
+   case TEST_TYPE_RO:
+   case TEST_TYPE_RO_FAST:
+   tmp = *mem;
+   asm volatile("" : "+r" (tmp));
+   break;
+   default:
+   memset(mem, 0, size);
+   };
switch (type) {
case TEST_TYPE_RO:


Yes this could work too.

The test patch here doesn't need to rush. David, how about you prepare a
better and verified patch and post it separately, making sure to cover all
the things we used to cover plus the unshare?  IIUC it used to be not
touched because of pte_write() always returns true with a write prefault.

Then we let patch 1 go through first, and drop this one?


Whatever you prefer!

--
Cheers,

David / dhildenb



Re: [PATCH 2/2] mm/selftests: Don't prefault in gup_longterm tests

2024-04-29 Thread Peter Xu
On Mon, Apr 29, 2024 at 09:28:15AM +0200, David Hildenbrand wrote:
> On 28.04.24 21:01, Peter Xu wrote:
> > Prefault, especially with RW, makes the GUP test too easy, and may not yet
> > reach the core of the test.
> > 
> > For example, R/O longterm pins will just hit, pte_write()==true for
> > whatever cases, the unsharing logic won't be ever tested.
> > 
> > This patch remove the prefault.  This tortures more code paths at least to
> > cover the unshare care for R/O longterm pins, in which case the first R/O
> > GUP attempt will fault in the page R/O first, then the 2nd will go through
> > the unshare path, checking whether an unshare is needed.
> > 
> > Cc: David Hildenbrand 
> > Signed-off-by: Peter Xu 
> > ---
> >   tools/testing/selftests/mm/gup_longterm.c | 12 +---
> >   1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/mm/gup_longterm.c 
> > b/tools/testing/selftests/mm/gup_longterm.c
> > index ad168d35b23b..488e32186246 100644
> > --- a/tools/testing/selftests/mm/gup_longterm.c
> > +++ b/tools/testing/selftests/mm/gup_longterm.c
> > @@ -119,10 +119,16 @@ static void do_test(int fd, size_t size, enum 
> > test_type type, bool shared)
> > }
> > /*
> > -* Fault in the page writable such that GUP-fast can eventually pin
> > -* it immediately.
> > +* Explicitly avoid pre-faulting in the page, this can help testing
> > +* more code paths.
> > +*
> > +* Take example of an upcoming R/O pin test, if we RW prefault the
> > +* page, such pin will directly skip R/O unsharing and the longterm
> > +* pin will success mostly always.  When not prefaulted, R/O
> > +* longterm pin will first fault in a RO page, then the 2nd round
> > +* it'll go via the unshare check.  Otherwise those paths aren't
> > +* covered.
> >  */
> This will mean that GUP-fast never succeeds, which removes quite some testing
> coverage for most other tests here.
> 
> Note that the main motivation of this test was to test 
> gup_fast_folio_allowed(),
> where we had issues with GUP-fast during development.

Ah I didn't notice that, as I thought that whitelists memfd ones.

> 
> Would the following also get the job done?
> 
> diff --git a/tools/testing/selftests/mm/gup_longterm.c 
> b/tools/testing/selftests/mm/gup_longterm.c
> index ad168d35b23b7..e917a7c58d571 100644
> --- a/tools/testing/selftests/mm/gup_longterm.c
> +++ b/tools/testing/selftests/mm/gup_longterm.c
> @@ -92,7 +92,7 @@ static void do_test(int fd, size_t size, enum test_type 
> type, bool shared)
>  {
>   __fsword_t fs_type = get_fs_type(fd);
>   bool should_work;
> - char *mem;
> + char tmp, *mem;
>   int ret;
>   if (ftruncate(fd, size)) {
> @@ -119,10 +119,19 @@ static void do_test(int fd, size_t size, enum test_type 
> type, bool shared)
>   }
>   /*
> -  * Fault in the page writable such that GUP-fast can eventually pin
> -  * it immediately.
> +  * Fault in the page such that GUP-fast might be able to pin it
> +  * immediately. To cover more cases, don't fault in pages writable when
> +  * R/O pinning.
>*/
> - memset(mem, 0, size);
> + switch (type) {
> + case TEST_TYPE_RO:
> + case TEST_TYPE_RO_FAST:
> + tmp = *mem;
> + asm volatile("" : "+r" (tmp));
> + break;
> + default:
> + memset(mem, 0, size);
> + };
>   switch (type) {
>   case TEST_TYPE_RO:

Yes this could work too.

The test patch here doesn't need to rush. David, how about you prepare a
better and verified patch and post it separately, making sure to cover all
the things we used to cover plus the unshare?  IIUC it used to be not
touched because of pte_write() always returns true with a write prefault.

Then we let patch 1 go through first, and drop this one?

Thanks,

-- 
Peter Xu



[PATCH v7 16/16] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

BPF just-in-time compiler depended on CONFIG_MODULES because it used
module_alloc() to allocate memory for the generated code.

Since code allocations are now implemented with execmem, drop dependency of
CONFIG_BPF_JIT on CONFIG_MODULES and make it select CONFIG_EXECMEM.

Suggested-by: Björn Töpel 
Signed-off-by: Mike Rapoport (IBM) 
---
 kernel/bpf/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index bc25f5098a25..f999e4e0b344 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -43,7 +43,7 @@ config BPF_JIT
bool "Enable BPF Just In Time compiler"
depends on BPF
depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
-   depends on MODULES
+   select EXECMEM
help
  BPF programs are normally handled by a BPF interpreter. This option
  allows the kernel to generate native code when a program is loaded
-- 
2.43.0



[PATCH v7 15/16] kprobes: remove dependency on CONFIG_MODULES

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

kprobes depended on CONFIG_MODULES because it has to allocate memory for
code.

Since code allocations are now implemented with execmem, kprobes can be
enabled in non-modular kernels.

Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
dependency of CONFIG_KPROBES on CONFIG_MODULES.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Masami Hiramatsu (Google) 
---
 arch/Kconfig|  2 +-
 include/linux/module.h  |  9 ++
 kernel/kprobes.c| 55 +++--
 kernel/trace/trace_kprobe.c | 20 +-
 4 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 4fd0daa54e6c..caa459964f09 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -52,9 +52,9 @@ config GENERIC_ENTRY
 
 config KPROBES
bool "Kprobes"
-   depends on MODULES
depends on HAVE_KPROBES
select KALLSYMS
+   select EXECMEM
select TASKS_RCU if PREEMPTION
help
  Kprobes allows you to trap at almost any kernel address and
diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..ffa1c603163c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -605,6 +605,11 @@ static inline bool module_is_live(struct module *mod)
return mod->state != MODULE_STATE_GOING;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+return mod->state == MODULE_STATE_COMING;
+}
+
 struct module *__module_text_address(unsigned long addr);
 struct module *__module_address(unsigned long addr);
 bool is_module_address(unsigned long addr);
@@ -857,6 +862,10 @@ void *dereference_module_function_descriptor(struct module 
*mod, void *ptr)
return ptr;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+   return false;
+}
 #endif /* CONFIG_MODULES */
 
 #ifdef CONFIG_SYSFS
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ddd7cdc16edf..ca2c6cbd42d2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1588,7 +1588,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
}
 
/* Get module refcount and reject __init functions for loaded modules. 
*/
-   if (*probed_mod) {
+   if (IS_ENABLED(CONFIG_MODULES) && *probed_mod) {
/*
 * We must hold a refcount of the probed module while updating
 * its code to prohibit unexpected unloading.
@@ -1603,12 +1603,13 @@ static int check_kprobe_address_safe(struct kprobe *p,
 * kprobes in there.
 */
if (within_module_init((unsigned long)p->addr, *probed_mod) &&
-   (*probed_mod)->state != MODULE_STATE_COMING) {
+   !module_is_coming(*probed_mod)) {
module_put(*probed_mod);
*probed_mod = NULL;
ret = -ENOENT;
}
}
+
 out:
preempt_enable();
jump_label_unlock();
@@ -2488,24 +2489,6 @@ int kprobe_add_area_blacklist(unsigned long start, 
unsigned long end)
return 0;
 }
 
-/* Remove all symbols in given area from kprobe blacklist */
-static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
-{
-   struct kprobe_blacklist_entry *ent, *n;
-
-   list_for_each_entry_safe(ent, n, _blacklist, list) {
-   if (ent->start_addr < start || ent->start_addr >= end)
-   continue;
-   list_del(>list);
-   kfree(ent);
-   }
-}
-
-static void kprobe_remove_ksym_blacklist(unsigned long entry)
-{
-   kprobe_remove_area_blacklist(entry, entry + 1);
-}
-
 int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
   char *type, char *sym)
 {
@@ -2570,6 +2553,25 @@ static int __init populate_kprobe_blacklist(unsigned 
long *start,
return ret ? : arch_populate_kprobe_blacklist();
 }
 
+#ifdef CONFIG_MODULES
+/* Remove all symbols in given area from kprobe blacklist */
+static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
+{
+   struct kprobe_blacklist_entry *ent, *n;
+
+   list_for_each_entry_safe(ent, n, _blacklist, list) {
+   if (ent->start_addr < start || ent->start_addr >= end)
+   continue;
+   list_del(>list);
+   kfree(ent);
+   }
+}
+
+static void kprobe_remove_ksym_blacklist(unsigned long entry)
+{
+   kprobe_remove_area_blacklist(entry, entry + 1);
+}
+
 static void add_module_kprobe_blacklist(struct module *mod)
 {
unsigned long start, end;
@@ -2672,6 +2674,17 @@ static struct notifier_block kprobe_module_nb = {
.priority = 0
 };
 
+static int kprobe_register_module_notifier(void)
+{
+   return register_module_notifier(_module_nb);
+}
+#else
+static int 

[PATCH v7 14/16] powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

There are places where CONFIG_MODULES guards the code that depends on
memory allocation being done with module_alloc().

Replace CONFIG_MODULES with CONFIG_EXECMEM in such places.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/include/asm/kasan.h | 2 +-
 arch/powerpc/kernel/head_8xx.S   | 4 ++--
 arch/powerpc/kernel/head_book3s_32.S | 6 +++---
 arch/powerpc/lib/code-patching.c | 2 +-
 arch/powerpc/mm/book3s32/mmu.c   | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..2e586733a464 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -285,7 +285,7 @@ config PPC
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
-   select KASAN_VMALLOCif KASAN && MODULES
+   select KASAN_VMALLOCif KASAN && EXECMEM
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_PAGE_SIZE
select MMU_GATHER_RCU_TABLE_FREE
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 365d2720097c..b5bbb94c51f6 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -19,7 +19,7 @@
 
 #define KASAN_SHADOW_SCALE_SHIFT   3
 
-#if defined(CONFIG_MODULES) && defined(CONFIG_PPC32)
+#if defined(CONFIG_EXECMEM) && defined(CONFIG_PPC32)
 #define KASAN_KERN_START   ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
 #else
 #define KASAN_KERN_START   PAGE_OFFSET
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 647b0b445e89..edc479a7c2bc 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -199,12 +199,12 @@ instruction_counter:
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r10, r11)
mtspr   SPRN_MD_EPN, r10
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
mfcrr11
compare_to_kernel_boundary r10, r10
 #endif
mfspr   r10, SPRN_M_TWB /* Get level 1 table */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
blt+3f
rlwinm  r10, r10, 0, 20, 31
orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index c1d89764dd22..57196883a00e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -419,14 +419,14 @@ InstructionTLBMiss:
  */
/* Get PTE (linux-style) and check access */
mfspr   r3,SPRN_IMISS
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
 #endif
mfspr   r2, SPRN_SDR1
li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
rlwinm  r2, r2, 28, 0xf000
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
li  r0, 3
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
@@ -442,7 +442,7 @@ InstructionTLBMiss:
andc.   r1,r1,r2/* check access & ~permission */
bne-InstructionAddressInvalid /* return if access not permitted */
/* Convert linux-style PTE to low word of PPC-style PTE */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
rlwimi  r2, r0, 0, 31, 31   /* userspace ? -> PP lsb */
 #endif
ori r1, r1, 0xe06   /* clear out reserved bits */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..7af791446ddf 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -225,7 +225,7 @@ void __init poking_init(void)
 
 static unsigned long get_patch_pfn(void *addr)
 {
-   if (IS_ENABLED(CONFIG_MODULES) && is_vmalloc_or_module_addr(addr))
+   if (IS_ENABLED(CONFIG_EXECMEM) && is_vmalloc_or_module_addr(addr))
return vmalloc_to_pfn(addr);
else
return __pa_symbol(addr) >> PAGE_SHIFT;
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 100f999871bc..625fe7d08e06 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -184,7 +184,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 
 static bool is_module_segment(unsigned long addr)
 {
-   if (!IS_ENABLED(CONFIG_MODULES))
+   if (!IS_ENABLED(CONFIG_EXECMEM))
return false;
if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
return false;
-- 
2.43.0



[PATCH v7 13/16] x86/ftrace: enable dynamic ftrace without CONFIG_MODULES

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Dynamic ftrace must allocate memory for code and this was impossible
without CONFIG_MODULES.

With execmem separated from the modules code, execmem_text_alloc() is
available regardless of CONFIG_MODULES.

Remove dependency of dynamic ftrace on CONFIG_MODULES and make
CONFIG_DYNAMIC_FTRACE select CONFIG_EXECMEM in Kconfig.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/x86/Kconfig |  1 +
 arch/x86/kernel/ftrace.c | 10 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4474bf32d0a4..f2917ccf4fb4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,6 +34,7 @@ config X86_64
select SWIOTLB
select ARCH_HAS_ELFCORE_COMPAT
select ZONE_DMA32
+   select EXECMEM if DYNAMIC_FTRACE
 
 config FORCE_DYNAMIC_FTRACE
def_bool y
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index c8ddb7abda7c..8da0e66ca22d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -261,8 +261,6 @@ void arch_ftrace_update_code(int command)
 /* Currently only x86_64 supports dynamic trampolines */
 #ifdef CONFIG_X86_64
 
-#ifdef CONFIG_MODULES
-/* Module allocation simplifies allocating memory for code */
 static inline void *alloc_tramp(unsigned long size)
 {
return execmem_alloc(EXECMEM_FTRACE, size);
@@ -271,14 +269,6 @@ static inline void tramp_free(void *tramp)
 {
execmem_free(tramp);
 }
-#else
-/* Trampolines can only be created if modules are supported */
-static inline void *alloc_tramp(unsigned long size)
-{
-   return NULL;
-}
-static inline void tramp_free(void *tramp) { }
-#endif
 
 /* Defined as markers to the end of the ftrace default trampolines */
 extern void ftrace_regs_caller_end(void);
-- 
2.43.0



[PATCH v7 12/16] arch: make execmem setup available regardless of CONFIG_MODULES

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

execmem does not depend on modules, on the contrary modules use
execmem.

To make execmem available when CONFIG_MODULES=n, for instance for
kprobes, split execmem_params initialization out from
arch/*/kernel/module.c and compile it when CONFIG_EXECMEM=y

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Philippe Mathieu-Daudé 
---
 arch/arm/kernel/module.c   |  43 --
 arch/arm/mm/init.c |  45 +++
 arch/arm64/kernel/module.c | 140 -
 arch/arm64/mm/init.c   | 140 +
 arch/loongarch/kernel/module.c |  19 -
 arch/loongarch/mm/init.c   |  21 +
 arch/mips/kernel/module.c  |  22 --
 arch/mips/mm/init.c|  23 ++
 arch/nios2/kernel/module.c |  20 -
 arch/nios2/mm/init.c   |  21 +
 arch/parisc/kernel/module.c|  20 -
 arch/parisc/mm/init.c  |  23 +-
 arch/powerpc/kernel/module.c   |  63 ---
 arch/powerpc/mm/mem.c  |  64 +++
 arch/riscv/kernel/module.c |  34 
 arch/riscv/mm/init.c   |  35 +
 arch/s390/kernel/module.c  |  27 ---
 arch/s390/mm/init.c|  30 +++
 arch/sparc/kernel/module.c |  19 -
 arch/sparc/mm/Makefile |   2 +
 arch/sparc/mm/execmem.c|  21 +
 arch/x86/kernel/module.c   |  27 ---
 arch/x86/mm/init.c |  29 +++
 23 files changed, 453 insertions(+), 435 deletions(-)
 create mode 100644 arch/sparc/mm/execmem.c

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index a98fdf6ff26c..677f218f7e84 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -12,57 +12,14 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
 
 #include 
 #include 
 #include 
 #include 
 
-#ifdef CONFIG_XIP_KERNEL
-/*
- * The XIP kernel text is mapped in the module area for modules and
- * some other stuff to work without any indirect relocations.
- * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
- * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
- */
-#undef MODULES_VADDR
-#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
-#endif
-
-#ifdef CONFIG_MMU
-static struct execmem_info execmem_info __ro_after_init;
-
-struct execmem_info __init *execmem_arch_setup(void)
-{
-   unsigned long fallback_start = 0, fallback_end = 0;
-
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
-   fallback_start = VMALLOC_START;
-   fallback_end = VMALLOC_END;
-   }
-
-   execmem_info = (struct execmem_info){
-   .ranges = {
-   [EXECMEM_DEFAULT] = {
-   .start  = MODULES_VADDR,
-   .end= MODULES_END,
-   .pgprot = PAGE_KERNEL_EXEC,
-   .alignment = 1,
-   .fallback_start = fallback_start,
-   .fallback_end   = fallback_end,
-   },
-   },
-   };
-
-   return _info;
-}
-#endif
-
 bool module_init_section(const char *name)
 {
return strstarts(name, ".init") ||
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index e8c6f4be0ce1..5345d218899a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -486,3 +487,47 @@ void free_initrd_mem(unsigned long start, unsigned long 
end)
free_reserved_area((void *)start, (void *)end, -1, "initrd");
 }
 #endif
+
+#ifdef CONFIG_EXECMEM
+
+#ifdef CONFIG_XIP_KERNEL
+/*
+ * The XIP kernel text is mapped in the module area for modules and
+ * some other stuff to work without any indirect relocations.
+ * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
+ * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
+ */
+#undef MODULES_VADDR
+#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
+#endif
+
+#ifdef CONFIG_MMU
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
+{
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+  

[PATCH v7 11/16] powerpc: extend execmem_params for kprobes allocations

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

powerpc overrides kprobes::alloc_insn_page() to remove writable
permissions when STRICT_MODULE_RWX is on.

Add definition of EXECMEM_KRPOBES to execmem_params to allow using the
generic kprobes::alloc_insn_page() with the desired permissions.

As powerpc uses breakpoint instructions to inject kprobes, it does not
need to constrain kprobe allocations to the modules area and can use the
entire vmalloc address space.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/kernel/kprobes.c | 20 
 arch/powerpc/kernel/module.c  |  7 +++
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9fcd01bb2ce6..14c5ddec3056 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -126,26 +126,6 @@ kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long 
addr, unsigned long offse
return (kprobe_opcode_t *)(addr + offset);
 }
 
-void *alloc_insn_page(void)
-{
-   void *page;
-
-   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
-   if (!page)
-   return NULL;
-
-   if (strict_module_rwx_enabled()) {
-   int err = set_memory_rox((unsigned long)page, 1);
-
-   if (err)
-   goto error;
-   }
-   return page;
-error:
-   execmem_free(page);
-   return NULL;
-}
-
 int arch_prepare_kprobe(struct kprobe *p)
 {
int ret = 0;
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index ac80559015a3..2a23cf7e141b 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -94,6 +94,7 @@ static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
 {
+   pgprot_t kprobes_prot = strict_module_rwx_enabled() ? PAGE_KERNEL_ROX : 
PAGE_KERNEL_EXEC;
pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : 
PAGE_KERNEL_EXEC;
unsigned long fallback_start = 0, fallback_end = 0;
unsigned long start, end;
@@ -132,6 +133,12 @@ struct execmem_info __init *execmem_arch_setup(void)
.fallback_start = fallback_start,
.fallback_end   = fallback_end,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = kprobes_prot,
+   .alignment = 1,
+   },
[EXECMEM_MODULE_DATA] = {
.start  = VMALLOC_START,
.end= VMALLOC_END,
-- 
2.43.0



[PATCH v7 10/16] arm64: extend execmem_info for generated code allocations

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The memory allocations for kprobes and BPF on arm64 can be placed
anywhere in vmalloc address space and currently this is implemented with
overrides of alloc_insn_page() and bpf_jit_alloc_exec() in arm64.

Define EXECMEM_KPROBES and EXECMEM_BPF ranges in arm64::execmem_info and
drop overrides of alloc_insn_page() and bpf_jit_alloc_exec().

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Will Deacon 
---
 arch/arm64/kernel/module.c | 12 
 arch/arm64/kernel/probes/kprobes.c |  7 ---
 arch/arm64/net/bpf_jit_comp.c  | 11 ---
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index b7a7a23f9f8f..a52240ea084b 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -146,6 +146,18 @@ struct execmem_info __init *execmem_arch_setup(void)
.fallback_start = fallback_start,
.fallback_end   = fallback_end,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL_ROX,
+   .alignment = 1,
+   },
+   [EXECMEM_BPF] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
},
};
 
diff --git a/arch/arm64/kernel/probes/kprobes.c 
b/arch/arm64/kernel/probes/kprobes.c
index 327855a11df2..4268678d0e86 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -129,13 +129,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
return 0;
 }
 
-void *alloc_insn_page(void)
-{
-   return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS,
-   NUMA_NO_NODE, __builtin_return_address(0));
-}
-
 /* arm kprobe: install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 122021f9bdfc..456f5af239fc 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1793,17 +1793,6 @@ u64 bpf_jit_alloc_exec_limit(void)
return VMALLOC_END - VMALLOC_START;
 }
 
-void *bpf_jit_alloc_exec(unsigned long size)
-{
-   /* Memory is intended to be executable, reset the pointer tag. */
-   return kasan_reset_tag(vmalloc(size));
-}
-
-void bpf_jit_free_exec(void *addr)
-{
-   return vfree(addr);
-}
-
 /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool bpf_jit_supports_subprog_tailcalls(void)
 {
-- 
2.43.0



[PATCH v7 09/16] riscv: extend execmem_params for generated code allocations

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The memory allocations for kprobes and BPF on RISC-V are not placed in
the modules area and these custom allocations are implemented with
overrides of alloc_insn_page() and  bpf_jit_alloc_exec().

Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END for
32 bit and slightly reorder execmem_params initialization to support both
32 and 64 bit variants, define EXECMEM_KPROBES and EXECMEM_BPF ranges in
riscv::execmem_params and drop overrides of alloc_insn_page() and
bpf_jit_alloc_exec().

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Alexandre Ghiti 
---
 arch/riscv/include/asm/pgtable.h   |  3 +++
 arch/riscv/kernel/module.c | 14 +-
 arch/riscv/kernel/probes/kprobes.c | 10 --
 arch/riscv/net/bpf_jit_core.c  | 13 -
 4 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9f8ea0e33eb1..5f21814e438e 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -55,6 +55,9 @@
 #define MODULES_LOWEST_VADDR   (KERNEL_LINK_ADDR - SZ_2G)
 #define MODULES_VADDR  (PFN_ALIGN((unsigned long)&_end) - SZ_2G)
 #define MODULES_END(PFN_ALIGN((unsigned long)&_start))
+#else
+#define MODULES_VADDR  VMALLOC_START
+#define MODULES_ENDVMALLOC_END
 #endif
 
 /*
diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
index 182904127ba0..0e6415f00fca 100644
--- a/arch/riscv/kernel/module.c
+++ b/arch/riscv/kernel/module.c
@@ -906,7 +906,7 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
+#ifdef CONFIG_MMU
 static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
@@ -919,6 +919,18 @@ struct execmem_info __init *execmem_arch_setup(void)
.pgprot = PAGE_KERNEL,
.alignment = 1,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL_READ_EXEC,
+   .alignment = 1,
+   },
+   [EXECMEM_BPF] = {
+   .start  = BPF_JIT_REGION_START,
+   .end= BPF_JIT_REGION_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = PAGE_SIZE,
+   },
},
};
 
diff --git a/arch/riscv/kernel/probes/kprobes.c 
b/arch/riscv/kernel/probes/kprobes.c
index 2f08c14a933d..e64f2f3064eb 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -104,16 +104,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
return 0;
 }
 
-#ifdef CONFIG_MMU
-void *alloc_insn_page(void)
-{
-   return  __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
-GFP_KERNEL, PAGE_KERNEL_READ_EXEC,
-VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-__builtin_return_address(0));
-}
-#endif
-
 /* install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 6b3acac30c06..e238fdbd5dbc 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -219,19 +219,6 @@ u64 bpf_jit_alloc_exec_limit(void)
return BPF_JIT_REGION_SIZE;
 }
 
-void *bpf_jit_alloc_exec(unsigned long size)
-{
-   return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
-   BPF_JIT_REGION_END, GFP_KERNEL,
-   PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-}
-
-void bpf_jit_free_exec(void *addr)
-{
-   return vfree(addr);
-}
-
 void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
int ret;
-- 
2.43.0



[PATCH v7 08/16] mm/execmem, arch: convert remaining overrides of module_alloc to execmem

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Extend execmem parameters to accommodate more complex overrides of
module_alloc() by architectures.

This includes specification of a fallback range required by arm, arm64
and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for
allocation of KASAN shadow required by s390 and x86 and support for
late initialization of execmem required by arm64.

The core implementation of execmem_alloc() takes care of suppressing
warnings when the initial allocation fails but there is a fallback range
defined.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Will Deacon 
Acked-by: Song Liu 
---
 arch/Kconfig |  8 
 arch/arm/kernel/module.c | 41 
 arch/arm64/Kconfig   |  1 +
 arch/arm64/kernel/module.c   | 55 ++
 arch/powerpc/kernel/module.c | 60 +++--
 arch/s390/kernel/module.c| 54 +++---
 arch/x86/kernel/module.c | 70 +++--
 include/linux/execmem.h  | 30 ++-
 include/linux/moduleloader.h | 12 --
 kernel/module/main.c | 26 +++--
 mm/execmem.c | 75 ++--
 11 files changed, 247 insertions(+), 185 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 65afb1de48b3..4fd0daa54e6c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -960,6 +960,14 @@ config ARCH_WANTS_MODULES_DATA_IN_VMALLOC
  For architectures like powerpc/32 which have constraints on module
  allocation and need to allocate module data outside of module area.
 
+config ARCH_WANTS_EXECMEM_LATE
+   bool
+   help
+ For architectures that do not allocate executable memory early on
+ boot, but rather require its initialization late when there is
+ enough entropy for module space randomization, for instance
+ arm64.
+
 config HAVE_IRQ_EXIT_ON_IRQ_STACK
bool
help
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index e74d84f58b77..a98fdf6ff26c 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,23 +35,31 @@
 #endif
 
 #ifdef CONFIG_MMU
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   gfp_t gfp_mask = GFP_KERNEL;
-   void *p;
-
-   /* Silence the initial allocation */
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
-   gfp_mask |= __GFP_NOWARN;
-
-   p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-   if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
-   return p;
-   return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+   },
+   },
+   };
+
+   return _info;
 }
 #endif
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..74b34a78b7ac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -105,6 +105,7 @@ config ARM64
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES 
&& !ARM64_VA_BITS_36)
select ARCH_WANT_LD_ORPHAN_WARN
+   select ARCH_WANTS_EXECMEM_LATE if EXECMEM
select ARCH_WANTS_NO_INSTR
select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
select ARCH_HAS_UBSAN
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index e92da4da1b2a..b7a7a23f9f8f 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -108,41 +109,47 @@ static int __init module_init_limits(void)
 
return 0;
 }
-subsys_initcall(module_init_limits);
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   

[PATCH v7 07/16] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Several architectures override module_alloc() only to define address
range for code allocations different than VMALLOC address space.

Provide a generic implementation in execmem that uses the parameters for
address space ranges, required alignment and page protections provided
by architectures.

The architectures must fill execmem_info structure and implement
execmem_arch_setup() that returns a pointer to that structure. This way the
execmem initialization won't be called from every architecture, but rather
from a central place, namely a core_initcall() in execmem.

The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
with the parameters defined by the architectures.  If an architecture does
not implement execmem_arch_setup(), execmem_alloc() will fall back to
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Song Liu 
---
 arch/loongarch/kernel/module.c | 19 --
 arch/mips/kernel/module.c  | 20 --
 arch/nios2/kernel/module.c | 21 ---
 arch/parisc/kernel/module.c| 24 
 arch/riscv/kernel/module.c | 24 
 arch/sparc/kernel/module.c | 20 --
 include/linux/execmem.h| 47 
 mm/execmem.c   | 67 --
 mm/mm_init.c   |  2 +
 9 files changed, 210 insertions(+), 34 deletions(-)

diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
index c7d0338d12c1..ca6dd7ea1610 100644
--- a/arch/loongarch/kernel/module.c
+++ b/arch/loongarch/kernel/module.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -490,10 +491,22 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, 
__builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 
 static void module_init_ftrace_plt(const Elf_Ehdr *hdr,
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 9a6c96014904..59225a3cf918 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct mips_hi16 {
@@ -32,11 +33,22 @@ static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
 #ifdef MODULES_VADDR
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 #endif
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 9c97b7513853..0d1ee86631fc 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -18,15 +18,26 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC,
-   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
diff --git 

[PATCH v7 06/16] mm: introduce execmem_alloc() and execmem_free()

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystems
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

Start splitting code allocation from modules by introducing execmem_alloc()
and execmem_free() APIs.

Initially, execmem_alloc() is a wrapper for module_alloc() and
execmem_free() is a replacement of module_memfree() to allow updating all
call sites to use the new APIs.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem_alloc() takes
a type argument, that will be used to identify the calling subsystem and to
allow architectures define parameters for ranges suitable for that
subsystem.

No functional changes.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Masami Hiramatsu (Google) 
Acked-by: Song Liu 
---
 arch/powerpc/kernel/kprobes.c|  6 ++--
 arch/s390/kernel/ftrace.c|  4 +--
 arch/s390/kernel/kprobes.c   |  4 +--
 arch/s390/kernel/module.c|  5 +--
 arch/sparc/net/bpf_jit_comp_32.c |  8 ++---
 arch/x86/kernel/ftrace.c |  6 ++--
 arch/x86/kernel/kprobes/core.c   |  4 +--
 include/linux/execmem.h  | 57 
 include/linux/moduleloader.h |  3 --
 kernel/bpf/core.c|  6 ++--
 kernel/kprobes.c |  8 ++---
 kernel/module/Kconfig|  1 +
 kernel/module/main.c | 25 +-
 mm/Kconfig   |  3 ++
 mm/Makefile  |  1 +
 mm/execmem.c | 32 ++
 16 files changed, 128 insertions(+), 45 deletions(-)
 create mode 100644 include/linux/execmem.h
 create mode 100644 mm/execmem.c

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index bbca90a5e2ec..9fcd01bb2ce6 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -19,8 +19,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -130,7 +130,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
 
@@ -142,7 +142,7 @@ void *alloc_insn_page(void)
}
return page;
 error:
-   module_memfree(page);
+   execmem_free(page);
return NULL;
 }
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..798249ef5646 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -7,13 +7,13 @@
  *   Author(s): Martin Schwidefsky 
  */
 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -220,7 +220,7 @@ static int __init ftrace_plt_init(void)
 {
const char *start, *end;
 
-   ftrace_plt = module_alloc(PAGE_SIZE);
+   ftrace_plt = execmem_alloc(EXECMEM_FTRACE, PAGE_SIZE);
if (!ftrace_plt)
panic("cannot allocate ftrace plt\n");
 
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index f0cf20d4b3c5..3c1b1be744de 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -9,7 +9,6 @@
 
 #define pr_fmt(fmt) "kprobes: " fmt
 
-#include 
 #include 
 #include 
 #include 
@@ -21,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -38,7 +38,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
set_memory_rox((unsigned long)page, 1);
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 42215f9404af..ac97a905e8cd 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,7 +77,7 @@ void *module_alloc(unsigned long size)
 #ifdef CONFIG_FUNCTION_TRACER
 void module_arch_cleanup(struct module *mod)
 {
-   module_memfree(mod->arch.trampolines_start);
+   execmem_free(mod->arch.trampolines_start);
 }
 #endif
 
@@ -510,7 +511,7 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct 
module *me,
 
size = FTRACE_HOTPATCH_TRAMPOLINES_SIZE(s->sh_size);
numpages = DIV_ROUND_UP(size, PAGE_SIZE);
-   start = module_alloc(numpages * PAGE_SIZE);
+   start = execmem_alloc(EXECMEM_FTRACE, numpages * PAGE_SIZE);
if (!start)

[PATCH v7 05/16] module: make module_memory_{alloc,free} more self-contained

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Move the logic related to the memory allocation and freeing into
module_memory_alloc() and module_memory_free().

Signed-off-by: Mike Rapoport (IBM) 
---
 kernel/module/main.c | 64 +++-
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..5b82b069e0d3 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1203,15 +1203,44 @@ static bool mod_mem_use_vmalloc(enum mod_mem_type type)
mod_mem_type_is_core_data(type);
 }
 
-static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
+static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
 {
+   unsigned int size = PAGE_ALIGN(mod->mem[type].size);
+   void *ptr;
+
+   mod->mem[type].size = size;
+
if (mod_mem_use_vmalloc(type))
-   return vzalloc(size);
-   return module_alloc(size);
+   ptr = vmalloc(size);
+   else
+   ptr = module_alloc(size);
+
+   if (!ptr)
+   return -ENOMEM;
+
+   /*
+* The pointer to these blocks of memory are stored on the module
+* structure and we keep that around so long as the module is
+* around. We only free that memory when we unload the module.
+* Just mark them as not being a leak then. The .init* ELF
+* sections *do* get freed after boot so we *could* treat them
+* slightly differently with kmemleak_ignore() and only grey
+* them out as they work as typical memory allocations which
+* *do* eventually get freed, but let's just keep things simple
+* and avoid *any* false positives.
+*/
+   kmemleak_not_leak(ptr);
+
+   memset(ptr, 0, size);
+   mod->mem[type].base = ptr;
+
+   return 0;
 }
 
-static void module_memory_free(void *ptr, enum mod_mem_type type)
+static void module_memory_free(struct module *mod, enum mod_mem_type type)
 {
+   void *ptr = mod->mem[type].base;
+
if (mod_mem_use_vmalloc(type))
vfree(ptr);
else
@@ -1229,12 +1258,12 @@ static void free_mod_mem(struct module *mod)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
-   module_memory_free(mod_mem->base, type);
+   module_memory_free(mod, type);
}
 
/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, 
mod->mem[MOD_DATA].size);
-   module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
+   module_memory_free(mod, MOD_DATA);
 }
 
 /* Free a module, remove from lists, etc. */
@@ -2225,7 +2254,6 @@ static int find_module_sections(struct module *mod, 
struct load_info *info)
 static int move_module(struct module *mod, struct load_info *info)
 {
int i;
-   void *ptr;
enum mod_mem_type t = 0;
int ret = -ENOMEM;
 
@@ -2234,26 +2262,12 @@ static int move_module(struct module *mod, struct 
load_info *info)
mod->mem[type].base = NULL;
continue;
}
-   mod->mem[type].size = PAGE_ALIGN(mod->mem[type].size);
-   ptr = module_memory_alloc(mod->mem[type].size, type);
-   /*
- * The pointer to these blocks of memory are stored on the 
module
- * structure and we keep that around so long as the module is
- * around. We only free that memory when we unload the module.
- * Just mark them as not being a leak then. The .init* ELF
- * sections *do* get freed after boot so we *could* treat them
- * slightly differently with kmemleak_ignore() and only grey
- * them out as they work as typical memory allocations which
- * *do* eventually get freed, but let's just keep things simple
- * and avoid *any* false positives.
-*/
-   kmemleak_not_leak(ptr);
-   if (!ptr) {
+
+   ret = module_memory_alloc(mod, type);
+   if (ret) {
t = type;
goto out_enomem;
}
-   memset(ptr, 0, mod->mem[type].size);
-   mod->mem[type].base = ptr;
}
 
/* Transfer each section which specifies SHF_ALLOC */
@@ -2296,7 +2310,7 @@ static int move_module(struct module *mod, struct 
load_info *info)
return 0;
 out_enomem:
for (t--; t >= 0; t--)
-   module_memory_free(mod->mem[t].base, t);
+   module_memory_free(mod, t);
return ret;
 }
 
-- 
2.43.0



[PATCH v7 04/16] sparc: simplify module_alloc()

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END
for 32-bit and reduce module_alloc() to

__vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, ...)

as with the new defines the allocations becomes identical for both 32
and 64 bits.

While on it, drop unused include of 

Suggested-by: Sam Ravnborg 
Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Sam Ravnborg 
---
 arch/sparc/include/asm/pgtable_32.h |  2 ++
 arch/sparc/kernel/module.c  | 25 +
 2 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_32.h 
b/arch/sparc/include/asm/pgtable_32.h
index 9e85d57ac3f2..62bcafe38b1f 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -432,6 +432,8 @@ static inline int io_remap_pfn_range(struct vm_area_struct 
*vma,
 
 #define VMALLOC_START   _AC(0xfe60,UL)
 #define VMALLOC_END _AC(0xffc0,UL)
+#define MODULES_VADDR   VMALLOC_START
+#define MODULES_END VMALLOC_END
 
 /* We provide our own get_unmapped_area to cope with VA holes for userland */
 #define HAVE_ARCH_UNMAPPED_AREA
diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
index 66c45a2764bc..d37adb2a0b54 100644
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -21,35 +21,12 @@
 
 #include "entry.h"
 
-#ifdef CONFIG_SPARC64
-
-#include 
-
-static void *module_map(unsigned long size)
+void *module_alloc(unsigned long size)
 {
-   if (PAGE_ALIGN(size) > MODULES_LEN)
-   return NULL;
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
-#else
-static void *module_map(unsigned long size)
-{
-   return vmalloc(size);
-}
-#endif /* CONFIG_SPARC64 */
-
-void *module_alloc(unsigned long size)
-{
-   void *ret;
-
-   ret = module_map(size);
-   if (ret)
-   memset(ret, 0, size);
-
-   return ret;
-}
 
 /* Make generic code ignore STT_REGISTER dummy undefined symbols.  */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
-- 
2.43.0



[PATCH v7 03/16] nios2: define virtual address space for modules

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

nios2 uses kmalloc() to implement module_alloc() because CALL26/PCREL26
cannot reach all of vmalloc address space.

Define module space as 32MiB below the kernel base and switch nios2 to
use vmalloc for module allocations.

Suggested-by: Thomas Gleixner 
Acked-by: Dinh Nguyen 
Acked-by: Song Liu 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/nios2/include/asm/pgtable.h |  5 -
 arch/nios2/kernel/module.c   | 19 ---
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index d052dfcbe8d3..eab87c6beacb 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -25,7 +25,10 @@
 #include 
 
 #define VMALLOC_START  CONFIG_NIOS2_KERNEL_MMU_REGION_BASE
-#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
+#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M - 1)
+
+#define MODULES_VADDR  (CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M)
+#define MODULES_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
 
 struct mm_struct;
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 76e0a42d6e36..9c97b7513853 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -21,23 +21,12 @@
 
 #include 
 
-/*
- * Modules should NOT be allocated with kmalloc for (obvious) reasons.
- * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
- * from 0x8000 (vmalloc area) to 0xc (kernel) (kmalloc returns
- * addresses in 0xc000)
- */
 void *module_alloc(unsigned long size)
 {
-   if (size == 0)
-   return NULL;
-   return kmalloc(size, GFP_KERNEL);
-}
-
-/* Free memory returned from module_alloc */
-void module_memfree(void *module_region)
-{
-   kfree(module_region);
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+   GFP_KERNEL, PAGE_KERNEL_EXEC,
+   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
-- 
2.43.0



[PATCH v7 02/16] mips: module: rename MODULE_START to MODULES_VADDR

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

and MODULE_END to MODULES_END to match other architectures that define
custom address space for modules.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/mips/include/asm/pgtable-64.h | 4 ++--
 arch/mips/kernel/module.c  | 4 ++--
 arch/mips/mm/fault.c   | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/pgtable-64.h 
b/arch/mips/include/asm/pgtable-64.h
index 20ca48c1b606..c0109aff223b 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -147,8 +147,8 @@
 #if defined(CONFIG_MODULES) && defined(KBUILD_64BIT_SYM32) && \
VMALLOC_START != CKSSEG
 /* Load modules into 32bit-compatible segment. */
-#define MODULE_START   CKSSEG
-#define MODULE_END (FIXADDR_START-2*PAGE_SIZE)
+#define MODULES_VADDR  CKSSEG
+#define MODULES_END(FIXADDR_START-2*PAGE_SIZE)
 #endif
 
 #define pte_ERROR(e) \
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 7b2fbaa9cac5..9a6c96014904 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -31,10 +31,10 @@ struct mips_hi16 {
 static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
-#ifdef MODULE_START
+#ifdef MODULES_VADDR
 void *module_alloc(unsigned long size)
 {
-   return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index aaa9a242ebba..37fedeaca2e9 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -83,8 +83,8 @@ static void __do_page_fault(struct pt_regs *regs, unsigned 
long write,
 
if (unlikely(address >= VMALLOC_START && address <= VMALLOC_END))
goto VMALLOC_FAULT_TARGET;
-#ifdef MODULE_START
-   if (unlikely(address >= MODULE_START && address < MODULE_END))
+#ifdef MODULES_VADDR
+   if (unlikely(address >= MODULES_VADDR && address < MODULES_END))
goto VMALLOC_FAULT_TARGET;
 #endif
 
-- 
2.43.0



[PATCH v7 01/16] arm64: module: remove unneeded call to kasan_alloc_module_shadow()

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Since commit f6f37d9320a1 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS
modes") KASAN_VMALLOC is always enabled when KASAN is on. This means
that allocations in module_alloc() will be tracked by KASAN protection
for vmalloc() and that kasan_alloc_module_shadow() will be always an
empty inline and there is no point in calling it.

Drop meaningless call to kasan_alloc_module_shadow() from
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm64/kernel/module.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 47e0be610bb6..e92da4da1b2a 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -141,11 +141,6 @@ void *module_alloc(unsigned long size)
__func__);
}
 
-   if (p && (kasan_alloc_module_shadow(p, size, GFP_KERNEL) < 0)) {
-   vfree(p);
-   return NULL;
-   }
-
/* Memory is intended to be executable, reset the pointer tag. */
return kasan_reset_tag(p);
 }
-- 
2.43.0



[PATCH v7 00/16] mm: jit/text allocator

2024-04-29 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Hi,

The patches are also available in git:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v7

v7 changes:
* define MODULE_{VADDR,END} for riscv32 to fix the build and avoid
  #ifdefs in a function body
* add Acks, thanks everybody

v6: https://lore.kernel.org/all/20240426082854.7355-1-r...@kernel.org
* restore patch "arm64: extend execmem_info for generated code
  allocations" that disappeared in v5 rebase
* update execmem initialization so that by default it will be
  initialized early while late initialization will be an opt-in

v5: https://lore.kernel.org/all/20240422094436.3625171-1-r...@kernel.org
* rebase on v6.9-rc4 to avoid a conflict in kprobes
* add copyrights to mm/execmem.c (Luis)
* fix spelling (Ingo)
* define MODULES_VADDDR for sparc (Sam)
* consistently initialize struct execmem_info (Peter)
* reduce #ifdefs in function bodies in kprobes (Masami) 

v4: https://lore.kernel.org/all/20240411160051.2093261-1-r...@kernel.org
* rebase on v6.9-rc2
* rename execmem_params to execmem_info and execmem_arch_params() to
  execmem_arch_setup()
* use single execmem_alloc() API instead of execmem_{text,data}_alloc() (Song)
* avoid extra copy of execmem parameters (Rick)
* run execmem_init() as core_initcall() except for the architectures that
  may allocated text really early (currently only x86) (Will)
* add acks for some of arm64 and riscv changes, thanks Will and Alexandre
* new commits:
  - drop call to kasan_alloc_module_shadow() on arm64 because it's not
needed anymore
  - rename MODULE_START to MODULES_VADDR on MIPS
  - use CONFIG_EXECMEM instead of CONFIG_MODULES on powerpc as per Christophe:
https://lore.kernel.org/all/79062fa3-3402-47b3-8920-9231ad05e...@csgroup.eu/

v3: https://lore.kernel.org/all/20230918072955.2507221-1-r...@kernel.org
* add type parameter to execmem allocation APIs
* remove BPF dependency on modules

v2: https://lore.kernel.org/all/20230616085038.4121892-1-r...@kernel.org
* Separate "module" and "others" allocations with execmem_text_alloc()
and jit_text_alloc()
* Drop ROX entailment on x86
* Add ack for nios2 changes, thanks Dinh Nguyen

v1: https://lore.kernel.org/all/20230601101257.530867-1-r...@kernel.org

= Cover letter from v1 (sligtly updated) =

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystmes
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

A centralized infrastructure for code allocation allows allocations of
executable memory as ROX, and future optimizations such as caching large
pages for better iTLB performance and providing sub-page allocations for
users that only need small jit code snippets.

Rick Edgecombe proposed perm_alloc extension to vmalloc [1] and Song Liu
proposed execmem_alloc [2], but both these approaches were targeting BPF
allocations and lacked the ground work to abstract executable allocations
and split them from the modules core.

Thomas Gleixner suggested to express module allocation restrictions and
requirements as struct mod_alloc_type_params [3] that would define ranges,
protections and other parameters for different types of allocations used by
modules and following that suggestion Song separated allocations of
different types in modules (commit ac3b43283923 ("module: replace
module_layout with module_memory")) and posted "Type aware module
allocator" set [4].

I liked the idea of parametrising code allocation requirements as a
structure, but I believe the original proposal and Song's module allocator
was too module centric, so I came up with these patches.

This set splits code allocation from modules by introducing execmem_alloc()
and and execmem_free(), APIs, replaces call sites of module_alloc() and
module_memfree() with the new APIs and implements core text and related
allocations in a central place.

Instead of architecture specific overrides for module_alloc(), the
architectures that require non-default behaviour for text allocation must
fill execmem_info structure and implement execmem_arch_setup() that returns
a pointer to that structure. If an architecture does not implement
execmem_arch_setup(), the defaults compatible with the current
modules::module_alloc() are used.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem APIs
take a type argument, that will be used to identify the calling subsystem
and to allow architectures to define parameters for ranges suitable for that
subsystem.

The new infrastructure allows decoupling of BPF, kprobes and 

[PATCH 4/4] ASoC: uniphier: Constify static snd_pcm_hardware

2024-04-29 Thread Krzysztof Kozlowski
Static 'struct snd_pcm_hardware' is not modified by the driver and its
copy is passed to the core, so it can be made const for increased code
safety.

Signed-off-by: Krzysztof Kozlowski 
---
 sound/soc/uniphier/aio-dma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/uniphier/aio-dma.c b/sound/soc/uniphier/aio-dma.c
index fe272befd967..265d61723e99 100644
--- a/sound/soc/uniphier/aio-dma.c
+++ b/sound/soc/uniphier/aio-dma.c
@@ -14,7 +14,7 @@
 
 #include "aio.h"
 
-static struct snd_pcm_hardware uniphier_aiodma_hw = {
+static const struct snd_pcm_hardware uniphier_aiodma_hw = {
.info = SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_MMAP_VALID |
SNDRV_PCM_INFO_INTERLEAVED,

-- 
2.43.0



[PATCH 3/4] ASoC: meson: Constify static snd_pcm_hardware

2024-04-29 Thread Krzysztof Kozlowski
Static 'struct snd_pcm_hardware' is not modified by the driver and its
copy is passed to the core, so it can be made const for increased code
safety.

Signed-off-by: Krzysztof Kozlowski 
---
 sound/soc/meson/aiu-fifo-i2s.c   | 2 +-
 sound/soc/meson/aiu-fifo-spdif.c | 2 +-
 sound/soc/meson/aiu-fifo.h   | 2 +-
 sound/soc/meson/axg-fifo.c   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/soc/meson/aiu-fifo-i2s.c b/sound/soc/meson/aiu-fifo-i2s.c
index 7d833500c799..eccbc16b293a 100644
--- a/sound/soc/meson/aiu-fifo-i2s.c
+++ b/sound/soc/meson/aiu-fifo-i2s.c
@@ -25,7 +25,7 @@
 
 #define AIU_FIFO_I2S_BLOCK 256
 
-static struct snd_pcm_hardware fifo_i2s_pcm = {
+static const struct snd_pcm_hardware fifo_i2s_pcm = {
.info = (SNDRV_PCM_INFO_INTERLEAVED |
 SNDRV_PCM_INFO_MMAP |
 SNDRV_PCM_INFO_MMAP_VALID |
diff --git a/sound/soc/meson/aiu-fifo-spdif.c b/sound/soc/meson/aiu-fifo-spdif.c
index fa91f3c53fa4..e0e00ec026dc 100644
--- a/sound/soc/meson/aiu-fifo-spdif.c
+++ b/sound/soc/meson/aiu-fifo-spdif.c
@@ -27,7 +27,7 @@
 
 #define AIU_FIFO_SPDIF_BLOCK   8
 
-static struct snd_pcm_hardware fifo_spdif_pcm = {
+static const struct snd_pcm_hardware fifo_spdif_pcm = {
.info = (SNDRV_PCM_INFO_INTERLEAVED |
 SNDRV_PCM_INFO_MMAP |
 SNDRV_PCM_INFO_MMAP_VALID |
diff --git a/sound/soc/meson/aiu-fifo.h b/sound/soc/meson/aiu-fifo.h
index 42ce266677cc..84ab4577815a 100644
--- a/sound/soc/meson/aiu-fifo.h
+++ b/sound/soc/meson/aiu-fifo.h
@@ -18,7 +18,7 @@ struct snd_pcm_hw_params;
 struct platform_device;
 
 struct aiu_fifo {
-   struct snd_pcm_hardware *pcm;
+   const struct snd_pcm_hardware *pcm;
unsigned int mem_offset;
unsigned int fifo_block;
struct clk *pclk;
diff --git a/sound/soc/meson/axg-fifo.c b/sound/soc/meson/axg-fifo.c
index bebee0ca8e38..633fd971b444 100644
--- a/sound/soc/meson/axg-fifo.c
+++ b/sound/soc/meson/axg-fifo.c
@@ -23,7 +23,7 @@
  * These differences are handled in the respective DAI drivers
  */
 
-static struct snd_pcm_hardware axg_fifo_hw = {
+static const struct snd_pcm_hardware axg_fifo_hw = {
.info = (SNDRV_PCM_INFO_INTERLEAVED |
 SNDRV_PCM_INFO_MMAP |
 SNDRV_PCM_INFO_MMAP_VALID |

-- 
2.43.0



[PATCH 2/4] ASoC: fsl: Constify static snd_pcm_hardware

2024-04-29 Thread Krzysztof Kozlowski
Static 'struct snd_pcm_hardware' is not modified by the driver and its
copy is passed to the core, so it can be made const for increased code
safety.

Signed-off-by: Krzysztof Kozlowski 
---
 sound/soc/fsl/imx-pcm-rpmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/imx-pcm-rpmsg.c b/sound/soc/fsl/imx-pcm-rpmsg.c
index b84d1dfddba2..128728b65a3a 100644
--- a/sound/soc/fsl/imx-pcm-rpmsg.c
+++ b/sound/soc/fsl/imx-pcm-rpmsg.c
@@ -16,7 +16,7 @@
 #include "fsl_rpmsg.h"
 #include "imx-pcm-rpmsg.h"
 
-static struct snd_pcm_hardware imx_rpmsg_pcm_hardware = {
+static const struct snd_pcm_hardware imx_rpmsg_pcm_hardware = {
.info = SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
SNDRV_PCM_INFO_BATCH |

-- 
2.43.0



[PATCH 1/4] ASoC: qcom: Constify static snd_pcm_hardware

2024-04-29 Thread Krzysztof Kozlowski
Static 'struct snd_pcm_hardware' is not modified by the driver and its
copy is passed to the core, so it can be made const for increased code
safety.

Signed-off-by: Krzysztof Kozlowski 
---
 sound/soc/qcom/qdsp6/q6apm-dai.c | 4 ++--
 sound/soc/qcom/qdsp6/q6asm-dai.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/sound/soc/qcom/qdsp6/q6apm-dai.c b/sound/soc/qcom/qdsp6/q6apm-dai.c
index 00bbd291be5c..3c78440a2f0f 100644
--- a/sound/soc/qcom/qdsp6/q6apm-dai.c
+++ b/sound/soc/qcom/qdsp6/q6apm-dai.c
@@ -85,7 +85,7 @@ struct q6apm_dai_data {
long long sid;
 };
 
-static struct snd_pcm_hardware q6apm_dai_hardware_capture = {
+static const struct snd_pcm_hardware q6apm_dai_hardware_capture = {
.info = (SNDRV_PCM_INFO_MMAP | 
SNDRV_PCM_INFO_BLOCK_TRANSFER |
 SNDRV_PCM_INFO_MMAP_VALID | 
SNDRV_PCM_INFO_INTERLEAVED |
 SNDRV_PCM_INFO_PAUSE | SNDRV_PCM_INFO_RESUME |
@@ -104,7 +104,7 @@ static struct snd_pcm_hardware q6apm_dai_hardware_capture = 
{
.fifo_size =0,
 };
 
-static struct snd_pcm_hardware q6apm_dai_hardware_playback = {
+static const struct snd_pcm_hardware q6apm_dai_hardware_playback = {
.info = (SNDRV_PCM_INFO_MMAP | 
SNDRV_PCM_INFO_BLOCK_TRANSFER |
 SNDRV_PCM_INFO_MMAP_VALID | 
SNDRV_PCM_INFO_INTERLEAVED |
 SNDRV_PCM_INFO_PAUSE | SNDRV_PCM_INFO_RESUME |
diff --git a/sound/soc/qcom/qdsp6/q6asm-dai.c b/sound/soc/qcom/qdsp6/q6asm-dai.c
index aeb6a9d479ab..3913706ccdc5 100644
--- a/sound/soc/qcom/qdsp6/q6asm-dai.c
+++ b/sound/soc/qcom/qdsp6/q6asm-dai.c
@@ -103,7 +103,7 @@ static const struct snd_pcm_hardware 
q6asm_dai_hardware_capture = {
.fifo_size =0,
 };
 
-static struct snd_pcm_hardware q6asm_dai_hardware_playback = {
+static const struct snd_pcm_hardware q6asm_dai_hardware_playback = {
.info = (SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_BATCH |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
SNDRV_PCM_INFO_MMAP_VALID |

-- 
2.43.0



[PATCH 0/4] ASoC: Constify static snd_pcm_hardware

2024-04-29 Thread Krzysztof Kozlowski
No dependencies.

Static 'struct snd_pcm_hardware' is not modified by few drivers and its
copy is passed to the core, so it can be made const for increased code
safety.

Best regards,
Krzysztof

---
Krzysztof Kozlowski (4):
  ASoC: qcom: Constify static snd_pcm_hardware
  ASoC: fsl: Constify static snd_pcm_hardware
  ASoC: meson: Constify static snd_pcm_hardware
  ASoC: uniphier: Constify static snd_pcm_hardware

 sound/soc/fsl/imx-pcm-rpmsg.c| 2 +-
 sound/soc/meson/aiu-fifo-i2s.c   | 2 +-
 sound/soc/meson/aiu-fifo-spdif.c | 2 +-
 sound/soc/meson/aiu-fifo.h   | 2 +-
 sound/soc/meson/axg-fifo.c   | 2 +-
 sound/soc/qcom/qdsp6/q6apm-dai.c | 4 ++--
 sound/soc/qcom/qdsp6/q6asm-dai.c | 2 +-
 sound/soc/uniphier/aio-dma.c | 2 +-
 8 files changed, 9 insertions(+), 9 deletions(-)
---
base-commit: ebbaba914053d8bcadd7d64eb8d2555299509a27
change-id: 20240429-n-asoc-const-snd-pcm-hardware-f0a5ade885ca

Best regards,
-- 
Krzysztof Kozlowski 



Re: [EXT] [PATCH v8 6/6] docs: trusted-encrypted: add DCP as new trust source

2024-04-29 Thread David Gstir
Hi Kshitiz,

> On 09.04.2024, at 11:48, Kshitiz Varshney  wrote:
> 
> Hi Jarkko,
> 
> 
>> -Original Message-
>> From: Jarkko Sakkinen 
>> Sent: Wednesday, April 3, 2024 9:18 PM
>> To: David Gstir ; Mimi Zohar ;
>> James Bottomley ; Herbert Xu
>> ; David S. Miller 
>> Cc: Shawn Guo ; Jonathan Corbet
>> ; Sascha Hauer ; Pengutronix
>> Kernel Team ; Fabio Estevam
>> ; dl-linux-imx ; Ahmad Fatoum
>> ; sigma star Kernel Team
>> ; David Howells ; Li
>> Yang ; Paul Moore ; James
>> Morris ; Serge E. Hallyn ; Paul E.
>> McKenney ; Randy Dunlap ;
>> Catalin Marinas ; Rafael J. Wysocki
>> ; Tejun Heo ; Steven Rostedt
>> (Google) ; linux-...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; linux-integr...@vger.kernel.org;
>> keyri...@vger.kernel.org; linux-cry...@vger.kernel.org; linux-arm-
>> ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org; linux-security-
>> mod...@vger.kernel.org; Richard Weinberger ; David
>> Oberhollenzer 
>> Subject: [EXT] Re: [PATCH v8 6/6] docs: trusted-encrypted: add DCP as new
>> trust source
>> 
>> Caution: This is an external email. Please take care when clicking links or
>> opening attachments. When in doubt, report the message using the 'Report
>> this email' button
>> 
>> 
>> On Wed Apr 3, 2024 at 10:21 AM EEST, David Gstir wrote:
>>> Update the documentation for trusted and encrypted KEYS with DCP as
>>> new trust source:
>>> 
>>> - Describe security properties of DCP trust source
>>> - Describe key usage
>>> - Document blob format
>>> 
>>> Co-developed-by: Richard Weinberger 
>>> Signed-off-by: Richard Weinberger 
>>> Co-developed-by: David Oberhollenzer
>>> 
>>> Signed-off-by: David Oberhollenzer 
>>> Signed-off-by: David Gstir 
>>> ---
>>> .../security/keys/trusted-encrypted.rst   | 53 +++
>>> security/keys/trusted-keys/trusted_dcp.c  | 19 +++
>>> 2 files changed, 72 insertions(+)
>>> 
>>> diff --git a/Documentation/security/keys/trusted-encrypted.rst
>>> b/Documentation/security/keys/trusted-encrypted.rst
>>> index e989b9802f92..f4d7e162d5e4 100644
>>> --- a/Documentation/security/keys/trusted-encrypted.rst
>>> +++ b/Documentation/security/keys/trusted-encrypted.rst
>>> @@ -42,6 +42,14 @@ safe.
>>>  randomly generated and fused into each SoC at manufacturing time.
>>>  Otherwise, a common fixed test key is used instead.
>>> 
>>> + (4) DCP (Data Co-Processor: crypto accelerator of various i.MX
>>> + SoCs)
>>> +
>>> + Rooted to a one-time programmable key (OTP) that is generally
>> burnt
>>> + in the on-chip fuses and is accessible to the DCP encryption 
>>> engine
>> only.
>>> + DCP provides two keys that can be used as root of trust: the OTP
>> key
>>> + and the UNIQUE key. Default is to use the UNIQUE key, but 
>>> selecting
>>> + the OTP key can be done via a module parameter
>> (dcp_use_otp_key).
>>> +
>>>   *  Execution isolation
>>> 
>>>  (1) TPM
>>> @@ -57,6 +65,12 @@ safe.
>>> 
>>>  Fixed set of operations running in isolated execution environment.
>>> 
>>> + (4) DCP
>>> +
>>> + Fixed set of cryptographic operations running in isolated 
>>> execution
>>> + environment. Only basic blob key encryption is executed there.
>>> + The actual key sealing/unsealing is done on main processor/kernel
>> space.
>>> +
>>>   * Optional binding to platform integrity state
>>> 
>>>  (1) TPM
>>> @@ -79,6 +93,11 @@ safe.
>>>  Relies on the High Assurance Boot (HAB) mechanism of NXP SoCs
>>>  for platform integrity.
>>> 
>>> + (4) DCP
>>> +
>>> + Relies on Secure/Trusted boot process (called HAB by vendor) for
>>> + platform integrity.
>>> +
>>>   *  Interfaces and APIs
>>> 
>>>  (1) TPM
>>> @@ -94,6 +113,11 @@ safe.
>>> 
>>>  Interface is specific to silicon vendor.
>>> 
>>> + (4) DCP
>>> +
>>> + Vendor-specific API that is implemented as part of the DCP crypto
>> driver in
>>> + ``drivers/crypto/mxs-dcp.c``.
>>> +
>>>   *  Threat model
>>> 
>>>  The strength and appropriateness of a particular trust source
>>> for a given @@ -129,6 +153,13 @@ selected trust source:
>>>  CAAM HWRNG, enable CRYPTO_DEV_FSL_CAAM_RNG_API and ensure
>> the device
>>>  is probed.
>>> 
>>> +  *  DCP (Data Co-Processor: crypto accelerator of various i.MX SoCs)
>>> +
>>> + The DCP hardware device itself does not provide a dedicated RNG
>> interface,
>>> + so the kernel default RNG is used. SoCs with DCP like the i.MX6ULL do
>> have
>>> + a dedicated hardware RNG that is independent from DCP which can be
>> enabled
>>> + to back the kernel RNG.
>>> +
>>> Users may override this by specifying ``trusted.rng=kernel`` on the
>>> kernel  command-line to override the used RNG with the kernel's random
>> number pool.
>>> 
>>> @@ -231,6 +262,19 @@ Usage::
>>> CAAM-specific format.  The key length for new keys is always in bytes.
>>> Trusted Keys can be 32 - 128 bytes (256 - 1024 

Re: [PATCH v3 11/11] sysctl: treewide: constify the ctl_table argument of handlers

2024-04-29 Thread Heiko Carstens
On Tue, Apr 23, 2024 at 09:54:46AM +0200, Thomas Weißschuh wrote:
> Adapt the proc_hander function signature to make it clear that handlers
> are not supposed to modify their ctl_table argument.
> 
> This is a prerequisite to moving the static ctl_table structs into
> rodata.
> By migrating all handlers at once a lengthy transition can be avoided.
> 
> The patch was mostly generated by coccinelle with the following script:
> 
> @@
> identifier func, ctl, write, buffer, lenp, ppos;
> @@
> 
> int func(
> - struct ctl_table *ctl,
> + const struct ctl_table *ctl,
>   int write, void *buffer, size_t *lenp, loff_t *ppos)
> { ... }
> 
> In addition to the scripted changes some other changes are done:
> 
> * the typedef proc_handler is adapted
> 
> * the prototypes of non-static handler are adapted
> 
> * kernel/seccomp.c:{read,write}_actions_logged() and
>   kernel/watchdog.c:proc_watchdog_common() are adapted as they need to
>   adapted together with the handlers for type-consistency reasons
> 
> Signed-off-by: Thomas Weißschuh 

...

>  arch/s390/appldata/appldata_base.c| 10 ++---
>  arch/s390/kernel/debug.c  |  2 +-
>  arch/s390/kernel/topology.c   |  2 +-
>  arch/s390/mm/cmm.c|  6 +--

Acked-by: Heiko Carstens  # s390


[PATCH 1/2] PCI/ERR: Cleanup misleading indentation inside if conditions

2024-04-29 Thread Ilpo Järvinen
A few if conditions align misleadingly with the following code block.
The checks are really cascading NULL checks that fit into 80 chars so
remove newlines in between and realign to the if condition indent.

Signed-off-by: Ilpo Järvinen 
---
 drivers/pci/pcie/err.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 705893b5f7b0..31090770fffc 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -116,9 +116,7 @@ static int report_mmio_enabled(struct pci_dev *dev, void 
*data)
 
device_lock(>dev);
pdrv = dev->driver;
-   if (!pdrv ||
-   !pdrv->err_handler ||
-   !pdrv->err_handler->mmio_enabled)
+   if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->mmio_enabled)
goto out;
 
err_handler = pdrv->err_handler;
@@ -137,9 +135,7 @@ static int report_slot_reset(struct pci_dev *dev, void 
*data)
 
device_lock(>dev);
pdrv = dev->driver;
-   if (!pdrv ||
-   !pdrv->err_handler ||
-   !pdrv->err_handler->slot_reset)
+   if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
goto out;
 
err_handler = pdrv->err_handler;
@@ -158,9 +154,7 @@ static int report_resume(struct pci_dev *dev, void *data)
device_lock(>dev);
pdrv = dev->driver;
if (!pci_dev_set_io_state(dev, pci_channel_io_normal) ||
-   !pdrv ||
-   !pdrv->err_handler ||
-   !pdrv->err_handler->resume)
+   !pdrv || !pdrv->err_handler || !pdrv->err_handler->resume)
goto out;
 
err_handler = pdrv->err_handler;
-- 
2.39.2



Re: [PATCH 2/2] mm/selftests: Don't prefault in gup_longterm tests

2024-04-29 Thread David Hildenbrand

On 28.04.24 21:01, Peter Xu wrote:

Prefault, especially with RW, makes the GUP test too easy, and may not yet
reach the core of the test.

For example, R/O longterm pins will just hit, pte_write()==true for
whatever cases, the unsharing logic won't be ever tested.

This patch remove the prefault.  This tortures more code paths at least to
cover the unshare care for R/O longterm pins, in which case the first R/O
GUP attempt will fault in the page R/O first, then the 2nd will go through
the unshare path, checking whether an unshare is needed.

Cc: David Hildenbrand 
Signed-off-by: Peter Xu 
---
  tools/testing/selftests/mm/gup_longterm.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/mm/gup_longterm.c 
b/tools/testing/selftests/mm/gup_longterm.c
index ad168d35b23b..488e32186246 100644
--- a/tools/testing/selftests/mm/gup_longterm.c
+++ b/tools/testing/selftests/mm/gup_longterm.c
@@ -119,10 +119,16 @@ static void do_test(int fd, size_t size, enum test_type 
type, bool shared)
}
  
  	/*

-* Fault in the page writable such that GUP-fast can eventually pin
-* it immediately.
+* Explicitly avoid pre-faulting in the page, this can help testing
+* more code paths.
+*
+* Take example of an upcoming R/O pin test, if we RW prefault the
+* page, such pin will directly skip R/O unsharing and the longterm
+* pin will success mostly always.  When not prefaulted, R/O
+* longterm pin will first fault in a RO page, then the 2nd round
+* it'll go via the unshare check.  Otherwise those paths aren't
+* covered.
 */

This will mean that GUP-fast never succeeds, which removes quite some testing
coverage for most other tests here.

Note that the main motivation of this test was to test gup_fast_folio_allowed(),
where we had issues with GUP-fast during development.

Would the following also get the job done?

diff --git a/tools/testing/selftests/mm/gup_longterm.c 
b/tools/testing/selftests/mm/gup_longterm.c
index ad168d35b23b7..e917a7c58d571 100644
--- a/tools/testing/selftests/mm/gup_longterm.c
+++ b/tools/testing/selftests/mm/gup_longterm.c
@@ -92,7 +92,7 @@ static void do_test(int fd, size_t size, enum test_type type, 
bool shared)
 {
__fsword_t fs_type = get_fs_type(fd);
bool should_work;
-   char *mem;
+   char tmp, *mem;
int ret;
 
 	if (ftruncate(fd, size)) {

@@ -119,10 +119,19 @@ static void do_test(int fd, size_t size, enum test_type 
type, bool shared)
}
 
 	/*

-* Fault in the page writable such that GUP-fast can eventually pin
-* it immediately.
+* Fault in the page such that GUP-fast might be able to pin it
+* immediately. To cover more cases, don't fault in pages writable when
+* R/O pinning.
 */
-   memset(mem, 0, size);
+   switch (type) {
+   case TEST_TYPE_RO:
+   case TEST_TYPE_RO_FAST:
+   tmp = *mem;
+   asm volatile("" : "+r" (tmp));
+   break;
+   default:
+   memset(mem, 0, size);
+   };
 
 	switch (type) {

case TEST_TYPE_RO:
--
2.44.0


--
Cheers,

David / dhildenb



Re: [PATCH 1/2] mm/gup: Fix hugepd handling in hugetlb rework

2024-04-29 Thread David Hildenbrand

On 28.04.24 21:01, Peter Xu wrote:

Commit a12083d721d7 added hugepd handling for gup-slow, reusing gup-fast
functions.  follow_hugepd() correctly took the vma pointer in, however
didn't pass it over into the lower functions, which was overlooked.

The issue is gup_fast_hugepte() uses the vma pointer to make the correct
decision on whether an unshare is needed for a FOLL_PIN|FOLL_LONGTERM.  Now
without vma ponter it will constantly return "true" (needs an unshare) for
a page cache, even though in the SHARED case it will be wrong to unshare.

The other problem is, even if an unshare is needed, it now returns 0 rather
than -EMLINK, which will not trigger a follow up FAULT_FLAG_UNSHARE fault.
That will need to be fixed too when the unshare is wanted.

gup_longterm test didn't expose this issue in the past because it didn't
yet test R/O unshare in this case, another separate patch will enable that
in future tests.

Fix it by passing vma correctly to the bottom, rename gup_fast_hugepte()
back to gup_hugepte() as it is shared between the fast/slow paths, and also
allow -EMLINK to be returned properly by gup_hugepte() even though gup-fast
will take it the same as zero.

Reported-by: David Hildenbrand 
Fixes: a12083d721d7 ("mm/gup: handle hugepd for follow_page()")
Signed-off-by: Peter Xu 
---


LGTM

Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb