Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO

2024-05-01 Thread Alexey Kardashevskiy
On 2/5/24 00:09, Jason Gunthorpe wrote:

On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:

RFC v1 was posted here [1]. As I was testing more and fixing the
issues, I realized it's cleaner to have the table_group_ops implemented
the way it is done on PowerNV and stop 'borrowing' the DMA windows
for pSeries.

This patch-set implements the iommu table_group_ops for pSeries for
VFIO SPAPR TCE sub-driver thereby enabling the VFIO support on POWER
pSeries machines.


Wait, did they previously not have any support?


Again, this TCE stuff needs to go away, not grow. I can grudgingly
accept fixing it where it used to work, but not enabling more HW that
never worked before! :(

This used to work when I last tried it 2+ years ago; it is not new stuff.
Thanks,

--
Alexey



Re: [PATCH] powerpc: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n

2024-05-01 Thread Michael Ellerman
Nathan Chancellor  writes:
> Hi Michael,
>
> On Wed, May 01, 2024 at 12:04:40AM +1000, Michael Ellerman wrote:
...
>> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
>> index 08c550ed49be..1cd6eb6c8101 100644
>> --- a/arch/powerpc/include/asm/io.h
>> +++ b/arch/powerpc/include/asm/io.h
>> @@ -37,7 +37,7 @@ extern struct pci_dev *isa_bridge_pcidev;
>>   * define properly based on the platform
>>   */
>>  #ifndef CONFIG_PCI
>> -#define _IO_BASE	0
>> +#define _IO_BASE	POISON_POINTER_DELTA
>
> This works for CONFIG_PPC64 but not CONFIG_PPC32 (so tinyconfig and
> allnoconfig like Naresh reported) because CONFIG_ILLEGAL_POINTER_VALUE
> is defined as 0 in that case.
>
>   $ grep -P 'CONFIG_(ILLEGAL_POINTER_VALUE|PCI|PPC32)' .config
>   CONFIG_PPC32=y
>   CONFIG_ILLEGAL_POINTER_VALUE=0

:facepalm:

Looks like we can fix the compiler warnings just by doing the arithmetic
before casting to void *. I'll send a v2.

cheers
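
What "doing the arithmetic before casting" might look like in the accessor macro -- a hedged sketch of the direction only, not the actual v2:

/* Sketch: _IO_BASE stays an integer constant and the addition happens
 * before the cast, so clang never sees arithmetic on a (possibly null
 * or poison) pointer. */
#define __do_inb(port)	readb((PCI_IO_ADDR)(_IO_BASE + (port)))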


Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed

2024-05-01 Thread Masami Hiramatsu (Google)
On Thu, 2 May 2024 01:35:16 +0800
Guo Ren  wrote:

> On Thu, May 2, 2024 at 12:30 AM Stephen Brennan  wrote:
> >
> > If an error happens in ftrace, ftrace_kill() will prevent disarming
> > kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> > freed, yet the kprobes will still be active, and when triggered, they
> > will use the freed memory, likely resulting in a page fault and panic.
> >
> > This behavior can be reproduced quite easily, by creating a kprobe and
> > then triggering a ftrace_kill(). For simplicity, we can simulate an
> > ftrace error with a kernel module like [1]:
> >
> > [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
> >
> >   sudo perf probe --add commit_creds
> >   sudo perf trace -e probe:commit_creds
> >   # In another terminal
> >   make
> >   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
> >   # Back to perf terminal
> >   # ctrl-c
> >   sudo perf probe --del commit_creds
> >
> > After a short period, a page fault and panic would occur as the kprobe
> > continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> > is supposed to be used only in extreme circumstances, it is invoked in
> > FTRACE_WARN_ON() and so there are many places where an unexpected bug
> > could be triggered, yet the system may continue operating, possibly
> > without the administrator noticing. If ftrace_kill() does not panic the
> > system, then we should do everything we can to continue operating,
> > rather than leave a ticking time bomb.
> >
> > Signed-off-by: Stephen Brennan 
> > ---
> > Changes in v3:
> >   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
> >   variable and check it directly in the kprobe handlers.
> > Link to v1/v2 discussion:
> >   https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.bren...@oracle.com/
> >
> >  arch/csky/kernel/probes/ftrace.c | 3 +++
> >  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
> >  arch/parisc/kernel/ftrace.c  | 3 +++
> >  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
> >  arch/riscv/kernel/probes/ftrace.c| 3 +++
> >  arch/s390/kernel/ftrace.c| 3 +++
> >  arch/x86/kernel/kprobes/ftrace.c | 3 +++
> >  include/linux/kprobes.h  | 7 +++
> >  kernel/kprobes.c | 6 ++
> >  kernel/trace/ftrace.c| 1 +
> >  10 files changed, 35 insertions(+)
> >
> > diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
> > index 834cffcfbce3..7ba4b98076de 100644
> > --- a/arch/csky/kernel/probes/ftrace.c
> > +++ b/arch/csky/kernel/probes/ftrace.c
> > @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> > struct kprobe_ctlblk *kcb;
> > struct pt_regs *regs;
> >
> > +   if (unlikely(kprobe_ftrace_disabled))
> > +   return;
> > +
> For csky part.
> Acked-by: Guo Ren 

Thanks Stephen, Guo and Steve!

Let me pick this to probes/for-next!

Thank you,

> 
> > bit = ftrace_test_recursion_trylock(ip, parent_ip);
> > if (bit < 0)
> > return;
> > diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
> > index 73858c9029cc..bff058317062 100644
> > --- a/arch/loongarch/kernel/ftrace_dyn.c
> > +++ b/arch/loongarch/kernel/ftrace_dyn.c
> > @@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> > struct kprobe *p;
> > struct kprobe_ctlblk *kcb;
> >
> > +   if (unlikely(kprobe_ftrace_disabled))
> > +   return;
> > +
> > bit = ftrace_test_recursion_trylock(ip, parent_ip);
> > if (bit < 0)
> > return;
> > diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
> > index 621a4b386ae4..c91f9c2e61ed 100644
> > --- a/arch/parisc/kernel/ftrace.c
> > +++ b/arch/parisc/kernel/ftrace.c
> > @@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> > struct kprobe *p;
> > int bit;
> >
> > +   if (unlikely(kprobe_ftrace_disabled))
> > +   return;
> > +
> > bit = ftrace_test_recursion_trylock(ip, parent_ip);
> > if (bit < 0)
> > return;
> > diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
> > index 072ebe7f290b..f8208c027148 100644
> > --- a/arch/powerpc/kernel/kprobes-ftrace.c
> > +++ b/arch/powerpc/kernel/kprobes-ftrace.c
> > @@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
> > struct pt_regs *regs;
> > int bit;
> >
> > +   if (unlikely(kprobe_ftrace_disabled))
> > +   return;
> > +
> > bit = ftrace_test_recursion_trylock(nip, parent_nip);
> > if (bit < 0)
> > return;
> > diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
> > index 

Re: [PATCH 0/4] KVM: Fold kvm_arch_sched_in() into kvm_arch_vcpu_load()

2024-05-01 Thread Oliver Upton
On Wed, May 01, 2024 at 07:28:21AM -0700, Sean Christopherson wrote:
> On Wed, May 01, 2024, Oliver Upton wrote:
> > On Tue, Apr 30, 2024 at 12:31:53PM -0700, Sean Christopherson wrote:
> > > Drop kvm_arch_sched_in() and instead pass a @sched_in boolean to
> > > kvm_arch_vcpu_load().
> > > 
> > > While fiddling with an idea for optimizing state management on AMD CPUs,
> > > I wanted to skip re-saving certain host state when a vCPU is scheduled back
> > > in, as the state (theoretically) shouldn't change for the task while it's
> > > scheduled out.  Actually doing that was annoying and unnecessarily brittle
> > > due to having a separate API for the kvm_sched_in() case (the state save
> > > needed to be in kvm_arch_vcpu_load() for the common path).
> > > 
> > > E.g. I could have set a "temporary"-ish flag somewhere in kvm_vcpu, but (a)
> > > that's gross and (b) it would rely on the arbitrary ordering between
> > > sched_in() and vcpu_load() staying the same.
> > 
> > Another option would be to change the rules around kvm_arch_sched_in()
> > where the callee is expected to load the vCPU context.
> > 
> > The default implementation could just call kvm_arch_vcpu_load() directly
> > and the x86 implementation can order things the way it wants before
> > kvm_arch_vcpu_load().
> > 
> > I say this because ...
> > 
> > > The only real downside I see is that arm64 and riscv end up having to pass
> > > "false" for their direct usage of kvm_arch_vcpu_load(), and passing boolean
> > > literals isn't ideal.  But that can be solved by adding an inner helper that
> > > omits the @sched_in param (I almost added a patch to do that, but I couldn't
> > > convince myself it was necessary).
> > 
> > Needing to pass @sched_in for other usage of kvm_arch_vcpu_load() hurts
> > readability, especially when no other architecture besides x86 cares
> > about it.
> 
> Yeah, that bothers me too.
> 
> I tried your suggestion of having x86's kvm_arch_sched_in() do kvm_arch_vcpu_load(),
> and even with an added kvm_arch_sched_out() to provide symmetry, the x86 code is
> kludgy, and even the common code is a bit confusing as it's not super obvious
> that kvm_sched_{in,out}() is really just kvm_arch_vcpu_{load,put}().
> 
> Staring a bit more at the vCPU flags we have, adding a "bool scheduled_out" isn't
> terribly gross if it's done in common code and persists across load() and put(),
> i.e. isn't so blatantly a temporary field.  And because it's easy, it could be
> set with WRITE_ONCE() so that it can be read cross-task if there's ever a
> reason to do so.
> reason to do so.
> 
> The x86 code ends up being less ugly, and adding future arch/vendor code for
> sched_in() *or* sched_out() requires minimal churn, e.g. arch code doesn't need
> to override kvm_arch_sched_in().
> 
> The only weird part is that vcpu->preempted and vcpu->ready have slightly
> different behavior, as they are cleared before kvm_arch_vcpu_load().  But the
> weirdness is really with those flags not having symmetry, not with scheduled_out
> itself.
> 
> Thoughts?

Yeah, this seems reasonable. Perhaps scheduled_out could be a nice hint
for guardrails / sanity checks in the future.

-- 
Thanks,
Oliver


Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed

2024-05-01 Thread Guo Ren
On Thu, May 2, 2024 at 12:30 AM Stephen Brennan  wrote:
>
> If an error happens in ftrace, ftrace_kill() will prevent disarming
> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> freed, yet the kprobes will still be active, and when triggered, they
> will use the freed memory, likely resulting in a page fault and panic.
>
> This behavior can be reproduced quite easily, by creating a kprobe and
> then triggering a ftrace_kill(). For simplicity, we can simulate an
> ftrace error with a kernel module like [1]:
>
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>
>   sudo perf probe --add commit_creds
>   sudo perf trace -e probe:commit_creds
>   # In another terminal
>   make
>   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>   # Back to perf terminal
>   # ctrl-c
>   sudo perf probe --del commit_creds
>
> After a short period, a page fault and panic would occur as the kprobe
> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> is supposed to be used only in extreme circumstances, it is invoked in
> FTRACE_WARN_ON() and so there are many places where an unexpected bug
> could be triggered, yet the system may continue operating, possibly
> without the administrator noticing. If ftrace_kill() does not panic the
> system, then we should do everything we can to continue operating,
> rather than leave a ticking time bomb.
>
> Signed-off-by: Stephen Brennan 
> ---
> Changes in v3:
>   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>   variable and check it directly in the kprobe handlers.
> Link to v1/v2 discussion:
>   https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.bren...@oracle.com/
>
>  arch/csky/kernel/probes/ftrace.c | 3 +++
>  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
>  arch/parisc/kernel/ftrace.c  | 3 +++
>  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
>  arch/riscv/kernel/probes/ftrace.c| 3 +++
>  arch/s390/kernel/ftrace.c| 3 +++
>  arch/x86/kernel/kprobes/ftrace.c | 3 +++
>  include/linux/kprobes.h  | 7 +++
>  kernel/kprobes.c | 6 ++
>  kernel/trace/ftrace.c| 1 +
>  10 files changed, 35 insertions(+)
>
> diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
> index 834cffcfbce3..7ba4b98076de 100644
> --- a/arch/csky/kernel/probes/ftrace.c
> +++ b/arch/csky/kernel/probes/ftrace.c
> @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> struct kprobe_ctlblk *kcb;
> struct pt_regs *regs;
>
> +   if (unlikely(kprobe_ftrace_disabled))
> +   return;
> +
For csky part.
Acked-by: Guo Ren 

> bit = ftrace_test_recursion_trylock(ip, parent_ip);
> if (bit < 0)
> return;
> diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
> index 73858c9029cc..bff058317062 100644
> --- a/arch/loongarch/kernel/ftrace_dyn.c
> +++ b/arch/loongarch/kernel/ftrace_dyn.c
> @@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> struct kprobe *p;
> struct kprobe_ctlblk *kcb;
>
> +   if (unlikely(kprobe_ftrace_disabled))
> +   return;
> +
> bit = ftrace_test_recursion_trylock(ip, parent_ip);
> if (bit < 0)
> return;
> diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
> index 621a4b386ae4..c91f9c2e61ed 100644
> --- a/arch/parisc/kernel/ftrace.c
> +++ b/arch/parisc/kernel/ftrace.c
> @@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> struct kprobe *p;
> int bit;
>
> +   if (unlikely(kprobe_ftrace_disabled))
> +   return;
> +
> bit = ftrace_test_recursion_trylock(ip, parent_ip);
> if (bit < 0)
> return;
> diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
> index 072ebe7f290b..f8208c027148 100644
> --- a/arch/powerpc/kernel/kprobes-ftrace.c
> +++ b/arch/powerpc/kernel/kprobes-ftrace.c
> @@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
> struct pt_regs *regs;
> int bit;
>
> +   if (unlikely(kprobe_ftrace_disabled))
> +   return;
> +
> bit = ftrace_test_recursion_trylock(nip, parent_nip);
> if (bit < 0)
> return;
> diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
> index 7142ec42e889..a69dfa610aa8 100644
> --- a/arch/riscv/kernel/probes/ftrace.c
> +++ b/arch/riscv/kernel/probes/ftrace.c
> @@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> struct kprobe_ctlblk *kcb;
> int bit;
>
> +   if (unlikely(kprobe_ftrace_disabled))
> +   return;
> +
> bit = 

Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed

2024-05-01 Thread Steven Rostedt
On Wed,  1 May 2024 09:29:56 -0700
Stephen Brennan  wrote:

> If an error happens in ftrace, ftrace_kill() will prevent disarming
> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> freed, yet the kprobes will still be active, and when triggered, they
> will use the freed memory, likely resulting in a page fault and panic.
> 
> This behavior can be reproduced quite easily, by creating a kprobe and
> then triggering a ftrace_kill(). For simplicity, we can simulate an
> ftrace error with a kernel module like [1]:
> 
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
> 
>   sudo perf probe --add commit_creds
>   sudo perf trace -e probe:commit_creds
>   # In another terminal
>   make
>   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>   # Back to perf terminal
>   # ctrl-c
>   sudo perf probe --del commit_creds
> 
> After a short period, a page fault and panic would occur as the kprobe
> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> is supposed to be used only in extreme circumstances, it is invoked in
> FTRACE_WARN_ON() and so there are many places where an unexpected bug
> could be triggered, yet the system may continue operating, possibly
> without the administrator noticing. If ftrace_kill() does not panic the
> system, then we should do everything we can to continue operating,
> rather than leave a ticking time bomb.
> 
> Signed-off-by: Stephen Brennan 
> ---
> Changes in v3:
>   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>   variable and check it directly in the kprobe handlers.

Reviewed-by: Steven Rostedt (Google) 

Thanks,

-- Steve


[PATCH v3] kprobe/ftrace: bail out if ftrace was killed

2024-05-01 Thread Stephen Brennan
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.

This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:

[1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer

  sudo perf probe --add commit_creds
  sudo perf trace -e probe:commit_creds
  # In another terminal
  make
  sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
  # Back to perf terminal
  # ctrl-c
  sudo perf probe --del commit_creds

After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.

Signed-off-by: Stephen Brennan 
---
Changes in v3:
  Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
  variable and check it directly in the kprobe handlers.
Link to v1/v2 discussion:
  https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.bren...@oracle.com/

 arch/csky/kernel/probes/ftrace.c | 3 +++
 arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
 arch/parisc/kernel/ftrace.c  | 3 +++
 arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
 arch/riscv/kernel/probes/ftrace.c| 3 +++
 arch/s390/kernel/ftrace.c| 3 +++
 arch/x86/kernel/kprobes/ftrace.c | 3 +++
 include/linux/kprobes.h  | 7 +++
 kernel/kprobes.c | 6 ++
 kernel/trace/ftrace.c| 1 +
 10 files changed, 35 insertions(+)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 834cffcfbce3..7ba4b98076de 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
struct kprobe_ctlblk *kcb;
struct pt_regs *regs;
 
+   if (unlikely(kprobe_ftrace_disabled))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..bff058317062 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
struct kprobe *p;
struct kprobe_ctlblk *kcb;
 
+   if (unlikely(kprobe_ftrace_disabled))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 621a4b386ae4..c91f9c2e61ed 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
struct kprobe *p;
int bit;
 
+   if (unlikely(kprobe_ftrace_disabled))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 072ebe7f290b..f8208c027148 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
struct pt_regs *regs;
int bit;
 
+   if (unlikely(kprobe_ftrace_disabled))
+   return;
+
bit = ftrace_test_recursion_trylock(nip, parent_nip);
if (bit < 0)
return;
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index 7142ec42e889..a69dfa610aa8 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
struct kprobe_ctlblk *kcb;
int bit;
 
+   if (unlikely(kprobe_ftrace_disabled))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..7f6f8c438c26 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -296,6 +296,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
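
The digest cuts the patch off here, so the generic hunks (include/linux/kprobes.h, kernel/kprobes.c, kernel/trace/ftrace.c) are not shown. A sketch of what the changelog implies they add -- an assumption, not the actual diff:

/* include/linux/kprobes.h (sketch) */
extern bool kprobe_ftrace_disabled;

/* kernel/kprobes.c (sketch) */
bool kprobe_ftrace_disabled;

/* kernel/trace/ftrace.c (sketch of the one-line change) */
void ftrace_kill(void)
{
	/* ... existing ftrace shutdown ... */
	kprobe_ftrace_disabled = true;	/* new: arch handlers bail out early */
}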

Re: [PATCH] powerpc: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n

2024-05-01 Thread Nathan Chancellor
Hi Michael,

On Wed, May 01, 2024 at 12:04:40AM +1000, Michael Ellerman wrote:
> With -Wextra clang warns about pointer arithmetic using a null pointer.
> When building with CONFIG_PCI=n, that triggers a warning in the IO
> accessors, eg:
> 
>   In file included from linux/arch/powerpc/include/asm/io.h:672:
>   linux/arch/powerpc/include/asm/io-defs.h:23:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
>  23 | DEF_PCI_AC_RET(inb, u8, (unsigned long port), (port), pio, port)
> | ^~~~
>   ...
>   linux/arch/powerpc/include/asm/io.h:591:53: note: expanded from macro '__do_inb'
> 591 | #define __do_inb(port)  readb((PCI_IO_ADDR)_IO_BASE + port);
> |   ~ ^
> 
> That is because when CONFIG_PCI=n, _IO_BASE is defined as 0.
> 
> There is code that builds with calls to IO accessors even when
> CONFIG_PCI=n, but the actual calls are guarded by runtime checks.
> If not those calls would be faulting, because the page at virtual
> address zero is (usually) not mapped into the kernel. As Arnd pointed
> out, it is possible a large port value could cause the address to be
> above mmap_min_addr which would then access userspace, which would be
> a bug.
> 
> To avoid any such issues, and also fix the compiler warning, set the
> _IO_BASE to POISON_POINTER_DELTA. That is a value chosen to point into
> unmapped space between the kernel and userspace, so any access will
> always fault.
> 
> Reported-by: Naresh Kamboju 
> Closes: https://lore.kernel.org/all/CA+G9fYtEh8zmq8k8wE-8RZwW-Qr927RLTn+KqGnq1F=ptaa...@mail.gmail.com
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/include/asm/io.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> index 08c550ed49be..1cd6eb6c8101 100644
> --- a/arch/powerpc/include/asm/io.h
> +++ b/arch/powerpc/include/asm/io.h
> @@ -37,7 +37,7 @@ extern struct pci_dev *isa_bridge_pcidev;
>   * define properly based on the platform
>   */
>  #ifndef CONFIG_PCI
> -#define _IO_BASE 0
> +#define _IO_BASE POISON_POINTER_DELTA

This works for CONFIG_PPC64 but not CONFIG_PPC32 (so tinyconfig and
allnoconfig like Naresh reported) because CONFIG_ILLEGAL_POINTER_VALUE
is defined as 0 in that case.

  $ grep -P 'CONFIG_(ILLEGAL_POINTER_VALUE|PCI|PPC32)' .config
  CONFIG_PPC32=y
  CONFIG_ILLEGAL_POINTER_VALUE=0

>  #define _ISA_MEM_BASE0
>  #define PCI_DRAM_OFFSET 0
>  #elif defined(CONFIG_PPC32)
> -- 
> 2.44.0
> 
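
For reference, POISON_POINTER_DELTA is defined in include/linux/poison.h in terms of exactly this config option, which is why PPC32's ILLEGAL_POINTER_VALUE of 0 leaves _IO_BASE at 0 (quoted from upstream of this era; verify against the tree you build):

#ifdef CONFIG_ILLEGAL_POINTER_VALUE
# define POISON_POINTER_DELTA _AC(CONFIG_ILLEGAL_POINTER_VALUE, UL)
#else
# define POISON_POINTER_DELTA 0
#endif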


Re: [PATCH 0/4] KVM: Fold kvm_arch_sched_in() into kvm_arch_vcpu_load()

2024-05-01 Thread Sean Christopherson
On Wed, May 01, 2024, Oliver Upton wrote:
> On Tue, Apr 30, 2024 at 12:31:53PM -0700, Sean Christopherson wrote:
> > Drop kvm_arch_sched_in() and instead pass a @sched_in boolean to
> > kvm_arch_vcpu_load().
> > 
> > While fiddling with an idea for optimizing state management on AMD CPUs,
> > I wanted to skip re-saving certain host state when a vCPU is scheduled back
> > in, as the state (theoretically) shouldn't change for the task while it's
> > scheduled out.  Actually doing that was annoying and unnecessarily brittle
> > due to having a separate API for the kvm_sched_in() case (the state save
> > needed to be in kvm_arch_vcpu_load() for the common path).
> > 
> > E.g. I could have set a "temporary"-ish flag somewhere in kvm_vcpu, but (a)
> > that's gross and (b) it would rely on the arbitrary ordering between
> > sched_in() and vcpu_load() staying the same.
> 
> Another option would be to change the rules around kvm_arch_sched_in()
> where the callee is expected to load the vCPU context.
> 
> The default implementation could just call kvm_arch_vcpu_load() directly
> and the x86 implementation can order things the way it wants before
> kvm_arch_vcpu_load().
> 
> I say this because ...
> 
> > The only real downside I see is that arm64 and riscv end up having to pass
> > "false" for their direct usage of kvm_arch_vcpu_load(), and passing boolean
> > literals isn't ideal.  But that can be solved by adding an inner helper that
> > omits the @sched_in param (I almost added a patch to do that, but I couldn't
> > convince myself it was necessary).
> 
> Needing to pass @sched_in for other usage of kvm_arch_vcpu_load() hurts
> readability, especially when no other architecture besides x86 cares
> about it.

Yeah, that bothers me too.

I tried your suggestion of having x86's kvm_arch_sched_in() do kvm_arch_vcpu_load(),
and even with an added kvm_arch_sched_out() to provide symmetry, the x86 code is
kludgy, and even the common code is a bit confusing as it's not super obvious
that kvm_sched_{in,out}() is really just kvm_arch_vcpu_{load,put}().

Staring a bit more at the vCPU flags we have, adding a "bool scheduled_out" isn't
terribly gross if it's done in common code and persists across load() and put(),
i.e. isn't so blatantly a temporary field.  And because it's easy, it could be
set with WRITE_ONCE() so that it can be read cross-task if there's ever a
reason to do so.

The x86 code ends up being less ugly, and adding future arch/vendor code for
sched_in() *or* sched_out() requires minimal churn, e.g. arch code doesn't need
to override kvm_arch_sched_in().

The only weird part is that vcpu->preempted and vcpu->ready have slightly
different behavior, as they are cleared before kvm_arch_vcpu_load().  But the
weirdness is really with those flags not having symmetry, not with scheduled_out
itself.

Thoughts?

static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
{
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	WRITE_ONCE(vcpu->preempted, false);
	WRITE_ONCE(vcpu->ready, false);

	__this_cpu_write(kvm_running_vcpu, vcpu);
	kvm_arch_vcpu_load(vcpu, cpu);

	WRITE_ONCE(vcpu->scheduled_out, false);
}

static void kvm_sched_out(struct preempt_notifier *pn,
			  struct task_struct *next)
{
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	WRITE_ONCE(vcpu->scheduled_out, true);

	if (current->on_rq) {
		WRITE_ONCE(vcpu->preempted, true);
		WRITE_ONCE(vcpu->ready, true);
	}
	kvm_arch_vcpu_put(vcpu);
	__this_cpu_write(kvm_running_vcpu, NULL);
}
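
For context on where the flag would live, a sketch (an assumption, not part of the thread) of the common-code field next to the existing flags the snippet above touches:

/* include/linux/kvm_host.h (sketch) */
struct kvm_vcpu {
	/* ... */
	bool preempted;
	bool ready;
	bool scheduled_out;	/* set in kvm_sched_out(), cleared after load */
	/* ... */
};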


Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO

2024-05-01 Thread Jason Gunthorpe
On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:
> RFC v1 was posted here [1]. As I was testing more and fixing the
> issues, I realized it's cleaner to have the table_group_ops implemented
> the way it is done on PowerNV and stop 'borrowing' the DMA windows
> for pSeries.
> 
> This patch-set implements the iommu table_group_ops for pSeries for
> VFIO SPAPR TCE sub-driver thereby enabling the VFIO support on POWER
> pSeries machines.

Wait, did they previously not have any support?

Again, this TCE stuff needs to go away, not grow. I can grudgingly
accept fixing it where it used to work, but not enabling more HW that
never worked before! :(

Jason


[powerpc:next-test] BUILD SUCCESS 236a4c63491784ae4814100cca47bc3645c776df

2024-05-01 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
branch HEAD: 236a4c63491784ae4814100cca47bc3645c776df  powerpc: Mark 
memory_limit as initdata

elapsed time: 1455m

configs tested: 146
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha allnoconfig   gcc  
alphaallyesconfig   gcc  
alpha   defconfig   gcc  
arc  allmodconfig   gcc  
arc   allnoconfig   gcc  
arc  allyesconfig   gcc  
arc  axs103_defconfig   gcc  
arc defconfig   gcc  
arc   randconfig-001-20240501   gcc  
arc   randconfig-002-20240501   gcc  
arm  allmodconfig   gcc  
arm   allnoconfig   clang
arm  allyesconfig   gcc  
arm defconfig   clang
arm   randconfig-001-20240501   gcc  
arm   randconfig-002-20240501   gcc  
arm   randconfig-004-20240501   gcc  
arm64allmodconfig   clang
arm64 allnoconfig   gcc  
arm64allyesconfig   clang
arm64   defconfig   gcc  
arm64 randconfig-001-20240501   gcc  
arm64 randconfig-002-20240501   gcc  
csky allmodconfig   gcc  
csky  allnoconfig   gcc  
csky allyesconfig   gcc  
cskydefconfig   gcc  
csky  randconfig-001-20240501   gcc  
csky  randconfig-002-20240501   gcc  
hexagon  allmodconfig   clang
hexagon   allnoconfig   clang
hexagon  allyesconfig   clang
hexagon defconfig   clang
i386 allmodconfig   gcc  
i386  allnoconfig   gcc  
i386 allyesconfig   gcc  
i386 buildonly-randconfig-002-20240501   gcc  
i386 buildonly-randconfig-004-20240501   gcc  
i386 buildonly-randconfig-005-20240501   gcc  
i386defconfig   clang
i386  randconfig-001-20240501   gcc  
i386  randconfig-003-20240501   gcc  
i386  randconfig-004-20240501   gcc  
i386  randconfig-011-20240501   gcc  
i386  randconfig-012-20240501   gcc  
i386  randconfig-013-20240501   gcc  
i386  randconfig-014-20240501   gcc  
i386  randconfig-015-20240501   gcc  
i386  randconfig-016-20240501   gcc  
loongarchallmodconfig   gcc  
loongarch allnoconfig   gcc  
loongarchallyesconfig   gcc  
loongarch   defconfig   gcc  
loongarch randconfig-001-20240501   gcc  
loongarch randconfig-002-20240501   gcc  
m68k allmodconfig   gcc  
m68k  allnoconfig   gcc  
m68k allyesconfig   gcc  
m68kdefconfig   gcc  
m68k   sun3_defconfig   gcc  
microblaze   allmodconfig   gcc  
microblazeallnoconfig   gcc  
microblaze   allyesconfig   gcc  
microblaze  defconfig   gcc  
mips allmodconfig   gcc  
mips  allnoconfig   gcc  
mips allyesconfig   gcc  
mips  ath79_defconfig   gcc  
mips cu1830-neo_defconfig   gcc  
mips  maltasmvp_eva_defconfig   gcc  
nios2allmodconfig   gcc  
nios2 allnoconfig   gcc  
nios2allyesconfig   gcc  
nios2   defconfig   gcc  
nios2 randconfig-001-20240501   gcc  
nios2 randconfig-002-20240501   gcc  
openrisc allmodconfig   gcc  
openrisc  allnoconfig   gcc  
openriscdefconfig   gcc  
pariscallnoconfig   gcc  
parisc  defconfig   gcc  
pariscrandconfig-001-20240501   gcc  
pariscrandconfig-002-20240501   gcc  
parisc64defconfig   gcc  
powerpc   allnoconfig   gcc  
powerpc

Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might be ANFE in aer_err_info

2024-05-01 Thread Jonathan Cameron
On Sun, 28 Apr 2024 03:31:11 +
"Duan, Zhenzhong"  wrote:

> Hi Jonathan,
> 
> >-Original Message-
> >From: Jonathan Cameron 
> >Subject: Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might be ANFE in aer_err_info
> >
> >On Tue, 23 Apr 2024 02:25:05 +
> >"Duan, Zhenzhong"  wrote:
> >  
> >> >-Original Message-
> >> >From: Jonathan Cameron 
> >> >Subject: Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might be ANFE in aer_err_info
> >> >
> >> >On Wed, 17 Apr 2024 14:14:05 +0800
> >> >Zhenzhong Duan  wrote:
> >> >  
> >> >> In some cases the detector of a Non-Fatal Error(NFE) is not the most
> >> >> appropriate agent to determine the type of the error. For example,
> >> >> when software performs a configuration read from a non-existent
> >> >> device or Function, completer will send an ERR_NONFATAL Message.
> >> >> On some platforms, ERR_NONFATAL results in a System Error, which
> >> >> breaks normal software probing.
> >> >>
> >> >> Advisory Non-Fatal Error(ANFE) is a special case that can be used
> >> >> in above scenario. It is predominantly determined by the role of the
> >> >> detecting agent (Requester, Completer, or Receiver) and the specific
> >> >> error. In such cases, an agent with AER signals the NFE (if enabled)
> >> >> by sending an ERR_COR Message as an advisory to software, instead of
> >> >> sending ERR_NONFATAL.
> >> >>
> >> >> When processing an ANFE, ideally both correctable error(CE) status and
> >> >> uncorrectable error(UE) status should be cleared. However, there is no
> >> >> way to fully identify the UE associated with ANFE. Even worse, a Fatal
> >> >> Error(FE) or Non-Fatal Error(NFE) may set the same UE status bit as
> >> >> ANFE. Treating an ANFE as NFE will reproduce the above mentioned issue,
> >> >> i.e., breaking software probing; treating an NFE as ANFE will make us
> >> >> ignore some UEs which need an active recovery operation. To avoid clearing
> >> >> UEs that are not ANFE by accident, the most conservative route is taken
> >> >> here: If any of the FE/NFE Detected bits is set in Device Status, do not
> >> >> touch UE status; they should be cleared later by the UE handler. Otherwise,
> >> >> a specific set of UEs that may be raised as ANFE according to the PCIe
> >> >> specification will be cleared if their corresponding severity is Non-Fatal.
> >> >>
> >> >> To achieve the above purpose, store the UNCOR_STATUS bits that might be ANFE
> >> >> in aer_err_info.anfe_status, so that those bits can be printed and
> >> >> processed later.
> >> >>
> >> >> Tested-by: Yudong Wang 
> >> >> Co-developed-by: "Wang, Qingshun" 
> >> >> Signed-off-by: "Wang, Qingshun" 
> >> >> Signed-off-by: Zhenzhong Duan 
> >> >> ---
> >> >>  drivers/pci/pci.h  |  1 +
> >> >>  drivers/pci/pcie/aer.c | 45 ++
> >> >>  2 files changed, 46 insertions(+)
> >> >>
> >> >> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >> >> index 17fed1846847..3f9eb807f9fd 100644
> >> >> --- a/drivers/pci/pci.h
> >> >> +++ b/drivers/pci/pci.h
> >> >> @@ -412,6 +412,7 @@ struct aer_err_info {
> >> >>
> >> >> unsigned int status;/* COR/UNCOR Error Status */
> >> >> unsigned int mask;  /* COR/UNCOR Error Mask */
> >> >> +   unsigned int anfe_status;   /* UNCOR Error Status for ANFE */
> >> >> struct pcie_tlp_log tlp;/* TLP Header */
> >> >>  };
> >> >>
> >> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> >> >> index ac6293c24976..27364ab4b148 100644
> >> >> --- a/drivers/pci/pcie/aer.c
> >> >> +++ b/drivers/pci/pcie/aer.c
> >> >> @@ -107,6 +107,12 @@ struct aer_stats {
> >> >> PCI_ERR_ROOT_MULTI_COR_RCV |  
> >> >  \  
> >> >> PCI_ERR_ROOT_MULTI_UNCOR_RCV)
> >> >>
> >> >> +#define AER_ERR_ANFE_UNC_MASK   (PCI_ERR_UNC_POISON_TLP |  \
> >> >> +   PCI_ERR_UNC_COMP_TIME |  \
> >> >> +   PCI_ERR_UNC_COMP_ABORT | \
> >> >> +   PCI_ERR_UNC_UNX_COMP |   \
> >> >> +   PCI_ERR_UNC_UNSUP)
> >> >> +
> >> >>  static int pcie_aer_disable;
> >> >>  static pci_ers_result_t aer_root_reset(struct pci_dev *dev);
> >> >>
> >> >> @@ -1196,6 +1202,41 @@ void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
> >> >>  EXPORT_SYMBOL_GPL(aer_recover_queue);
> >> >>  #endif
> >> >>
> >> >> +static void anfe_get_uc_status(struct pci_dev *dev, struct aer_err_info *info)
> >> >> +{
> >> >> +   u32 uncor_mask, uncor_status;
> >> >> +   u16 device_status;
> >> >> +   int aer = dev->aer_cap;
> >> >> +
> >> >> +   if (pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &device_status))
> >> >> +   
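
The message is truncated above. Based on the commit message, the rest of anfe_get_uc_status() plausibly looks like the following sketch -- an illustration of the described logic, not the real hunk; the register and bit names are the standard PCI_ERR_*/PCI_EXP_DEVSTA_* macros:

static void anfe_get_uc_status(struct pci_dev *dev, struct aer_err_info *info)
{
	u32 uncor_status, uncor_mask, uncor_sever;
	u16 device_status;
	int aer = dev->aer_cap;

	if (pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &device_status))
		return;

	/* If an FE/NFE Detected bit is set, leave UE status for the UE handler */
	if (device_status & (PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_NFED))
		return;

	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor_status);
	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, &uncor_mask);
	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_SEVER, &uncor_sever);

	/* Keep only unmasked, Non-Fatal-severity bits that can be ANFE */
	info->anfe_status = uncor_status & ~uncor_mask & ~uncor_sever &
			    AER_ERR_ANFE_UNC_MASK;
}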

Re: [PATCH 1/1] ALSA: aoa: soundbus: i2sbus: pcm: use 'time_left' variable with wait_for_completion_timeout()

2024-05-01 Thread Takashi Iwai
On Tue, 30 Apr 2024 14:10:27 +0200,
Wolfram Sang wrote:
> 
> There is a confusing pattern in the kernel to use a variable named 'timeout' to
> store the result of wait_for_completion_timeout() causing patterns like:
> 
>   timeout = wait_for_completion_timeout(...)
>   if (!timeout) return -ETIMEDOUT;
> 
> with all kinds of permutations. Use 'time_left' as a variable to make the code
> self explaining.
> 
> Fix to the proper variable type 'unsigned long' while here.
> 
> Signed-off-by: Wolfram Sang 

Thanks, applied now to for-next branch.


Takashi
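
The renamed pattern, as a minimal generic sketch (assumed names; not the i2sbus hunk itself):

static int example_wait(struct completion *done)
{
	unsigned long time_left;	/* jiffies remaining, not a timeout value */

	time_left = wait_for_completion_timeout(done, msecs_to_jiffies(100));
	if (!time_left)
		return -ETIMEDOUT;
	return 0;
}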


Re: [PATCH 2/2] radix/kfence: support late __kfence_pool allocation

2024-05-01 Thread Ritesh Harjani (IBM)
Hari Bathini  writes:

> With commit b33f778bba5ef ("kfence: alloc kfence_pool after system
> startup"), KFENCE pool can be allocated after system startup via the
> page allocator. This can lead to problems as all memory is not mapped
> at page granularity anymore with CONFIG_KFENCE. Address this by direct
> mapping all memory at PMD level and split the mapping for PMD pages
> that overlap with __kfence_pool to page level granularity if and when
> __kfence_pool is allocated after system startup.
>
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/include/asm/book3s/64/radix.h |  2 +
>  arch/powerpc/include/asm/kfence.h  | 14 +-
>  arch/powerpc/mm/book3s64/radix_pgtable.c   | 50 +-
>  3 files changed, 64 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index 8f55ff74bb68..0423ddbcf73c 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -340,6 +340,8 @@ extern void radix__vmemmap_remove_mapping(unsigned long start,
>  extern int radix__map_kernel_page(unsigned long ea, unsigned long pa,
>pgprot_t flags, unsigned int psz);
>  
> +extern bool radix_kfence_init_pool(void);
> +
>  static inline unsigned long radix__get_tree_size(void)
>  {
>   unsigned long rts_field;
> diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
> index 18ec2b06ba1e..c5d2fb2f9ecb 100644
> --- a/arch/powerpc/include/asm/kfence.h
> +++ b/arch/powerpc/include/asm/kfence.h
> @@ -18,12 +18,24 @@
>  
>  #ifdef CONFIG_KFENCE
>  extern bool kfence_early_init;
> -#endif
> +
> +static inline bool kfence_alloc_pool_late(void)
> +{
> + return !kfence_early_init;
> +}

Minor nit, but do we need kfence_alloc_pool_late()?
The function name looks confusing. Can we not just use
!kfence_early_init? If not then maybe bool kfence_late_init?

>  
>  static inline bool arch_kfence_init_pool(void)
>  {
> +#ifdef CONFIG_PPC_BOOK3S_64
> + if (radix_enabled())
> + return radix_kfence_init_pool();

Can we directly check...
if (radix_enabled() && !kfence_early_init)
... instead of embedding the check inside radix_kfence_late_init_pool()

> +#endif
> +
>   return true;
>  }
> +#else
> +static inline bool kfence_alloc_pool_late(void) { return false; }
> +#endif
>  
>  #ifdef CONFIG_PPC64
>  static inline bool kfence_protect_page(unsigned long addr, bool protect)
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index fccbf92f279b..f4374e3e31e1 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -253,6 +253,53 @@ void radix__mark_initmem_nx(void)
>  }
>  #endif /* CONFIG_STRICT_KERNEL_RWX */
>  
> +#ifdef CONFIG_KFENCE
> +static inline int radix_split_pmd_page(pmd_t *pmd, unsigned long addr)
> +{
> + pte_t *pte = pte_alloc_one_kernel(&init_mm);
> + unsigned long pfn = PFN_DOWN(__pa(addr));

Minor nit. Since addr will always be page aligned, so maybe PHYS_PFN() is better
suited. Although it does not matter.

> + int i;
> +
> + if (!pte)
> + return -ENOMEM;
> +
> + for (i = 0; i < PTRS_PER_PTE; i++) {
> + __set_pte_at(&init_mm, addr, pte + i, pfn_pte(pfn + i, PAGE_KERNEL), 0);
> + asm volatile("ptesync": : :"memory");
> + }

Maybe a comment above the loop on why __set_pte_at() is ok for late
kfence init? and why not pte_update()? [1]

[1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/


> + pmd_populate_kernel(&init_mm, pmd, pte);
> +
> + flush_tlb_kernel_range(addr, addr + PMD_SIZE);
> + return 0;
> +}
> +
> +bool radix_kfence_init_pool(void)
> +{
> + unsigned int page_psize, pmd_psize;
> + unsigned long addr;
> + pmd_t *pmd;
> +
> + if (!kfence_alloc_pool_late())
> + return true;
> +
> + page_psize = shift_to_mmu_psize(PAGE_SHIFT);
> + pmd_psize = shift_to_mmu_psize(PMD_SHIFT);
> + for (addr = (unsigned long)__kfence_pool; is_kfence_address((void *)addr);
> +  addr += PAGE_SIZE) {
> + pmd = pmd_off_k(addr);
> +
> + if (pmd_leaf(*pmd)) {
> + if (radix_split_pmd_page(pmd, addr & PMD_MASK))
> + return false;
> + update_page_count(pmd_psize, -1);
> + update_page_count(page_psize, PTRS_PER_PTE);
> + }
> + }
> +
> + return true;
> +}
> +#endif
> +
>  static inline void __meminit
>  print_mapping(unsigned long start, unsigned long end, unsigned long size, bool exec)
>  {
> @@ -391,7 +438,8 @@ static void __init radix_init_pgtable(void)
>   continue;
>   }
>  
> - WARN_ON(create_physical_mapping(start, end, -1, PAGE_KERNEL, ~0UL));
> +
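
(The digest truncates the patch here.) Folding in the review suggestion above -- checking kfence_early_init at the call site -- arch_kfence_init_pool() might look like this sketch, using the names from the patch; an illustration, not an actual respin:

static inline bool arch_kfence_init_pool(void)
{
#ifdef CONFIG_PPC_BOOK3S_64
	/* Only the late (post-boot) radix case needs PMD splitting */
	if (radix_enabled() && !kfence_early_init)
		return radix_kfence_init_pool();
#endif
	return true;
}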