Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
On 2/5/24 00:09, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:
>> RFC v1 was posted here [1]. As I was testing more and fixing the
>> issues, I realized it's cleaner to have the table_group_ops implemented
>> the way it is done on PowerNV and stop 'borrowing' the DMA windows
>> for pSeries.
>>
>> This patch-set implements the iommu table_group_ops for pSeries for
>> the VFIO SPAPR TCE sub-driver, thereby enabling VFIO support on POWER
>> pSeries machines.
>
> Wait, did they previously not have any support?
>
> Again, this TCE stuff needs to go away, not grow. I can grudgingly
> accept fixing it where it used to work, but not enabling more HW that
> never worked before! :(

This used to work when I tried last time, 2+ years ago; it is not new.

Thanks,

--
Alexey
Re: [PATCH] powerpc: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n
Nathan Chancellor writes:
> Hi Michael,
>
> On Wed, May 01, 2024 at 12:04:40AM +1000, Michael Ellerman wrote:
...
>> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
>> index 08c550ed49be..1cd6eb6c8101 100644
>> --- a/arch/powerpc/include/asm/io.h
>> +++ b/arch/powerpc/include/asm/io.h
>> @@ -37,7 +37,7 @@ extern struct pci_dev *isa_bridge_pcidev;
>>   * define properly based on the platform
>>   */
>>  #ifndef CONFIG_PCI
>> -#define _IO_BASE	0
>> +#define _IO_BASE	POISON_POINTER_DELTA
>
> This works for CONFIG_PPC64 but not CONFIG_PPC32 (so tinyconfig and
> allnoconfig like Naresh reported) because CONFIG_ILLEGAL_POINTER_VALUE
> is defined as 0 in that case.
>
> $ grep -P 'CONFIG_(ILLEGAL_POINTER_VALUE|PCI|PPC32)' .config
> CONFIG_PPC32=y
> CONFIG_ILLEGAL_POINTER_VALUE=0

:facepalm:

Looks like we can fix the compiler warnings just by doing the arithmetic
before casting to void *. I'll send a v2.

cheers
Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed
On Thu, 2 May 2024 01:35:16 +0800
Guo Ren wrote:

> On Thu, May 2, 2024 at 12:30 AM Stephen Brennan wrote:
>>
>> If an error happens in ftrace, ftrace_kill() will prevent disarming
>> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> freed, yet the kprobes will still be active, and when triggered, they
>> will use the freed memory, likely resulting in a page fault and panic.
>>
[...]
>>
>> Changes in v3:
>>   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>>   variable and check it directly in the kprobe handlers.
>>
[...]
>> --- a/arch/csky/kernel/probes/ftrace.c
>> +++ b/arch/csky/kernel/probes/ftrace.c
>> @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>>  	struct kprobe_ctlblk *kcb;
>>  	struct pt_regs *regs;
>>
>> +	if (unlikely(kprobe_ftrace_disabled))
>> +		return;
>> +
>
> For csky part.
> Acked-by: Guo Ren

Thanks Stephen, Guo and Steve! Let me pick this to probes/for-next!

Thank you,

[...]
Re: [PATCH 0/4] KVM: Fold kvm_arch_sched_in() into kvm_arch_vcpu_load()
On Wed, May 01, 2024 at 07:28:21AM -0700, Sean Christopherson wrote:
> On Wed, May 01, 2024, Oliver Upton wrote:
>> On Tue, Apr 30, 2024 at 12:31:53PM -0700, Sean Christopherson wrote:
>>> Drop kvm_arch_sched_in() and instead pass a @sched_in boolean to
>>> kvm_arch_vcpu_load().
>>>
[...]
>>
>> Needing to pass @sched_in for other usage of kvm_arch_vcpu_load() hurts
>> readability, especially when no other architecture besides x86 cares
>> about it.
>
> Yeah, that bothers me too.
>
> I tried your suggestion of having x86's kvm_arch_sched_in() do
> kvm_arch_vcpu_load(), and even with an added kvm_arch_sched_out() to
> provide symmetry, the x86 code is kludgy, and even the common code is a
> bit confusing as it's not super obvious that kvm_sched_{in,out}() is
> really just kvm_arch_vcpu_{load,put}().
>
> Staring a bit more at the vCPU flags we have, adding a "bool scheduled_out"
> isn't terribly gross if it's done in common code and persists across
> load() and put(), i.e. isn't so blatantly a temporary field. And because
> it's easy, it could be set with WRITE_ONCE() so that it can be read
> cross-task if there's ever a reason to do so.
>
> The x86 code ends up being less ugly, and adding future arch/vendor code
> for sched_in() *or* sched_out() requires minimal churn, e.g. arch code
> doesn't need to override kvm_arch_sched_in().
>
> The only weird part is that vcpu->preempted and vcpu->ready have slightly
> different behavior, as they are cleared before kvm_arch_vcpu_load(). But
> the weirdness is really with those flags not having symmetry, not with
> scheduled_out itself.
>
> Thoughts?

Yeah, this seems reasonable. Perhaps scheduled_out could be a nice hint
for guardrails / sanity checks in the future.

--
Thanks,
Oliver
Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed
On Thu, May 2, 2024 at 12:30 AM Stephen Brennan wrote:
>
> If an error happens in ftrace, ftrace_kill() will prevent disarming
> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> freed, yet the kprobes will still be active, and when triggered, they
> will use the freed memory, likely resulting in a page fault and panic.
>
[...]
>
> diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
> index 834cffcfbce3..7ba4b98076de 100644
> --- a/arch/csky/kernel/probes/ftrace.c
> +++ b/arch/csky/kernel/probes/ftrace.c
> @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>  	struct kprobe_ctlblk *kcb;
>  	struct pt_regs *regs;
>
> +	if (unlikely(kprobe_ftrace_disabled))
> +		return;
> +

For csky part.
Acked-by: Guo Ren

[...]
Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed
On Wed, 1 May 2024 09:29:56 -0700
Stephen Brennan wrote:

> If an error happens in ftrace, ftrace_kill() will prevent disarming
> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> freed, yet the kprobes will still be active, and when triggered, they
> will use the freed memory, likely resulting in a page fault and panic.
>
[...]
>
> Changes in v3:
>   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>   variable and check it directly in the kprobe handlers.

Reviewed-by: Steven Rostedt (Google)

Thanks,

-- Steve
[PATCH v3] kprobe/ftrace: bail out if ftrace was killed
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.

This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:

[1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer

  sudo perf probe --add commit_creds
  sudo perf trace -e probe:commit_creds
  # In another terminal
  make
  sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
  # Back to perf terminal
  # ctrl-c
  sudo perf probe --del commit_creds

After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.

Signed-off-by: Stephen Brennan
---
Changes in v3:
  Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
  variable and check it directly in the kprobe handlers.
Link to v1/v2 discussion:
https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.bren...@oracle.com/

 arch/csky/kernel/probes/ftrace.c     | 3 +++
 arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
 arch/parisc/kernel/ftrace.c          | 3 +++
 arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
 arch/riscv/kernel/probes/ftrace.c    | 3 +++
 arch/s390/kernel/ftrace.c            | 3 +++
 arch/x86/kernel/kprobes/ftrace.c     | 3 +++
 include/linux/kprobes.h              | 7 +++++++
 kernel/kprobes.c                     | 6 ++++++
 kernel/trace/ftrace.c                | 1 +
 10 files changed, 35 insertions(+)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 834cffcfbce3..7ba4b98076de 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	struct pt_regs *regs;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..bff058317062 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	struct kprobe_ctlblk *kcb;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 621a4b386ae4..c91f9c2e61ed 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 072ebe7f290b..f8208c027148 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 	struct pt_regs *regs;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(nip, parent_nip);
 	if (bit < 0)
 		return;
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index 7142ec42e889..a69dfa610aa8 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..7f6f8c438c26 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -296,6 +296,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
Re: [PATCH] powerpc: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n
Hi Michael,

On Wed, May 01, 2024 at 12:04:40AM +1000, Michael Ellerman wrote:
> With -Wextra clang warns about pointer arithmetic using a null pointer.
> When building with CONFIG_PCI=n, that triggers a warning in the IO
> accessors, eg:
>
>   In file included from linux/arch/powerpc/include/asm/io.h:672:
>   linux/arch/powerpc/include/asm/io-defs.h:23:1: warning: performing
>   pointer arithmetic on a null pointer has undefined behavior
>   [-Wnull-pointer-arithmetic]
>      23 | DEF_PCI_AC_RET(inb, u8, (unsigned long port), (port), pio, port)
>         | ^~~~
>   ...
>   linux/arch/powerpc/include/asm/io.h:591:53: note: expanded from macro
>   '__do_inb'
>     591 | #define __do_inb(port)	readb((PCI_IO_ADDR)_IO_BASE + port);
>         |                               ~ ^
>
> That is because when CONFIG_PCI=n, _IO_BASE is defined as 0.
>
> There is code that builds with calls to IO accessors even when
> CONFIG_PCI=n, but the actual calls are guarded by runtime checks.
> If not those calls would be faulting, because the page at virtual
> address zero is (usually) not mapped into the kernel. As Arnd pointed
> out, it is possible a large port value could cause the address to be
> above mmap_min_addr which would then access userspace, which would be
> a bug.
>
> To avoid any such issues, and also fix the compiler warning, set the
> _IO_BASE to POISON_POINTER_DELTA. That is a value chosen to point into
> unmapped space between the kernel and userspace, so any access will
> always fault.
>
> Reported-by: Naresh Kamboju
> Closes: https://lore.kernel.org/all/CA+G9fYtEh8zmq8k8wE-8RZwW-Qr927RLTn+KqGnq1F=ptaa...@mail.gmail.com
> Signed-off-by: Michael Ellerman
> ---
>  arch/powerpc/include/asm/io.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> index 08c550ed49be..1cd6eb6c8101 100644
> --- a/arch/powerpc/include/asm/io.h
> +++ b/arch/powerpc/include/asm/io.h
> @@ -37,7 +37,7 @@ extern struct pci_dev *isa_bridge_pcidev;
>   * define properly based on the platform
>   */
>  #ifndef CONFIG_PCI
> -#define _IO_BASE	0
> +#define _IO_BASE	POISON_POINTER_DELTA

This works for CONFIG_PPC64 but not CONFIG_PPC32 (so tinyconfig and
allnoconfig like Naresh reported) because CONFIG_ILLEGAL_POINTER_VALUE
is defined as 0 in that case.

$ grep -P 'CONFIG_(ILLEGAL_POINTER_VALUE|PCI|PPC32)' .config
CONFIG_PPC32=y
CONFIG_ILLEGAL_POINTER_VALUE=0

>  #define _ISA_MEM_BASE	0
>  #define PCI_DRAM_OFFSET	0
>  #elif defined(CONFIG_PPC32)
> --
> 2.44.0
>
Re: [PATCH 0/4] KVM: Fold kvm_arch_sched_in() into kvm_arch_vcpu_load()
On Wed, May 01, 2024, Oliver Upton wrote:
> On Tue, Apr 30, 2024 at 12:31:53PM -0700, Sean Christopherson wrote:
>> Drop kvm_arch_sched_in() and instead pass a @sched_in boolean to
>> kvm_arch_vcpu_load().
>>
>> While fiddling with an idea for optimizing state management on AMD CPUs,
>> I wanted to skip re-saving certain host state when a vCPU is scheduled back
>> in, as the state (theoretically) shouldn't change for the task while it's
>> scheduled out. Actually doing that was annoying and unnecessarily brittle
>> due to having a separate API for the kvm_sched_in() case (the state save
>> needed to be in kvm_arch_vcpu_load() for the common path).
>>
>> E.g. I could have set a "temporary"-ish flag somewhere in kvm_vcpu, but (a)
>> that's gross and (b) it would rely on the arbitrary ordering between
>> sched_in() and vcpu_load() staying the same.
>
> Another option would be to change the rules around kvm_arch_sched_in()
> where the callee is expected to load the vCPU context.
>
> The default implementation could just call kvm_arch_vcpu_load() directly
> and the x86 implementation can order things the way it wants before
> kvm_arch_vcpu_load().
>
> I say this because ...
>
>> The only real downside I see is that arm64 and riscv end up having to pass
>> "false" for their direct usage of kvm_arch_vcpu_load(), and passing boolean
>> literals isn't ideal. But that can be solved by adding an inner helper that
>> omits the @sched_in param (I almost added a patch to do that, but I couldn't
>> convince myself it was necessary).
>
> Needing to pass @sched_in for other usage of kvm_arch_vcpu_load() hurts
> readability, especially when no other architecture besides x86 cares
> about it.

Yeah, that bothers me too.

I tried your suggestion of having x86's kvm_arch_sched_in() do
kvm_arch_vcpu_load(), and even with an added kvm_arch_sched_out() to
provide symmetry, the x86 code is kludgy, and even the common code is a
bit confusing as it's not super obvious that kvm_sched_{in,out}() is
really just kvm_arch_vcpu_{load,put}().

Staring a bit more at the vCPU flags we have, adding a "bool scheduled_out"
isn't terribly gross if it's done in common code and persists across load()
and put(), i.e. isn't so blatantly a temporary field. And because it's
easy, it could be set with WRITE_ONCE() so that it can be read cross-task
if there's ever a reason to do so.

The x86 code ends up being less ugly, and adding future arch/vendor code
for sched_in() *or* sched_out() requires minimal churn, e.g. arch code
doesn't need to override kvm_arch_sched_in().

The only weird part is that vcpu->preempted and vcpu->ready have slightly
different behavior, as they are cleared before kvm_arch_vcpu_load(). But
the weirdness is really with those flags not having symmetry, not with
scheduled_out itself.

Thoughts?

static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
{
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	WRITE_ONCE(vcpu->preempted, false);
	WRITE_ONCE(vcpu->ready, false);

	__this_cpu_write(kvm_running_vcpu, vcpu);
	kvm_arch_vcpu_load(vcpu, cpu);

	WRITE_ONCE(vcpu->scheduled_out, false);
}

static void kvm_sched_out(struct preempt_notifier *pn, struct task_struct *next)
{
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	WRITE_ONCE(vcpu->scheduled_out, true);

	if (current->on_rq) {
		WRITE_ONCE(vcpu->preempted, true);
		WRITE_ONCE(vcpu->ready, true);
	}

	kvm_arch_vcpu_put(vcpu);
	__this_cpu_write(kvm_running_vcpu, NULL);
}
Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:
> RFC v1 was posted here [1]. As I was testing more and fixing the
> issues, I realized it's cleaner to have the table_group_ops implemented
> the way it is done on PowerNV and stop 'borrowing' the DMA windows
> for pSeries.
>
> This patch-set implements the iommu table_group_ops for pSeries for
> the VFIO SPAPR TCE sub-driver, thereby enabling VFIO support on POWER
> pSeries machines.

Wait, did they previously not have any support?

Again, this TCE stuff needs to go away, not grow. I can grudgingly
accept fixing it where it used to work, but not enabling more HW that
never worked before! :(

Jason
[powerpc:next-test] BUILD SUCCESS 236a4c63491784ae4814100cca47bc3645c776df
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 236a4c63491784ae4814100cca47bc3645c776df  powerpc: Mark memory_limit as initdata

elapsed time: 1455m

configs tested: 146
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
    alpha                          allnoconfig   gcc
    alpha                         allyesconfig   gcc
    alpha                            defconfig   gcc
    arc                           allmodconfig   gcc
    arc                            allnoconfig   gcc
    arc                           allyesconfig   gcc
    arc                       axs103_defconfig   gcc
    arc                              defconfig   gcc
    arc                randconfig-001-20240501   gcc
    arc                randconfig-002-20240501   gcc
    arm                           allmodconfig   gcc
    arm                            allnoconfig   clang
    arm                           allyesconfig   gcc
    arm                              defconfig   clang
    arm                randconfig-001-20240501   gcc
    arm                randconfig-002-20240501   gcc
    arm                randconfig-004-20240501   gcc
    arm64                         allmodconfig   clang
    arm64                          allnoconfig   gcc
    arm64                         allyesconfig   clang
    arm64                            defconfig   gcc
    arm64              randconfig-001-20240501   gcc
    arm64              randconfig-002-20240501   gcc
    csky                          allmodconfig   gcc
    csky                           allnoconfig   gcc
    csky                          allyesconfig   gcc
    csky                             defconfig   gcc
    csky               randconfig-001-20240501   gcc
    csky               randconfig-002-20240501   gcc
    hexagon                       allmodconfig   clang
    hexagon                        allnoconfig   clang
    hexagon                       allyesconfig   clang
    hexagon                          defconfig   clang
    i386                          allmodconfig   gcc
    i386                           allnoconfig   gcc
    i386                          allyesconfig   gcc
    i386     buildonly-randconfig-002-20240501   gcc
    i386     buildonly-randconfig-004-20240501   gcc
    i386     buildonly-randconfig-005-20240501   gcc
    i386                             defconfig   clang
    i386               randconfig-001-20240501   gcc
    i386               randconfig-003-20240501   gcc
    i386               randconfig-004-20240501   gcc
    i386               randconfig-011-20240501   gcc
    i386               randconfig-012-20240501   gcc
    i386               randconfig-013-20240501   gcc
    i386               randconfig-014-20240501   gcc
    i386               randconfig-015-20240501   gcc
    i386               randconfig-016-20240501   gcc
    loongarch                     allmodconfig   gcc
    loongarch                      allnoconfig   gcc
    loongarch                     allyesconfig   gcc
    loongarch                        defconfig   gcc
    loongarch          randconfig-001-20240501   gcc
    loongarch          randconfig-002-20240501   gcc
    m68k                          allmodconfig   gcc
    m68k                           allnoconfig   gcc
    m68k                          allyesconfig   gcc
    m68k                             defconfig   gcc
    m68k                        sun3_defconfig   gcc
    microblaze                    allmodconfig   gcc
    microblaze                     allnoconfig   gcc
    microblaze                    allyesconfig   gcc
    microblaze                       defconfig   gcc
    mips                          allmodconfig   gcc
    mips                           allnoconfig   gcc
    mips                          allyesconfig   gcc
    mips                       ath79_defconfig   gcc
    mips                  cu1830-neo_defconfig   gcc
    mips               maltasmvp_eva_defconfig   gcc
    nios2                         allmodconfig   gcc
    nios2                          allnoconfig   gcc
    nios2                         allyesconfig   gcc
    nios2                            defconfig   gcc
    nios2              randconfig-001-20240501   gcc
    nios2              randconfig-002-20240501   gcc
    openrisc                      allmodconfig   gcc
    openrisc                       allnoconfig   gcc
    openrisc                         defconfig   gcc
    parisc                         allnoconfig   gcc
    parisc                           defconfig   gcc
    parisc             randconfig-001-20240501   gcc
    parisc             randconfig-002-20240501   gcc
    parisc64                         defconfig   gcc
    powerpc                        allnoconfig   gcc
    powerpc
Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might be ANFE in aer_err_info
On Sun, 28 Apr 2024 03:31:11 + "Duan, Zhenzhong" wrote: > Hi Jonathan, > > >-Original Message- > >From: Jonathan Cameron > >Subject: Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might > >be ANFE in aer_err_info > > > >On Tue, 23 Apr 2024 02:25:05 + > >"Duan, Zhenzhong" wrote: > > > >> >-Original Message- > >> >From: Jonathan Cameron > >> >Subject: Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that > >might > >> >be ANFE in aer_err_info > >> > > >> >On Wed, 17 Apr 2024 14:14:05 +0800 > >> >Zhenzhong Duan wrote: > >> > > >> >> In some cases the detector of a Non-Fatal Error(NFE) is not the most > >> >> appropriate agent to determine the type of the error. For example, > >> >> when software performs a configuration read from a non-existent > >> >> device or Function, completer will send an ERR_NONFATAL Message. > >> >> On some platforms, ERR_NONFATAL results in a System Error, which > >> >> breaks normal software probing. > >> >> > >> >> Advisory Non-Fatal Error(ANFE) is a special case that can be used > >> >> in above scenario. It is predominantly determined by the role of the > >> >> detecting agent (Requester, Completer, or Receiver) and the specific > >> >> error. In such cases, an agent with AER signals the NFE (if enabled) > >> >> by sending an ERR_COR Message as an advisory to software, instead of > >> >> sending ERR_NONFATAL. > >> >> > >> >> When processing an ANFE, ideally both correctable error(CE) status and > >> >> uncorrectable error(UE) status should be cleared. However, there is no > >> >> way to fully identify the UE associated with ANFE. Even worse, a Fatal > >> >> Error(FE) or Non-Fatal Error(NFE) may set the same UE status bit as > >> >> ANFE. Treating an ANFE as NFE will reproduce above mentioned issue, > >> >> i.e., breaking softwore probing; treating NFE as ANFE will make us > >> >> ignoring some UEs which need active recover operation. 
To avoid > >clearing > >> >> UEs that are not ANFE by accident, the most conservative route is taken > >> >> here: If any of the FE/NFE Detected bits is set in Device Status, do not > >> >> touch UE status, they should be cleared later by the UE handler. > >Otherwise, > >> >> a specific set of UEs that may be raised as ANFE according to the PCIe > >> >> specification will be cleared if their corresponding severity is > >> >> Non-Fatal. > >> >> > >> >> To achieve above purpose, store UNCOR_STATUS bits that might be > >ANFE > >> >> in aer_err_info.anfe_status. So that those bits could be printed and > >> >> processed later. > >> >> > >> >> Tested-by: Yudong Wang > >> >> Co-developed-by: "Wang, Qingshun" > >> >> Signed-off-by: "Wang, Qingshun" > >> >> Signed-off-by: Zhenzhong Duan > >> >> --- > >> >> drivers/pci/pci.h | 1 + > >> >> drivers/pci/pcie/aer.c | 45 > >> >++ > >> >> 2 files changed, 46 insertions(+) > >> >> > >> >> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > >> >> index 17fed1846847..3f9eb807f9fd 100644 > >> >> --- a/drivers/pci/pci.h > >> >> +++ b/drivers/pci/pci.h > >> >> @@ -412,6 +412,7 @@ struct aer_err_info { > >> >> > >> >> unsigned int status;/* COR/UNCOR Error Status */ > >> >> unsigned int mask; /* COR/UNCOR Error Mask */ > >> >> + unsigned int anfe_status; /* UNCOR Error Status for ANFE > >> >> */ > >> >> struct pcie_tlp_log tlp;/* TLP Header */ > >> >> }; > >> >> > >> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > >> >> index ac6293c24976..27364ab4b148 100644 > >> >> --- a/drivers/pci/pcie/aer.c > >> >> +++ b/drivers/pci/pcie/aer.c > >> >> @@ -107,6 +107,12 @@ struct aer_stats { > >> >> PCI_ERR_ROOT_MULTI_COR_RCV | > >> > \ > >> >> PCI_ERR_ROOT_MULTI_UNCOR_RCV) > >> >> > >> >> +#define AER_ERR_ANFE_UNC_MASK > >> > (PCI_ERR_UNC_POISON_TLP | \ > >> >> + PCI_ERR_UNC_COMP_TIME | > >> > \ > >> >> + PCI_ERR_UNC_COMP_ABORT | > >> > \ > >> >> + PCI_ERR_UNC_UNX_COMP | > >> > \ > >> >> + PCI_ERR_UNC_UNSUP) > >> >> + > >> >> 
static int pcie_aer_disable; > >> >> static pci_ers_result_t aer_root_reset(struct pci_dev *dev); > >> >> > >> >> @@ -1196,6 +1202,41 @@ void aer_recover_queue(int domain, > >unsigned > >int bus, unsigned int devfn, > >> >> EXPORT_SYMBOL_GPL(aer_recover_queue); > >> >> #endif > >> >> > >> >> +static void anfe_get_uc_status(struct pci_dev *dev, struct > >aer_err_info > >*info) > >> >> +{ > >> >> + u32 uncor_mask, uncor_status; > >> >> + u16 device_status; > >> >> + int aer = dev->aer_cap; > >> >> + > >> >> + if (pcie_capability_read_word(dev, PCI_EXP_DEVSTA, > >> >&device_status)) > >> >> +
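The conservative rule discussed in this thread can be sketched in plain userspace C. The register bit values below follow PCIe/Linux `pci_regs.h` and `AER_ERR_ANFE_UNC_MASK` is taken from the patch, but the helper name `anfe_candidate_bits()` and its simplified severity handling are illustrative, not kernel code:

```c
#include <stdint.h>

#define PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */

#define PCI_ERR_UNC_POISON_TLP	0x00001000
#define PCI_ERR_UNC_COMP_TIME	0x00004000
#define PCI_ERR_UNC_COMP_ABORT	0x00008000
#define PCI_ERR_UNC_UNX_COMP	0x00010000
#define PCI_ERR_UNC_UNSUP	0x00100000

/* Mirrors AER_ERR_ANFE_UNC_MASK in the patch: UE bits the spec allows
 * to be raised as an Advisory Non-Fatal Error. */
#define AER_ERR_ANFE_UNC_MASK	(PCI_ERR_UNC_POISON_TLP | \
				 PCI_ERR_UNC_COMP_TIME |  \
				 PCI_ERR_UNC_COMP_ABORT | \
				 PCI_ERR_UNC_UNX_COMP |   \
				 PCI_ERR_UNC_UNSUP)

/* Return the UE status bits that may safely be attributed to an ANFE.
 * If Device Status reports a pending Fatal/Non-Fatal error, return 0:
 * the UE handler owns the status bits and we must not clear them.
 * uncor_sev has a bit set for each Fatal-severity error, so masking
 * with ~uncor_sev keeps only Non-Fatal-severity bits. */
static uint32_t anfe_candidate_bits(uint16_t devsta, uint32_t uncor_status,
				    uint32_t uncor_sev)
{
	if (devsta & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
		return 0;
	return uncor_status & ~uncor_sev & AER_ERR_ANFE_UNC_MASK;
}
```

This captures why the approach is "conservative": any doubt (a pending FE/NFE, Fatal severity, or a bit outside the ANFE-capable set) means the bit is left for the regular UE path.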
Re: [PATCH 1/1] ALSA: aoa: soundbus: i2sbus: pcm: use 'time_left' variable with wait_for_completion_timeout()
On Tue, 30 Apr 2024 14:10:27 +0200, Wolfram Sang wrote: > > There is a confusing pattern in the kernel to use a variable named 'timeout' > to > store the result of wait_for_completion_timeout() causing patterns like: > > timeout = wait_for_completion_timeout(...) > if (!timeout) return -ETIMEDOUT; > > with all kinds of permutations. Use 'time_left' as a variable to make the code > self explaining. > > Fix to the proper variable type 'unsigned long' while here. > > Signed-off-by: Wolfram Sang Thanks, applied now to for-next branch. Takashi
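The rename the patch advocates can be sketched outside the kernel. Here `wait_for_event()` is a stand-in for `wait_for_completion_timeout()`, which returns the jiffies remaining (0 on timeout), so `time_left` describes the value where `timeout` does not:

```c
#include <errno.h>

/* Stand-in for wait_for_completion_timeout(): returns time remaining,
 * 0 if the wait timed out. event_ready_after < timeout means success. */
static unsigned long wait_for_event(unsigned long timeout,
				    long event_ready_after)
{
	if (event_ready_after >= 0 &&
	    (unsigned long)event_ready_after < timeout)
		return timeout - event_ready_after;
	return 0;
}

static int do_transfer(long event_ready_after)
{
	/* 'time_left', not 'timeout': the return value counts what is
	 * left of the budget, it is not the budget we passed in. */
	unsigned long time_left = wait_for_event(100, event_ready_after);

	if (!time_left)
		return -ETIMEDOUT;
	return 0;
}
```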
Re: [PATCH 2/2] radix/kfence: support late __kfence_pool allocation
Hari Bathini writes: > With commit b33f778bba5ef ("kfence: alloc kfence_pool after system > startup"), KFENCE pool can be allocated after system startup via the > page allocator. This can lead to problems as all memory is not mapped > at page granularity anymore with CONFIG_KFENCE. Address this by direct > mapping all memory at PMD level and split the mapping for PMD pages > that overlap with __kfence_pool to page level granularity if and when > __kfence_pool is allocated after system startup. > > Signed-off-by: Hari Bathini > --- > arch/powerpc/include/asm/book3s/64/radix.h | 2 + > arch/powerpc/include/asm/kfence.h | 14 +- > arch/powerpc/mm/book3s64/radix_pgtable.c | 50 +- > 3 files changed, 64 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h > b/arch/powerpc/include/asm/book3s/64/radix.h > index 8f55ff74bb68..0423ddbcf73c 100644 > --- a/arch/powerpc/include/asm/book3s/64/radix.h > +++ b/arch/powerpc/include/asm/book3s/64/radix.h > @@ -340,6 +340,8 @@ extern void radix__vmemmap_remove_mapping(unsigned long > start, > extern int radix__map_kernel_page(unsigned long ea, unsigned long pa, >pgprot_t flags, unsigned int psz); > > +extern bool radix_kfence_init_pool(void); > + > static inline unsigned long radix__get_tree_size(void) > { > unsigned long rts_field; > diff --git a/arch/powerpc/include/asm/kfence.h > b/arch/powerpc/include/asm/kfence.h > index 18ec2b06ba1e..c5d2fb2f9ecb 100644 > --- a/arch/powerpc/include/asm/kfence.h > +++ b/arch/powerpc/include/asm/kfence.h > @@ -18,12 +18,24 @@ > > #ifdef CONFIG_KFENCE > extern bool kfence_early_init; > -#endif > + > +static inline bool kfence_alloc_pool_late(void) > +{ > + return !kfence_early_init; > +} Minor nit, but do we need kfence_alloc_pool_late()? The function name looks confusing. Can we not just use !kfence_early_init? If not then maybe bool kfence_late_init? 
> > static inline bool arch_kfence_init_pool(void) > { > +#ifdef CONFIG_PPC_BOOK3S_64 > + if (radix_enabled()) > + return radix_kfence_init_pool(); Can we directly check... if (radix_enabled() && !kfence_early_init) ... instead of embedding the check inside radix_kfence_late_init_pool() > +#endif > + > return true; > } > +#else > +static inline bool kfence_alloc_pool_late(void) { return false; } > +#endif > > #ifdef CONFIG_PPC64 > static inline bool kfence_protect_page(unsigned long addr, bool protect) > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c > b/arch/powerpc/mm/book3s64/radix_pgtable.c > index fccbf92f279b..f4374e3e31e1 100644 > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c > @@ -253,6 +253,53 @@ void radix__mark_initmem_nx(void) > } > #endif /* CONFIG_STRICT_KERNEL_RWX */ > > +#ifdef CONFIG_KFENCE > +static inline int radix_split_pmd_page(pmd_t *pmd, unsigned long addr) > +{ > + pte_t *pte = pte_alloc_one_kernel(&init_mm); > + unsigned long pfn = PFN_DOWN(__pa(addr)); Minor nit: since addr will always be page aligned, maybe PHYS_PFN() is better suited. Although it does not matter. > + int i; > + > + if (!pte) > + return -ENOMEM; > + > + for (i = 0; i < PTRS_PER_PTE; i++) { > + __set_pte_at(&init_mm, addr, pte + i, pfn_pte(pfn + i, > PAGE_KERNEL), 0); > + asm volatile("ptesync": : :"memory"); > + } Maybe a comment above the loop on why __set_pte_at() is ok for late kfence init? and why not pte_update()? 
[1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/ > + pmd_populate_kernel(&init_mm, pmd, pte); > + > + flush_tlb_kernel_range(addr, addr + PMD_SIZE); > + return 0; > +} > + > +bool radix_kfence_init_pool(void) > +{ > + unsigned int page_psize, pmd_psize; > + unsigned long addr; > + pmd_t *pmd; > + > + if (!kfence_alloc_pool_late()) > + return true; > + > + page_psize = shift_to_mmu_psize(PAGE_SHIFT); > + pmd_psize = shift_to_mmu_psize(PMD_SHIFT); > + for (addr = (unsigned long)__kfence_pool; is_kfence_address((void > *)addr); > + addr += PAGE_SIZE) { > + pmd = pmd_off_k(addr); > + > + if (pmd_leaf(*pmd)) { > + if (radix_split_pmd_page(pmd, addr & PMD_MASK)) > + return false; > + update_page_count(pmd_psize, -1); > + update_page_count(page_psize, PTRS_PER_PTE); > + } > + } > + > + return true; > +} > +#endif > + > static inline void __meminit > print_mapping(unsigned long start, unsigned long end, unsigned long size, > bool exec) > { > @@ -391,7 +438,8 @@ static void __init radix_init_pgtable(void) > continue; > } > > - WARN_ON(create_physical_mapping(start, end, -1, PAGE_KERNEL, > ~0UL)); > +
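The bookkeeping that `radix_kfence_init_pool()` does when it splits a PMD leaf can be sketched in plain C: one 2 MB leaf mapping disappears and `PTRS_PER_PTE` 4 KB mappings take its place. The constants assume a 4 KB/2 MB radix layout as in the patch; the `mapping_stats` struct is illustrative, not a kernel type:

```c
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PTRS_PER_PTE	(1UL << (PMD_SHIFT - PAGE_SHIFT))	/* 512 */

struct mapping_stats {
	long pmd_mappings;	/* 2 MB leaf entries in the linear map */
	long pte_mappings;	/* 4 KB entries created by splits */
};

/* Split every PMD leaf overlapping [pool, pool + pool_size), adjusting
 * the per-size mapping counts the way the patch does with
 * update_page_count(pmd_psize, -1) / update_page_count(page_psize, 512). */
static void split_pool_pmds(struct mapping_stats *st, unsigned long pool,
			    unsigned long pool_size)
{
	unsigned long addr, end = pool + pool_size;

	for (addr = pool & ~((1UL << PMD_SHIFT) - 1); addr < end;
	     addr += 1UL << PMD_SHIFT) {
		st->pmd_mappings -= 1;
		st->pte_mappings += PTRS_PER_PTE;
	}
}
```

This is why the review focuses on the split path: every late-allocated KFENCE pool page that lands in a 2 MB leaf forces a 512-way split, and the page-size counters must stay consistent with what `/proc/meminfo`-style accounting reports.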