Re: [PATCH v3 00/11] sysctl: treewide: constify ctl_table argument of sysctl handlers

2024-04-27 Thread Thomas Weißschuh
On 2024-04-25 09:10:27+, Thomas Weißschuh wrote:
> On 2024-04-24 20:12:34+, Jakub Kicinski wrote:
> > On Tue, 23 Apr 2024 09:54:35 +0200 Thomas Weißschuh wrote:
> > > The series was split from my larger series sysctl-const series [0].
> > > It only focusses on the proc_handlers but is an important step to be
> > > able to move all static definitions of ctl_table into .rodata.
> > 
> > Split this per subsystem, please.
> 
> Unfortunately this would introduce an enormous amount of code churn.
> 
> The function prototypes for each callback have to stay consistent.
> So a another callback member ("proc_handler_new") is needed and users
> would be migrated to it gradually.
> 
> But then *all* definitions of "struct ctl_table" throughout the tree need to
> be touched.
> In contrast, the proposed series only needs to change the handler
> implementations, not their usage sites.
> 
> There are many, many more usage sites than handler implementations.
> 
> Especially, as the majority of sysctl tables use the standard handlers
> (proc_dostring, proc_dobool, ...) and are not affected by the proposed
> aproach at all.
> 
> And then we would have introduced a new handler name "proc_handler_new"
> and maybe have to do the whole thing again to rename it back to
> the original and well-known "proc_handler".

This aproach could be optimized by only migrating the usages of the
custom handler implementations to "proc_handler_new".
After this we could move over the core handlers and "proc_handler" in
one small patch that does not need to touch the usages sites.

Afterwards all non-core usages would be migrated back from
"proc_handler_new" to "proc_handler" and the _new variant could be
dropped again.

It would still be more than twice the churn of my current patch.
And these patches would be more complex than the current
"just add a bunch of consts, nothing else".

Personally I still prefer the original aproach.


Thomas


[PATCH] kprobe/ftrace: bail out if ftrace was killed

2024-04-27 Thread Stephen Brennan
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.

This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:

[1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer

  sudo perf probe --add commit_creds
  sudo perf trace -e probe:commit_creds
  # In another terminal
  make
  sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
  # Back to perf terminal
  # ctrl-c
  sudo perf probe --del commit_creds

After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.

Signed-off-by: Stephen Brennan 
---

Apologies for the wide net cast here. I recognize that a change like this
may need to be split up and go through arch-specific trees. I hoped to get
feedback on the patch itself. If it's satisfactory and the architecture
maintainers prefer it split out, I'm glad to do it. Thanks!

 arch/csky/kernel/probes/ftrace.c | 3 +++
 arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
 arch/parisc/kernel/ftrace.c  | 3 +++
 arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
 arch/riscv/kernel/probes/ftrace.c| 3 +++
 arch/s390/kernel/ftrace.c| 3 +++
 arch/x86/kernel/kprobes/ftrace.c | 3 +++
 include/linux/ftrace.h   | 2 ++
 8 files changed, 23 insertions(+)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 834cffcfbce3..3931bf9f707b 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
struct kprobe_ctlblk *kcb;
struct pt_regs *regs;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/loongarch/kernel/ftrace_dyn.c 
b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..82c952cb5be0 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
struct kprobe *p;
struct kprobe_ctlblk *kcb;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 621a4b386ae4..3660834f54c3 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
struct kprobe *p;
int bit;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c 
b/arch/powerpc/kernel/kprobes-ftrace.c
index 072ebe7f290b..85eb55aa1457 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
parent_nip,
struct pt_regs *regs;
int bit;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(nip, parent_nip);
if (bit < 0)
return;
diff --git a/arch/riscv/kernel/probes/ftrace.c 
b/arch/riscv/kernel/probes/ftrace.c
index 7142ec42e889..8814fbe4c888 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
struct kprobe_ctlblk *kcb;
int bit;
 
+   if (unlikely(ftrace_is_dead()))
+   return;
+
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..ccbe8ccf945b 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -296,6 +296,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
struct kprobe *p;
int bit;
 
+   if (unlikely(ftrace_is_dead()))
+  

RE: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might be ANFE in aer_err_info

2024-04-27 Thread Duan, Zhenzhong
Hi Jonathan,

>-Original Message-
>From: Jonathan Cameron 
>Subject: Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that might
>be ANFE in aer_err_info
>
>On Tue, 23 Apr 2024 02:25:05 +
>"Duan, Zhenzhong"  wrote:
>
>> >-Original Message-
>> >From: Jonathan Cameron 
>> >Subject: Re: [PATCH v3 1/3] PCI/AER: Store UNCOR_STATUS bits that
>might
>> >be ANFE in aer_err_info
>> >
>> >On Wed, 17 Apr 2024 14:14:05 +0800
>> >Zhenzhong Duan  wrote:
>> >
>> >> In some cases the detector of a Non-Fatal Error(NFE) is not the most
>> >> appropriate agent to determine the type of the error. For example,
>> >> when software performs a configuration read from a non-existent
>> >> device or Function, completer will send an ERR_NONFATAL Message.
>> >> On some platforms, ERR_NONFATAL results in a System Error, which
>> >> breaks normal software probing.
>> >>
>> >> Advisory Non-Fatal Error(ANFE) is a special case that can be used
>> >> in above scenario. It is predominantly determined by the role of the
>> >> detecting agent (Requester, Completer, or Receiver) and the specific
>> >> error. In such cases, an agent with AER signals the NFE (if enabled)
>> >> by sending an ERR_COR Message as an advisory to software, instead of
>> >> sending ERR_NONFATAL.
>> >>
>> >> When processing an ANFE, ideally both correctable error(CE) status and
>> >> uncorrectable error(UE) status should be cleared. However, there is no
>> >> way to fully identify the UE associated with ANFE. Even worse, a Fatal
>> >> Error(FE) or Non-Fatal Error(NFE) may set the same UE status bit as
>> >> ANFE. Treating an ANFE as NFE will reproduce above mentioned issue,
>> >> i.e., breaking softwore probing; treating NFE as ANFE will make us
>> >> ignoring some UEs which need active recover operation. To avoid
>clearing
>> >> UEs that are not ANFE by accident, the most conservative route is taken
>> >> here: If any of the FE/NFE Detected bits is set in Device Status, do not
>> >> touch UE status, they should be cleared later by the UE handler.
>Otherwise,
>> >> a specific set of UEs that may be raised as ANFE according to the PCIe
>> >> specification will be cleared if their corresponding severity is 
>> >> Non-Fatal.
>> >>
>> >> To achieve above purpose, store UNCOR_STATUS bits that might be
>ANFE
>> >> in aer_err_info.anfe_status. So that those bits could be printed and
>> >> processed later.
>> >>
>> >> Tested-by: Yudong Wang 
>> >> Co-developed-by: "Wang, Qingshun" 
>> >> Signed-off-by: "Wang, Qingshun" 
>> >> Signed-off-by: Zhenzhong Duan 
>> >> ---
>> >>  drivers/pci/pci.h  |  1 +
>> >>  drivers/pci/pcie/aer.c | 45
>> >++
>> >>  2 files changed, 46 insertions(+)
>> >>
>> >> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> >> index 17fed1846847..3f9eb807f9fd 100644
>> >> --- a/drivers/pci/pci.h
>> >> +++ b/drivers/pci/pci.h
>> >> @@ -412,6 +412,7 @@ struct aer_err_info {
>> >>
>> >>   unsigned int status;/* COR/UNCOR Error Status */
>> >>   unsigned int mask;  /* COR/UNCOR Error Mask */
>> >> + unsigned int anfe_status;   /* UNCOR Error Status for ANFE */
>> >>   struct pcie_tlp_log tlp;/* TLP Header */
>> >>  };
>> >>
>> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> >> index ac6293c24976..27364ab4b148 100644
>> >> --- a/drivers/pci/pcie/aer.c
>> >> +++ b/drivers/pci/pcie/aer.c
>> >> @@ -107,6 +107,12 @@ struct aer_stats {
>> >>   PCI_ERR_ROOT_MULTI_COR_RCV |
>> >\
>> >>   PCI_ERR_ROOT_MULTI_UNCOR_RCV)
>> >>
>> >> +#define AER_ERR_ANFE_UNC_MASK
>> >(PCI_ERR_UNC_POISON_TLP |   \
>> >> + PCI_ERR_UNC_COMP_TIME |
>> >\
>> >> + PCI_ERR_UNC_COMP_ABORT |
>> >\
>> >> + PCI_ERR_UNC_UNX_COMP |
>> >\
>> >> + PCI_ERR_UNC_UNSUP)
>> >> +
>> >>  static int pcie_aer_disable;
>> >>  static pci_ers_result_t aer_root_reset(struct pci_dev *dev);
>> >>
>> >> @@ -1196,6 +1202,41 @@ void aer_recover_queue(int domain,
>unsigned
>> >int bus, unsigned int devfn,
>> >>  EXPORT_SYMBOL_GPL(aer_recover_queue);
>> >>  #endif
>> >>
>> >> +static void anfe_get_uc_status(struct pci_dev *dev, struct
>aer_err_info
>> >*info)
>> >> +{
>> >> + u32 uncor_mask, uncor_status;
>> >> + u16 device_status;
>> >> + int aer = dev->aer_cap;
>> >> +
>> >> + if (pcie_capability_read_word(dev, PCI_EXP_DEVSTA,
>> >&device_status))
>> >> + return;
>> >> + /*
>> >> +  * Take the most conservative route here. If there are
>> >> +  * Non-Fatal/Fatal errors detected, do not assume any
>> >> +  * bit in uncor_status is set by ANFE.
>> >> +  */
>> >> + if (device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
>> >> + return;
>> >> +
>> >
>> >Is there not a race here?  If we happen to get either an NFED or FED
>> >between the read of device_status above and he

Re: [PATCH v6 08/16] mm/execmem, arch: convert remaining overrides of module_alloc to execmem

2024-04-27 Thread Mike Rapoport
On Fri, Apr 26, 2024 at 12:01:34PM -0700, Song Liu wrote:
> On Fri, Apr 26, 2024 at 1:30 AM Mike Rapoport  wrote:
> >
> > From: "Mike Rapoport (IBM)" 
> >
> > Extend execmem parameters to accommodate more complex overrides of
> > module_alloc() by architectures.
> >
> > This includes specification of a fallback range required by arm, arm64
> > and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for
> > allocation of KASAN shadow required by s390 and x86 and support for
> > late initialization of execmem required by arm64.
> >
> > The core implementation of execmem_alloc() takes care of suppressing
> > warnings when the initial allocation fails but there is a fallback range
> > defined.
> >
> > Signed-off-by: Mike Rapoport (IBM) 
> > Acked-by: Will Deacon 
> 
> nit: We should probably move the logic for ARCH_WANTS_EXECMEM_LATE
> to a separate patch.

This would require to split arm64 and I prefer to keep all these changes
together. 

> Otherwise,
> 
> Acked-by: Song Liu 
 
Thanks!

-- 
Sincerely yours,
Mike.