Re: [RFC PATCH v1 1/3] Revert "powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto"

2023-06-19 Thread Naveen N Rao

Christophe Leroy wrote:

This reverts commit 1e688dd2a3d6759d416616ff07afc4bb836c4213.

That commit aimed at optimising the code generated around
WARN_ON/BUG_ON, but it leads to a lot of dead code being erroneously
generated by GCC.

   text    data    bss      dec    hex filename
9551585 3627834 224376 13403795 cc8693 vmlinux.before
9535281 3628358 224376 13388015 cc48ef vmlinux.after

Once this change is reverted, in a standard configuration (pmac32 +
function tracer) the text is reduced by 16304 bytes (9551585 - 9535281),
i.e. about 16K, or roughly 0.17%.


Aneesh recently reported a build failure due to the use of 'asm goto' in  
WARN_ON(). We were able to root-cause it to the use of 'asm goto' with 
two config options: CONFIG_CC_OPTIMIZE_FOR_SIZE and 
CONFIG_DEBUG_SECTION_MISMATCH.
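
To make the discussion concrete, here is a minimal, hypothetical
sketch of the 'asm goto' WARN_ON() pattern -- NOT the actual macro
from commit 1e688dd2a3d6, which emits a conditional trap plus an
exception-table entry that redirects to the label rather than a
plain branch:

/*
 * Hypothetical sketch only.  With 'asm goto' the compiler wires an
 * inline-assembly branch directly to a C label, so the fast path
 * needs no separate compare-and-branch sequence and the warning
 * slow path can be placed out of line.
 */
static inline void warn_on_sketch(unsigned long cond)
{
	asm goto("cmpwi	%0,0\n\t"	/* compare cond against 0 (cr0) */
		 "bne	%l[do_warn]"	/* branch to the C label if non-zero */
		 : /* no outputs ('asm goto' forbids them before GCC 11) */
		 : "r" (cond)
		 : "cr0"
		 : do_warn);
	return;				/* fast path falls straight through */
do_warn:
	/* slow path: report the warning (placeholder in this sketch) */
	return;
}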


Along with the issues we found with 'asm goto' during objtool 
enablement, I think it might be better to disable it for now.

Acked-by: Naveen N Rao 


- Naveen



[powerpc:next] BUILD SUCCESS b684c09f09e7a6af3794d4233ef785819e72db79

2023-06-19 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: b684c09f09e7a6af3794d4233ef785819e72db79  powerpc: update ppc_save_regs to save current r1 in pt_regs

elapsed time: 732m

configs tested: 143
configs skipped: 11

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alphaallyesconfig   gcc  
alpha   defconfig   gcc  
alpharandconfig-r001-20230619   gcc  
alpharandconfig-r004-20230619   gcc  
alpharandconfig-r014-20230619   gcc  
alpharandconfig-r035-20230619   gcc  
arc  allyesconfig   gcc  
arc defconfig   gcc  
arc  randconfig-r001-20230619   gcc  
arc  randconfig-r023-20230619   gcc  
arc  randconfig-r043-20230619   gcc  
arcvdk_hs38_smp_defconfig   gcc  
arm  allmodconfig   gcc  
arm  allyesconfig   gcc  
arm defconfig   gcc  
arm  gemini_defconfig   gcc  
arm  randconfig-r002-20230619   clang
arm  randconfig-r015-20230619   gcc  
arm  randconfig-r046-20230619   gcc  
arm64allyesconfig   gcc  
arm64   defconfig   gcc  
arm64randconfig-r023-20230619   clang
arm64randconfig-r031-20230619   gcc  
cskydefconfig   gcc  
csky randconfig-r016-20230619   gcc  
csky randconfig-r025-20230619   gcc  
hexagon  randconfig-r041-20230619   clang
hexagon  randconfig-r045-20230619   clang
i386 allyesconfig   gcc  
i386 buildonly-randconfig-r004-20230619   gcc  
i386 buildonly-randconfig-r005-20230619   gcc  
i386 buildonly-randconfig-r006-20230619   gcc  
i386  debian-10.3   gcc  
i386defconfig   gcc  
i386 randconfig-i001-20230619   gcc  
i386 randconfig-i002-20230619   gcc  
i386 randconfig-i003-20230619   gcc  
i386 randconfig-i004-20230619   gcc  
i386 randconfig-i005-20230619   gcc  
i386 randconfig-i006-20230619   gcc  
i386 randconfig-i011-20230619   clang
i386 randconfig-i012-20230619   clang
i386 randconfig-i013-20230619   clang
i386 randconfig-i014-20230619   clang
i386 randconfig-i015-20230619   clang
i386 randconfig-i016-20230619   clang
i386 randconfig-r002-20230619   gcc  
i386 randconfig-r006-20230619   gcc  
i386 randconfig-r012-20230619   clang
loongarchallmodconfig   gcc  
loongarch allnoconfig   gcc  
loongarch   defconfig   gcc  
loongarchrandconfig-r006-20230619   gcc  
loongarchrandconfig-r012-20230619   gcc  
loongarchrandconfig-r015-20230619   gcc  
loongarchrandconfig-r022-20230619   gcc  
loongarchrandconfig-r036-20230619   gcc  
m68k allmodconfig   gcc  
m68k allyesconfig   gcc  
m68kdefconfig   gcc  
m68k randconfig-r006-20230619   gcc  
m68k randconfig-r013-20230619   gcc  
m68k randconfig-r026-20230619   gcc  
m68k randconfig-r034-20230619   gcc  
microblaze   randconfig-r003-20230619   gcc  
microblaze   randconfig-r031-20230619   gcc  
mips allmodconfig   gcc  
mips allyesconfig   gcc  
mips randconfig-r015-20230619   gcc  
mips randconfig-r016-20230619   gcc  
mips randconfig-r022-20230619   gcc  
mips randconfig-r026-20230619   gcc  
nios2   defconfig   gcc  
nios2randconfig-r003-20230619   gcc  
nios2randconfig-r004-20230619   gcc  
nios2randconfig-r011-20230619   gcc  
nios2randconfig-r015-20230619   gcc  
openrisc randconfig-r003-20230619   gcc  
openrisc randconfig-r013-20230619   gcc  
parisc   allyesconfig   gcc  
parisc  defconfig   gcc  
parisc   randconfig-r001-20230619   gcc  
parisc   randconfig-r002-20230619   gcc  
parisc   randconfig-r004-20230619   gcc  
parisc   randconfig-r021-20230619   gcc  
parisc

[PATCH 2/2] powerpc: drop MPC85xx_CDS platform support

2023-06-19 Thread Paul Gortmaker
The MPC8541/8548/8555 Configurable Development System (CDS) boards were
the vehicles used to evaluate the first e500-v2 CPUs around 2007.

Similar to the earlier MPC83xx-MDS systems we removed, the "brains"
exist on a PCI-X card, but additional connectors exist to the right of
the PCI-X slot, two structural metal pins are used to provide stability
in a vertical ATX mounting, and the CPU is now on a daughter-card vs. a
clamped down BGA.

Given the extra complexity and risk of connector damage, the 8548CDS
I had access to came pre-assembled in a basic white Antec case common
for that era, and I'm inclined to assume that was the default.

Power was typical "Pentium4" 2005 ATX - the main 20 pin connector went
to the PCI ATX form factor backplane, and the 4 pin black/yellow went
to the CPU card.

Like previous evaluation boards, they attempted to provide break-out
connectors for as many features as possible, and that made for a fairly
complex looking system.

In any case, these are fairly complex systems, over 15 years old,
originally made for a small group of industry-related people, for use
where quiet fan operation wasn't important.  Given that, it makes
sense to remove support for them in 2023.

Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Signed-off-by: Paul Gortmaker 
---
 arch/powerpc/boot/Makefile|   3 -
 arch/powerpc/boot/dts/fsl/mpc8541cds.dts  | 375 -
 arch/powerpc/boot/dts/fsl/mpc8548cds.dtsi | 302 --
 arch/powerpc/boot/dts/fsl/mpc8548cds_32b.dts  |  82 
 arch/powerpc/boot/dts/fsl/mpc8548cds_36b.dts  |  82 
 arch/powerpc/boot/dts/fsl/mpc8555cds.dts  | 375 -
 .../configs/85xx/mpc85xx_cds_defconfig|  52 ---
 arch/powerpc/configs/mpc85xx_base.config  |   1 -
 arch/powerpc/platforms/85xx/Makefile  |   1 -
 arch/powerpc/platforms/85xx/mpc85xx_cds.c | 387 --
 10 files changed, 1660 deletions(-)
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8541cds.dts
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8548cds.dtsi
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8548cds_32b.dts
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8548cds_36b.dts
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8555cds.dts
 delete mode 100644 arch/powerpc/configs/85xx/mpc85xx_cds_defconfig
 delete mode 100644 arch/powerpc/platforms/85xx/mpc85xx_cds.c

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index cf728cb3e9a9..968aee2025b8 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -342,9 +342,6 @@ image-$(CONFIG_MPC834x_ITX) += cuImage.mpc8349emitx \
 image-$(CONFIG_ASP834x)+= dtbImage.asp834x-redboot
 
 # Board ports in arch/powerpc/platform/85xx/Kconfig
-image-$(CONFIG_MPC85xx_CDS)+= cuImage.mpc8541cds \
-  cuImage.mpc8548cds_32b \
-  cuImage.mpc8555cds
 image-$(CONFIG_MPC85xx_MDS)+= cuImage.mpc8568mds
 image-$(CONFIG_MPC85xx_DS) += cuImage.mpc8544ds \
   cuImage.mpc8572ds
diff --git a/arch/powerpc/boot/dts/fsl/mpc8541cds.dts b/arch/powerpc/boot/dts/fsl/mpc8541cds.dts
deleted file mode 100644
index a2a6c5cf852e..
--- a/arch/powerpc/boot/dts/fsl/mpc8541cds.dts
+++ /dev/null
@@ -1,375 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * MPC8541 CDS Device Tree Source
- *
- * Copyright 2006, 2008 Freescale Semiconductor Inc.
- */
-
-/dts-v1/;
-
-/include/ "e500v1_power_isa.dtsi"
-
-/ {
-   model = "MPC8541CDS";
-   compatible = "MPC8541CDS", "MPC85xxCDS";
-   #address-cells = <1>;
-   #size-cells = <1>;
-
-   aliases {
-   ethernet0 = 
-   ethernet1 = 
-   serial0 = 
-   serial1 = 
-   pci0 = 
-   pci1 = 
-   };
-
-   cpus {
-   #address-cells = <1>;
-   #size-cells = <0>;
-
-   PowerPC,8541@0 {
-   device_type = "cpu";
-   reg = <0x0>;
-   d-cache-line-size = <32>;   // 32 bytes
-   i-cache-line-size = <32>;   // 32 bytes
-   d-cache-size = <0x8000>;// L1, 32K
-   i-cache-size = <0x8000>;// L1, 32K
-   timebase-frequency = <0>;   //  33 MHz, from uboot
-   bus-frequency = <0>;// 166 MHz
-   clock-frequency = <0>;  // 825 MHz, from uboot
-   next-level-cache = <>;
-   };
-   };
-
-   memory {
-   device_type = "memory";
-   reg = <0x0 0x8000000>;  // 128M at 0x0
-   };
-
-   soc8541@e000 {
-   #address-cells = <1>;
-   #size-cells = <1>;
-  

[PATCH 1/2] powerpc: drop MPC8540_ADS and MPC8560_ADS platform support

2023-06-19 Thread Paul Gortmaker
Based on the revision history in the manual(s), these e500-v1
platforms were first available around 2002.

Like a lot of evaluation boards, they attempted to provide break-out
connectors for all possible features, and that, combined with four
PCI-X slots (and the age/era), made for a considerably large board.

As I recall it, from a Linux point of view, the biggest difference
between 8540 and 8560 was in the UART implementation, and that is
reflected in a diff of the defconfigs.

In any case, these are over 20 years old, and by today's standards
only have a small amount of DDR1 memory, and were not widely available.

Given that, it makes sense to remove support for them in 2023.

Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Signed-off-by: Paul Gortmaker 
---
 arch/powerpc/boot/Makefile|   2 -
 arch/powerpc/boot/dts/fsl/mpc8540ads.dts  | 355 
 arch/powerpc/boot/dts/fsl/mpc8560ads.dts  | 388 --
 .../configs/85xx/mpc8540_ads_defconfig|  47 ---
 .../configs/85xx/mpc8560_ads_defconfig|  50 ---
 arch/powerpc/configs/mpc85xx_base.config  |   2 -
 arch/powerpc/platforms/85xx/Makefile  |   2 -
 arch/powerpc/platforms/85xx/mpc85xx_ads.c | 162 
 8 files changed, 1008 deletions(-)
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8540ads.dts
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8560ads.dts
 delete mode 100644 arch/powerpc/configs/85xx/mpc8540_ads_defconfig
 delete mode 100644 arch/powerpc/configs/85xx/mpc8560_ads_defconfig
 delete mode 100644 arch/powerpc/platforms/85xx/mpc85xx_ads.c

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index bf8976563e02..cf728cb3e9a9 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -342,8 +342,6 @@ image-$(CONFIG_MPC834x_ITX) += cuImage.mpc8349emitx \
 image-$(CONFIG_ASP834x)+= dtbImage.asp834x-redboot
 
 # Board ports in arch/powerpc/platform/85xx/Kconfig
-image-$(CONFIG_MPC8540_ADS)+= cuImage.mpc8540ads
-image-$(CONFIG_MPC8560_ADS)+= cuImage.mpc8560ads
 image-$(CONFIG_MPC85xx_CDS)+= cuImage.mpc8541cds \
   cuImage.mpc8548cds_32b \
   cuImage.mpc8555cds
diff --git a/arch/powerpc/boot/dts/fsl/mpc8540ads.dts b/arch/powerpc/boot/dts/fsl/mpc8540ads.dts
deleted file mode 100644
index e03ae130162b..
--- a/arch/powerpc/boot/dts/fsl/mpc8540ads.dts
+++ /dev/null
@@ -1,355 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * MPC8540 ADS Device Tree Source
- *
- * Copyright 2006, 2008 Freescale Semiconductor Inc.
- */
-
-/dts-v1/;
-
-/include/ "e500v1_power_isa.dtsi"
-
-/ {
-   model = "MPC8540ADS";
-   compatible = "MPC8540ADS", "MPC85xxADS";
-   #address-cells = <1>;
-   #size-cells = <1>;
-
-   aliases {
-   ethernet0 = 
-   ethernet1 = 
-   ethernet2 = 
-   serial0 = 
-   serial1 = 
-   pci0 = 
-   };
-
-   cpus {
-   #address-cells = <1>;
-   #size-cells = <0>;
-
-   PowerPC,8540@0 {
-   device_type = "cpu";
-   reg = <0x0>;
-   d-cache-line-size = <32>;   // 32 bytes
-   i-cache-line-size = <32>;   // 32 bytes
-   d-cache-size = <0x8000>;// L1, 32K
-   i-cache-size = <0x8000>;// L1, 32K
-   timebase-frequency = <0>;   //  33 MHz, from uboot
-   bus-frequency = <0>;// 166 MHz
-   clock-frequency = <0>;  // 825 MHz, from uboot
-   next-level-cache = <>;
-   };
-   };
-
-   memory {
-   device_type = "memory";
-   reg = <0x0 0x8000000>;  // 128M at 0x0
-   };
-
-   soc8540@e000 {
-   #address-cells = <1>;
-   #size-cells = <1>;
-   device_type = "soc";
-   compatible = "simple-bus";
-   ranges = <0x0 0xe000 0x10>;
-   bus-frequency = <0>;
-
-   ecm-law@0 {
-   compatible = "fsl,ecm-law";
-   reg = <0x0 0x1000>;
-   fsl,num-laws = <8>;
-   };
-
-   ecm@1000 {
-   compatible = "fsl,mpc8540-ecm", "fsl,ecm";
-   reg = <0x1000 0x1000>;
-   interrupts = <17 2>;
-   interrupt-parent = <>;
-   };
-
-   memory-controller@2000 {
-   compatible = "fsl,mpc8540-memory-controller";
-   reg = <0x2000 0x1000>;
-   interrupt-parent = <>;
-   interrupts = <18 2>;

[PATCH v2 0/2] Remove some e500/MPC85xx evaluation platforms

2023-06-19 Thread Paul Gortmaker
v1: 
https://lore.kernel.org/all/20230221194637.28436-1-paul.gortma...@windriver.com/

v1 --> v2:
   -don't remove MPC8568MDS or P1021 or P1012 platforms as per discussion
   -drop commit #4 that removed kernel fragments still in use elsewhere.
   -trivial refresh for the updated baseline of linux-next


In a similar theme to the e300/MPC83xx evaluation platform removal[1],
this targets removal of two of the oldest e500/MPC85xx evaluation
boards that were produced in limited numbers and primarily made available
to hardware/software developers to shape their own boards and BSPs.

We start with the MPC8540-ADS[2] and MPC8560-ADS[3] -- based on the revision
history in the user's guide[4], these near-identical platforms date back to
at least 2002.  These boards are probably a part of the very small few that
still exist from the ppc ---> powerpc transition.  Typical of evaluation
boards, and as the picture[3] shows, these boards had a large footprint in
order to break out connectors to evaluate as many features as possible.

For reference, I will note that I retired our SBC8560 support over a
decade ago, in v3.6 (2012, in commit b048b4e17cbb), and I don't think
a single person complained.

Next, position yourself around 2007, when the MPC8548CDS (and variants)
appeared as a vehicle to showcase the then-new e500-v2 processor family,
in a PCI-X card form factor with an additional backplane and the CPU on
yet another daughter-card.  Not very "hobbyist" friendly.  As the saying
goes, a picture[5] is worth 1000 words.  It was quite the 3D beast.

Again, for comparison, and perhaps well overdue, I'd requested removal of
our SBC8548 support in Jan 2021 (c12adb067844, v5.15).

Testing included builds of defconfig, mpc85xx_defconfig, mpc85xx_smp_defconfig
and corenet32_smp_defconfig.

As there is obviously no rush for this to be in v6.5, deferring to v6.6 would
be perfectly fine.  In any case, it is based off linux-next from today.

Paul.
--

[1] 
https://lore.kernel.org/all/20230220115913.25811-1-paul.gortma...@windriver.com/
[2] 
https://www.nxp.com/products/no-longer-manufactured/application-development-system-for-mpc8540:MPC8540ADS
[3] 
https://www.nxp.com/products/no-longer-manufactured/application-development-system-for-mpc8560:MPC8560ADS
[4] https://www.nxp.com/docs/en/reference-manual/MPC8560ADSUG.pdf
[5] https://www.flickr.com/photos/daiharuki/905150424/in/photostream/

Cc: Scott Wood 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Li Yang 
Cc: Claudiu Manoil 
Cc: Pali Rohár 

---

Paul Gortmaker (2):
  powerpc: drop MPC8540_ADS and MPC8560_ADS platform support
  powerpc: drop MPC85xx_CDS platform support

 arch/powerpc/boot/Makefile|   5 -
 arch/powerpc/boot/dts/fsl/mpc8540ads.dts  | 355 
 arch/powerpc/boot/dts/fsl/mpc8541cds.dts  | 375 -
 arch/powerpc/boot/dts/fsl/mpc8548cds.dtsi | 302 --
 arch/powerpc/boot/dts/fsl/mpc8548cds_32b.dts  |  82 
 arch/powerpc/boot/dts/fsl/mpc8548cds_36b.dts  |  82 
 arch/powerpc/boot/dts/fsl/mpc8555cds.dts  | 375 -
 arch/powerpc/boot/dts/fsl/mpc8560ads.dts  | 388 --
 .../configs/85xx/mpc8540_ads_defconfig|  47 ---
 .../configs/85xx/mpc8560_ads_defconfig|  50 ---
 .../configs/85xx/mpc85xx_cds_defconfig|  52 ---
 arch/powerpc/configs/mpc85xx_base.config  |   3 -
 arch/powerpc/platforms/85xx/Makefile  |   3 -
 arch/powerpc/platforms/85xx/mpc85xx_ads.c | 162 
 arch/powerpc/platforms/85xx/mpc85xx_cds.c | 387 -
 15 files changed, 2668 deletions(-)
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8540ads.dts
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8541cds.dts
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8548cds.dtsi
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8548cds_32b.dts
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8548cds_36b.dts
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8555cds.dts
 delete mode 100644 arch/powerpc/boot/dts/fsl/mpc8560ads.dts
 delete mode 100644 arch/powerpc/configs/85xx/mpc8540_ads_defconfig
 delete mode 100644 arch/powerpc/configs/85xx/mpc8560_ads_defconfig
 delete mode 100644 arch/powerpc/configs/85xx/mpc85xx_cds_defconfig
 delete mode 100644 arch/powerpc/platforms/85xx/mpc85xx_ads.c
 delete mode 100644 arch/powerpc/platforms/85xx/mpc85xx_cds.c

-- 
2.25.1


Re: [PATCH mm-unstable v2 00/10] mm/kvm: locklessly clear the accessed bit

2023-06-19 Thread Yu Zhao
On Fri, Jun 9, 2023 at 3:08 AM Paolo Bonzini  wrote:
>
> On 5/27/23 01:44, Yu Zhao wrote:
> > TLDR
> > 
> > This patchset adds a fast path to clear the accessed bit without
> > taking kvm->mmu_lock. It can significantly improve the performance of
> > guests when the host is under heavy memory pressure.
> >
> > ChromeOS has been using a similar approach [1] since mid 2021 and it
> > was proven successful on tens of millions of devices.
> >
> > This v2 addressed previous requests [2] on refactoring code, removing
> > inaccurate/redundant texts, etc.
> >
> > [1]https://crrev.com/c/2987928
> > [2]https://lore.kernel.org/r/20230217041230.2417228-1-yuz...@google.com/
>
>  From the KVM point of view the patches look good (though I wouldn't
> mind if Nicholas took a look at the ppc part).  Jason's comment on the
> MMU notifier side are promising as well.  Can you send v3 with Oliver's
> comments addressed?

Thanks. I'll address all the comments in v3 and post it asap.

Meanwhile, some updates on the recent progress from my side:
1. I've asked some downstream kernels to pick up v2 for testing; the
Archlinux Zen kernel did. I don't really expect its enthusiastic
testers to find this series relevant to their use cases. But who
knows.
2. I've also asked openbenchmarking.org to run their popular highmem
benchmark suites with v2. Hopefully they'll have some independent
results soon.
3. I've backported v2 to v5.15 and v6.1 and started an A/B experiment
involving ~1 million devices, as I mentioned in another email in this
thread. I should have some results to share when posting v3.


Re: [PATCH v2 13/13] sh/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread John Paul Adrian Glaubitz
Hi Eric!

On Mon, 2023-06-19 at 10:58 -0400, Eric DeVolder wrote:
> The kexec and crash kernel options are provided in the common
> kernel/Kconfig.kexec. Utilize the common options and provide
> the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
> equivalent set of KEXEC and CRASH options.
> 
> Signed-off-by: Eric DeVolder 
> ---
>  arch/sh/Kconfig | 46 --
>  1 file changed, 8 insertions(+), 38 deletions(-)
> 
> diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
> index 9652d367fc37..d52e0beed7e9 100644
> --- a/arch/sh/Kconfig
> +++ b/arch/sh/Kconfig
> @@ -546,44 +546,14 @@ menu "Kernel features"
>  
>  source "kernel/Kconfig.hz"
>  
> -config KEXEC
> - bool "kexec system call (EXPERIMENTAL)"
> - depends on MMU
> - select KEXEC_CORE
> - help
> -   kexec is a system call that implements the ability to shutdown your
> -   current kernel, and to start another kernel.  It is like a reboot
> -   but it is independent of the system firmware.  And like a reboot
> -   you can start any kernel with it, not just Linux.
> -
> -   The name comes from the similarity to the exec system call.
> -
> -   It is an ongoing process to be certain the hardware in a machine
> -   is properly shutdown, so do not be surprised if this code does not
> -   initially work for you.  As of this writing the exact hardware
> -   interface is strongly in flux, so no good recommendation can be
> -   made.
> -
> -config CRASH_DUMP
> - bool "kernel crash dumps (EXPERIMENTAL)"
> - depends on BROKEN_ON_SMP
> - help
> -   Generate crash dump after being started by kexec.
> -   This should be normally only set in special crash dump kernels
> -   which are loaded in the main kernel with kexec-tools into
> -   a specially reserved region and then later executed after
> -   a crash by kdump/kexec. The crash dump kernel must be compiled
> -   to a memory address not used by the main kernel using
> -   PHYSICAL_START.
> -
> -   For more details see Documentation/admin-guide/kdump/kdump.rst
> -
> -config KEXEC_JUMP
> - bool "kexec jump (EXPERIMENTAL)"
> - depends on KEXEC && HIBERNATION
> - help
> -   Jump between original kernel and kexeced kernel and invoke
> -   code via KEXEC
> +config ARCH_SUPPORTS_KEXEC
> + def_bool MMU
> +
> +config ARCH_SUPPORTS_CRASH_DUMP
> + def_bool BROKEN_ON_SMP
> +
> +config ARCH_SUPPORTS_KEXEC_JUMP
> + def_bool y
>  
>  config PHYSICAL_START
>   hex "Physical address where the kernel is loaded" if (EXPERT || 
> CRASH_DUMP)

Acked-by: John Paul Adrian Glaubitz 

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


[PATCH v2 11/13] riscv/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
---
 arch/riscv/Kconfig | 48 ++
 1 file changed, 14 insertions(+), 34 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 5966ad97c30c..c484abd9bbfd 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -585,48 +585,28 @@ config RISCV_BOOT_SPINWAIT
 
  If unsure what to do here, say N.
 
-config KEXEC
-   bool "Kexec system call"
-   depends on MMU
+config ARCH_SUPPORTS_KEXEC
+   def_bool MMU
+
+config ARCH_SELECTS_KEXEC
+   def_bool y
+   depends on KEXEC
select HOTPLUG_CPU if SMP
-   select KEXEC_CORE
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel. It is like a reboot
- but it is independent of the system firmware. And like a reboot
- you can start any kernel with it, not just Linux.
 
- The name comes from the similarity to the exec system call.
+config ARCH_SUPPORTS_KEXEC_FILE
+   def_bool 64BIT && MMU && CRYPTO=y && CRYPTO_SHA256=y
 
-config KEXEC_FILE
-   bool "kexec file based systmem call"
-   depends on 64BIT && MMU
-   select HAVE_IMA_KEXEC if IMA
-   select KEXEC_CORE
+config ARCH_SELECTS_KEXEC_FILE
+   def_bool y
+   depends on KEXEC_FILE
select KEXEC_ELF
-   help
- This is new version of kexec system call. This system call is
- file based and takes file descriptors as system call argument
- for kernel and initramfs as opposed to list of segments as
- accepted by previous system call.
-
- If you don't know what to do here, say Y.
+   select HAVE_IMA_KEXEC if IMA
 
 config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
-   depends on CRYPTO=y
-   depends on CRYPTO_SHA256=y
 
-config CRASH_DUMP
-   bool "Build kdump crash kernel"
-   help
- Generate crash dump after being started by kexec. This should
- be normally only set in special crash dump kernels which are
- loaded in the main kernel with kexec-tools into a specially
- reserved region and then later executed after a crash by
- kdump/kexec.
-
- For more details see Documentation/admin-guide/kdump/kdump.rst
+config ARCH_SUPPORTS_CRASH_DUMP
+   def_bool y
 
 config COMPAT
bool "Kernel support for 32-bit U-mode"
-- 
2.31.1



[PATCH v2 06/13] loongarch/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
---
 arch/loongarch/Kconfig | 26 +++---
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index d38b066fc931..3542bf669c78 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -481,28 +481,16 @@ config ARCH_STRICT_ALIGN
  to run kernel only on systems with h/w unaligned access support in
  order to optimise for performance.
 
-config KEXEC
-   bool "Kexec system call"
-   select KEXEC_CORE
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.   And like a reboot
- you can start any kernel with it, not just Linux.
+config ARCH_SUPPORTS_KEXEC
+   def_bool y
 
- The name comes from the similarity to the exec system call.
+config ARCH_SUPPORTS_CRASH_DUMP
+   def_bool y
 
-config CRASH_DUMP
-   bool "Build kdump crash kernel"
+config ARCH_SELECTS_CRASH_DUMP
+   def_bool y
+   depends on CRASH_DUMP
select RELOCATABLE
-   help
- Generate crash dump after being started by kexec. This should
- be normally only set in special crash dump kernels which are
- loaded in the main kernel with kexec-tools into a specially
- reserved region and then later executed after a crash by
- kdump/kexec.
-
- For more details see Documentation/admin-guide/kdump/kdump.rst
 
 config RELOCATABLE
bool "Relocatable kernel"
-- 
2.31.1



[PATCH v2 13/13] sh/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
---
 arch/sh/Kconfig | 46 --
 1 file changed, 8 insertions(+), 38 deletions(-)

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 9652d367fc37..d52e0beed7e9 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -546,44 +546,14 @@ menu "Kernel features"
 
 source "kernel/Kconfig.hz"
 
-config KEXEC
-   bool "kexec system call (EXPERIMENTAL)"
-   depends on MMU
-   select KEXEC_CORE
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.  And like a reboot
- you can start any kernel with it, not just Linux.
-
- The name comes from the similarity to the exec system call.
-
- It is an ongoing process to be certain the hardware in a machine
- is properly shutdown, so do not be surprised if this code does not
- initially work for you.  As of this writing the exact hardware
- interface is strongly in flux, so no good recommendation can be
- made.
-
-config CRASH_DUMP
-   bool "kernel crash dumps (EXPERIMENTAL)"
-   depends on BROKEN_ON_SMP
-   help
- Generate crash dump after being started by kexec.
- This should be normally only set in special crash dump kernels
- which are loaded in the main kernel with kexec-tools into
- a specially reserved region and then later executed after
- a crash by kdump/kexec. The crash dump kernel must be compiled
- to a memory address not used by the main kernel using
- PHYSICAL_START.
-
- For more details see Documentation/admin-guide/kdump/kdump.rst
-
-config KEXEC_JUMP
-   bool "kexec jump (EXPERIMENTAL)"
-   depends on KEXEC && HIBERNATION
-   help
- Jump between original kernel and kexeced kernel and invoke
- code via KEXEC
+config ARCH_SUPPORTS_KEXEC
+   def_bool MMU
+
+config ARCH_SUPPORTS_CRASH_DUMP
+   def_bool BROKEN_ON_SMP
+
+config ARCH_SUPPORTS_KEXEC_JUMP
+   def_bool y
 
 config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || 
CRASH_DUMP)
-- 
2.31.1



[PATCH v2 12/13] s390/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

NOTE: The original Kconfig has a KEXEC_SIG which depends on
MODULE_SIG_FORMAT. However, attempts to keep the MODULE_SIG_FORMAT
dependency (using the strategy outlined in this series, and other
techniques) results in 'error: recursive dependency detected'
on CRYPTO. This occurs because any path through KEXEC_SIG that
attempts to select CRYPTO is ultimately dependent upon CRYPTO:

 CRYPTO
   <- ARCH_SUPPORTS_KEXEC_FILE
     <- KEXEC_FILE
       <- KEXEC_SIG

Therefore, the solution is to drop the MODULE_SIG_FORMAT dependency
for KEXEC_SIG. In practice, however, MODULE_SIG_FORMAT is still
configured-in, as the use of KEXEC_SIG is in step with the use of
SYSTEM_DATA_VERIFICATION, which does select MODULE_SIG_FORMAT.
Not ideal, but results in equivalent .config files for s390.
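
As a reduced, hypothetical illustration (not the actual s390 Kconfig
files) of how the recursion arises -- CRYPTO ends up selected by a
symbol whose own dependency chain already passes through CRYPTO:

  config CRYPTO
          bool "crypto"

  config MODULE_SIG_FORMAT
          bool
          select CRYPTO

  config ARCH_SUPPORTS_KEXEC_FILE
          def_bool CRYPTO

  config KEXEC_FILE
          bool "kexec file based system call"
          depends on ARCH_SUPPORTS_KEXEC_FILE

  config KEXEC_SIG
          bool "Verify kernel signature during kexec_file_load() syscall"
          depends on KEXEC_FILE
          select MODULE_SIG_FORMAT	# 'recursive dependency detected' on CRYPTO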

Signed-off-by: Eric DeVolder 
---
 arch/s390/Kconfig | 65 ++-
 1 file changed, 19 insertions(+), 46 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 6dab9c1be508..58dc124433ca 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -243,6 +243,25 @@ config PGTABLE_LEVELS
 
 source "kernel/livepatch/Kconfig"
 
+config ARCH_DEFAULT_KEXEC
+   def_bool y
+
+config ARCH_SUPPORTS_KEXEC
+   def_bool y
+
+config ARCH_SUPPORTS_KEXEC_FILE
+   def_bool CRYPTO && CRYPTO_SHA256 && CRYPTO_SHA256_S390
+
+config ARCH_HAS_KEXEC_PURGATORY
+   def_bool KEXEC_FILE
+
+config ARCH_SUPPORTS_CRASH_DUMP
+   def_bool y
+   help
+ Refer to  for more details on this.
+ This option also enables s390 zfcpdump.
+ See also 
+
 menu "Processor type and features"
 
 config HAVE_MARCH_Z10_FEATURES
@@ -481,36 +500,6 @@ config SCHED_TOPOLOGY
 
 source "kernel/Kconfig.hz"
 
-config KEXEC
-   def_bool y
-   select KEXEC_CORE
-
-config KEXEC_FILE
-   bool "kexec file based system call"
-   select KEXEC_CORE
-   depends on CRYPTO
-   depends on CRYPTO_SHA256
-   depends on CRYPTO_SHA256_S390
-   help
- Enable the kexec file based system call. In contrast to the normal
- kexec system call this system call takes file descriptors for the
- kernel and initramfs as arguments.
-
-config ARCH_HAS_KEXEC_PURGATORY
-   def_bool y
-   depends on KEXEC_FILE
-
-config KEXEC_SIG
-   bool "Verify kernel signature during kexec_file_load() syscall"
-   depends on KEXEC_FILE && MODULE_SIG_FORMAT
-   help
- This option makes kernel signature verification mandatory for
- the kexec_file_load() syscall.
-
- In addition to that option, you need to enable signature
- verification for the corresponding kernel image type being
- loaded in order for this to work.
-
 config KERNEL_NOBP
def_bool n
prompt "Enable modified branch prediction for the kernel by default"
@@ -732,22 +721,6 @@ config VFIO_AP
 
 endmenu
 
-menu "Dump support"
-
-config CRASH_DUMP
-   bool "kernel crash dumps"
-   select KEXEC
-   help
- Generate crash dump after being started by kexec.
- Crash dump kernels are loaded in the main kernel with kexec-tools
- into a specially reserved region and then later executed after
- a crash by kdump/kexec.
- Refer to  for more details on this.
- This option also enables s390 zfcpdump.
- See also 
-
-endmenu
-
 config CCW
def_bool y
 
-- 
2.31.1



[PATCH v2 03/13] arm/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
---
 arch/arm/Kconfig | 29 -
 1 file changed, 4 insertions(+), 25 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0fb4b218f665..6af0105407af 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1639,20 +1639,8 @@ config XIP_DEFLATED_DATA
  copied, saving some precious ROM space. A possible drawback is a
  slightly longer boot delay.
 
-config KEXEC
-   bool "Kexec system call (EXPERIMENTAL)"
-   depends on (!SMP || PM_SLEEP_SMP)
-   depends on MMU
-   select KEXEC_CORE
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.   And like a reboot
- you can start any kernel with it, not just Linux.
-
- It is an ongoing process to be certain the hardware in a machine
- is properly shutdown, so do not be surprised if this code does not
- initially work for you.
+config ARCH_SUPPORTS_KEXEC
+   def_bool (!SMP || PM_SLEEP_SMP) && MMU
 
 config ATAGS_PROC
bool "Export atags in procfs"
@@ -1662,17 +1650,8 @@ config ATAGS_PROC
  Should the atags used to boot the kernel be exported in an "atags"
  file in procfs. Useful with kexec.
 
-config CRASH_DUMP
-   bool "Build kdump crash kernel (EXPERIMENTAL)"
-   help
- Generate crash dump after being started by kexec. This should
- be normally only set in special crash dump kernels which are
- loaded in the main kernel with kexec-tools into a specially
- reserved region and then later executed after a crash by
- kdump/kexec. The crash dump kernel must be compiled to a
- memory address not used by the main kernel
-
- For more details see Documentation/admin-guide/kdump/kdump.rst
+config ARCH_SUPPORTS_CRASH_DUMP
+   def_bool y
 
 config AUTO_ZRELADDR
bool "Auto calculation of the decompressed kernel image address" if 
!ARCH_MULTIPLATFORM
-- 
2.31.1



[PATCH v2 10/13] powerpc/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
Reviewed-by: Sourabh Jain 
---
 arch/powerpc/Kconfig | 55 ++--
 1 file changed, 17 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bff5820b7cda..70edbda08ae3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -588,41 +588,21 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
default "y" if PPC_POWERNV
select ARCH_SUPPORTS_MEMORY_FAILURE
 
-config KEXEC
-   bool "kexec system call"
-   depends on PPC_BOOK3S || PPC_E500 || (44x && !SMP)
-   select KEXEC_CORE
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.   And like a reboot
- you can start any kernel with it, not just Linux.
-
- The name comes from the similarity to the exec system call.
-
- It is an ongoing process to be certain the hardware in a machine
- is properly shutdown, so do not be surprised if this code does not
- initially work for you.  As of this writing the exact hardware
- interface is strongly in flux, so no good recommendation can be
- made.
-
-config KEXEC_FILE
-   bool "kexec file based system call"
-   select KEXEC_CORE
-   select HAVE_IMA_KEXEC if IMA
-   select KEXEC_ELF
-   depends on PPC64
-   depends on CRYPTO=y
-   depends on CRYPTO_SHA256=y
-   help
- This is a new version of the kexec system call. This call is
- file based and takes in file descriptors as system call arguments
- for kernel and initramfs as opposed to a list of segments as is the
- case for the older kexec call.
+config ARCH_SUPPORTS_KEXEC
+   def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
+
+config ARCH_SUPPORTS_KEXEC_FILE
+   def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y
 
 config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
 
+config ARCH_SELECTS_KEXEC_FILE
+   def_bool y
+   depends on KEXEC_FILE
+   select KEXEC_ELF
+   select HAVE_IMA_KEXEC if IMA
+
 config PPC64_BIG_ENDIAN_ELF_ABI_V2
bool "Build big-endian kernel using ELF ABI V2 (EXPERIMENTAL)"
depends on PPC64 && CPU_BIG_ENDIAN
@@ -682,14 +662,13 @@ config RELOCATABLE_TEST
  loaded at, which tends to be non-zero and therefore test the
  relocation code.
 
-config CRASH_DUMP
-   bool "Build a dump capture kernel"
-   depends on PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
+config ARCH_SUPPORTS_CRASH_DUMP
+   def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
+
+config ARCH_SELECTS_CRASH_DUMP
+   def_bool y
+   depends on CRASH_DUMP
select RELOCATABLE if PPC64 || 44x || PPC_85xx
-   help
- Build a kernel suitable for use as a dump capture kernel.
- The same kernel binary can be used as production kernel and dump
- capture kernel.
 
 config FA_DUMP
bool "Firmware-assisted dump"
-- 
2.31.1



[PATCH v2 04/13] ia64/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
---
 arch/ia64/Kconfig | 28 +---
 1 file changed, 5 insertions(+), 23 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 21fa63ce5ffc..df54a038e6da 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -360,31 +360,13 @@ config IA64_HP_AML_NFW
  the "force" module parameter, e.g., with the "aml_nfw.force"
  kernel command line option.
 
-config KEXEC
-   bool "kexec system call"
-   depends on !SMP || HOTPLUG_CPU
-   select KEXEC_CORE
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.   And like a reboot
- you can start any kernel with it, not just Linux.
-
- The name comes from the similarity to the exec system call.
-
- It is an ongoing process to be certain the hardware in a machine
- is properly shutdown, so do not be surprised if this code does not
- initially work for you.  As of this writing the exact hardware
- interface is strongly in flux, so no good recommendation can be
- made.
+endmenu
 
-config CRASH_DUMP
- bool "kernel crash dumps"
- depends on IA64_MCA_RECOVERY && (!SMP || HOTPLUG_CPU)
- help
-   Generate crash dump after being started by kexec.
+config ARCH_SUPPORTS_KEXEC
+   def_bool !SMP || HOTPLUG_CPU
 
-endmenu
+config ARCH_SUPPORTS_CRASH_DUMP
+   def_bool IA64_MCA_RECOVERY && (!SMP || HOTPLUG_CPU)
 
 menu "Power management and ACPI options"
 
-- 
2.31.1



[PATCH v2 08/13] mips/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
---
 arch/mips/Kconfig | 32 +---
 1 file changed, 5 insertions(+), 27 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 675a8660cb85..3d9960942cbd 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2873,33 +2873,11 @@ config HZ
 config SCHED_HRTICK
def_bool HIGH_RES_TIMERS
 
-config KEXEC
-   bool "Kexec system call"
-   select KEXEC_CORE
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.   And like a reboot
- you can start any kernel with it, not just Linux.
-
- The name comes from the similarity to the exec system call.
-
- It is an ongoing process to be certain the hardware in a machine
- is properly shutdown, so do not be surprised if this code does not
- initially work for you.  As of this writing the exact hardware
- interface is strongly in flux, so no good recommendation can be
- made.
-
-config CRASH_DUMP
-   bool "Kernel crash dumps"
-   help
- Generate crash dump after being started by kexec.
- This should be normally only set in special crash dump kernels
- which are loaded in the main kernel with kexec-tools into
- a specially reserved region and then later executed after
- a crash by kdump/kexec. The crash dump kernel must be compiled
- to a memory address not used by the main kernel or firmware using
- PHYSICAL_START.
+config ARCH_SUPPORTS_KEXEC
+   def_bool y
+
+config ARCH_SUPPORTS_CRASH_DUMP
+   def_bool y
 
 config PHYSICAL_START
hex "Physical address where the kernel is loaded"
-- 
2.31.1



[PATCH v2 09/13] parisc/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
---
 arch/parisc/Kconfig | 34 +++---
 1 file changed, 11 insertions(+), 23 deletions(-)

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 967bde65dd0e..8de24bc503aa 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -348,29 +348,17 @@ config NR_CPUS
default "4" if 64BIT
default "16"
 
-config KEXEC
-   bool "Kexec system call"
-   select KEXEC_CORE
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.   And like a reboot
- you can start any kernel with it, not just Linux.
-
- It is an ongoing process to be certain the hardware in a machine
- shutdown, so do not be surprised if this code does not
- initially work for you.
-
-config KEXEC_FILE
-   bool "kexec file based system call"
-   select KEXEC_CORE
-   select KEXEC_ELF
-   help
- This enables the kexec_file_load() System call. This is
- file based and takes file descriptors as system call argument
- for kernel and initramfs as opposed to list of segments as
- accepted by previous system call.
-
 endmenu
 
+config ARCH_SUPPORTS_KEXEC
+   def_bool y
+
+config ARCH_SUPPORTS_KEXEC_FILE
+   def_bool y
+
+config ARCH_SELECTS_KEXEC_FILE
+   def_bool y
+   depends on KEXEC_FILE
+   select KEXEC_ELF
+
 source "drivers/parisc/Kconfig"
-- 
2.31.1



[PATCH v2 05/13] arm64/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
---
 arch/arm64/Kconfig | 62 +-
 1 file changed, 12 insertions(+), 50 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 343e1e1cae10..dfe47efa7cc1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1433,60 +1433,22 @@ config PARAVIRT_TIME_ACCOUNTING
 
  If in doubt, say N here.
 
-config KEXEC
-   depends on PM_SLEEP_SMP
-   select KEXEC_CORE
-   bool "kexec system call"
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.   And like a reboot
- you can start any kernel with it, not just Linux.
-
-config KEXEC_FILE
-   bool "kexec file based system call"
-   select KEXEC_CORE
-   select HAVE_IMA_KEXEC if IMA
-   help
- This is new version of kexec system call. This system call is
- file based and takes file descriptors as system call argument
- for kernel and initramfs as opposed to list of segments as
- accepted by previous system call.
-
-config KEXEC_SIG
-   bool "Verify kernel signature during kexec_file_load() syscall"
-   depends on KEXEC_FILE
-   help
- Select this option to verify a signature with loaded kernel
- image. If configured, any attempt of loading a image without
- valid signature will fail.
-
- In addition to that option, you need to enable signature
- verification for the corresponding kernel image type being
- loaded in order for this to work.
+config ARCH_SUPPORTS_KEXEC
+   def_bool PM_SLEEP_SMP
 
-config KEXEC_IMAGE_VERIFY_SIG
-   bool "Enable Image signature verification support"
-   default y
-   depends on KEXEC_SIG
-   depends on EFI && SIGNED_PE_FILE_VERIFICATION
-   help
- Enable Image signature verification support.
+config ARCH_SUPPORTS_KEXEC_FILE
+   def_bool y
 
-comment "Support for PE file signature verification disabled"
-   depends on KEXEC_SIG
-   depends on !EFI || !SIGNED_PE_FILE_VERIFICATION
+config ARCH_SELECTS_KEXEC_FILE
+   def_bool y
+   depends on KEXEC_FILE
+   select HAVE_IMA_KEXEC if IMA
 
-config CRASH_DUMP
-   bool "Build kdump crash kernel"
-   help
- Generate crash dump after being started by kexec. This should
- be normally only set in special crash dump kernels which are
- loaded in the main kernel with kexec-tools into a specially
- reserved region and then later executed after a crash by
- kdump/kexec.
+config ARCH_DEFAULT_KEXEC_IMAGE_VERIFY_SIG
+   def_bool y
 
- For more details see Documentation/admin-guide/kdump/kdump.rst
+config ARCH_SUPPORTS_CRASH_DUMP
+   def_bool y
 
 config TRANS_TABLE
def_bool y
-- 
2.31.1



[PATCH v2 07/13] m68k/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
Reviewed-by: Geert Uytterhoeven 
Acked-by: Geert Uytterhoeven 
---
 arch/m68k/Kconfig | 19 ++-
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 40198a1ebe27..7b71916d1519 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -88,23 +88,8 @@ config MMU_SUN3
bool
depends on MMU && !MMU_MOTOROLA && !MMU_COLDFIRE
 
-config KEXEC
-   bool "kexec system call"
-   depends on M68KCLASSIC && MMU
-   select KEXEC_CORE
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.   And like a reboot
- you can start any kernel with it, not just Linux.
-
- The name comes from the similarity to the exec system call.
-
- It is an ongoing process to be certain the hardware in a machine
- is properly shutdown, so do not be surprised if this code does not
- initially work for you.  As of this writing the exact hardware
- interface is strongly in flux, so no good recommendation can be
- made.
+config ARCH_SUPPORTS_KEXEC
+   def_bool M68KCLASSIC && MMU
 
 config BOOTINFO_PROC
bool "Export bootinfo in procfs"
-- 
2.31.1



[PATCH v2 00/13] refactor Kconfig to consolidate KEXEC and CRASH options

2023-06-19 Thread Eric DeVolder
The Kconfig is refactored to consolidate KEXEC and CRASH options from
various arch/<arch>/Kconfig files into the new file kernel/Kconfig.kexec.

The Kconfig.kexec is now a submenu titled "Kexec and crash features"
located under "General Setup".

The following options are impacted:

 - KEXEC
 - KEXEC_FILE
 - KEXEC_SIG
 - KEXEC_SIG_FORCE
 - KEXEC_BZIMAGE_VERIFY_SIG
 - KEXEC_JUMP
 - CRASH_DUMP

Over time, these options have been copied between Kconfig files and
are very similar to one another, but with slight differences.

The following architectures are impacted by the refactor (because of
use of one or more KEXEC/CRASH options):

 - arm
 - arm64
 - ia64
 - loongarch
 - m68k
 - mips
 - parisc
 - powerpc
 - riscv
 - s390
 - sh
 - x86 

More information:

In the patch series "crash: Kernel handling of CPU and memory hot
un/plug"

 https://lore.kernel.org/lkml/20230503224145.7405-1-eric.devol...@oracle.com/

the new kernel feature introduces the config option CRASH_HOTPLUG.

In reviewing, Thomas Gleixner requested that the new config option
not be placed in x86 Kconfig. Rather the option needs a generic/common
home. To Thomas' point, the KEXEC and CRASH options have largely been
duplicated in the various arch/<arch>/Kconfig files, with minor
differences. This kind of proliferation is to be avoided/stopped.

 https://lore.kernel.org/lkml/875y91yv63.ffs@tglx/

To that end, I have refactored the arch Kconfigs so as to consolidate
the various KEXEC and CRASH options. Generally speaking, this work has
the following themes:

- KEXEC and CRASH options are moved into new file kernel/Kconfig.kexec
  - These items from arch/Kconfig:
  CRASH_CORE KEXEC_CORE KEXEC_ELF HAVE_IMA_KEXEC
  - These items from arch/x86/Kconfig form the common options:
  KEXEC KEXEC_FILE KEXEC_SIG KEXEC_SIG_FORCE
  KEXEC_BZIMAGE_VERIFY_SIG KEXEC_JUMP CRASH_DUMP
  - The crash hotplug series appends CRASH_HOTPLUG to Kconfig.kexec
  NOTE: PHYSICAL_START could be argued to be included in this series.
- The Kconfig.kexec is now a submenu titled "Kexec and crash features"
- The Kconfig.kexec is now listed in "General Setup" submenu from
  init/Kconfig
- To control the main common options, new options ARCH_SUPPORTS_KEXEC,
  ARCH_SUPPORTS_KEXEC_FILE and ARCH_SUPPORTS_CRASH_DUMP are introduced.
  NOTE: The existing ARCH_HAS_KEXEC_PURGATORY remains unchanged.
- To account for the slight differences, new options ARCH_SELECTS_KEXEC,
  ARCH_SELECTS_KEXEC_FILE and ARCH_SELECTS_CRASH_DUMP are used to
  elicit the same side effects as the original arch/<arch>/Kconfig
  files for KEXEC and CRASH options.

An example, 'make menuconfig' illustrating the submenu:

  > General setup > Kexec and crash features
  [*] Enable kexec system call
  [*] Enable kexec file based system call
  [*]   Verify kernel signature during kexec_file_load() syscall
  [ ] Require a valid signature in kexec_file_load() syscall
  [ ] Enable bzImage signature verification support
  [*] kexec jump
  [*] kernel crash dumps
  [*]   Update the crash elfcorehdr on system configuration changes

The three main options are KEXEC, KEXEC_FILE and CRASH_DUMP. In the
process of consolidating these options, I encountered slight differences
in the coding of these options in several of the architectures. As a
result, I settled on the following solution:

- Each of the three main options has a 'depends on ARCH_SUPPORTS_<option>'
  statement: ARCH_SUPPORTS_KEXEC, ARCH_SUPPORTS_KEXEC_FILE,
  ARCH_SUPPORTS_CRASH_DUMP.

  For example, the KEXEC_FILE option has a 'depends on
  ARCH_SUPPORTS_KEXEC_FILE' statement.

- The boolean ARCH_SUPPORTS_<option> in effect allows the arch to
  determine when the feature is allowed.  Archs which don't have the
  feature simply do not provide the corresponding ARCH_SUPPORTS_<option>.
  For each arch, where there previously were KEXEC and/or CRASH
  options, these have been replaced with the corresponding boolean
  ARCH_SUPPORTS_<option>, and an appropriate def_bool statement.

  For example, if the arch supports KEXEC_FILE, then the
  ARCH_SUPPORTS_KEXEC_FILE simply has a 'def_bool y'. This permits the
  KEXEC_FILE option to be available.

  If the arch has a 'depends on' statement in its original coding
  of the option, then that expression becomes part of the def_bool
  expression. For example, arm64 had:

  config KEXEC
depends on PM_SLEEP_SMP

  and in this solution, this converts to:

  config ARCH_SUPPORTS_KEXEC
def_bool PM_SLEEP_SMP


- In order to account for the differences in the config coding for
  the three common options, the ARCH_SELECTS_<option> is used.
  This option has a 'depends on <option>' statement to couple it
  to the main option, and from there it can insert the differences
  between the common option and the arch's original coding of that option.

  For example, a few archs enable CRYPTO and CRYPTO_SHA256 for
  KEXEC_FILE. These require an ARCH_SELECTS_KEXEC_FILE with
  'select CRYPTO' and 'select CRYPTO_SHA256' statements, as sketched
  below.
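
  As a condensed, hypothetical sketch of that shape (modelled on the
  riscv conversion in patch 11/13; the CRYPTO selects here are
  illustrative, to mirror the example above):

  config ARCH_SUPPORTS_KEXEC_FILE
          def_bool 64BIT && MMU

  config ARCH_SELECTS_KEXEC_FILE
          def_bool y
          depends on KEXEC_FILE
          select KEXEC_ELF
          select CRYPTO
          select CRYPTO_SHA256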

Illustrating the option relationships:

For KEXEC:
 ARCH_SUPPORTS_KEXEC <- KEXEC <- 

[PATCH v2 02/13] x86/kexec: refactor for kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
---
 arch/x86/Kconfig | 89 +++-
 1 file changed, 13 insertions(+), 76 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 53bab123a8ee..1afc6ca2986b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2043,88 +2043,25 @@ config EFI_RUNTIME_MAP
 
 source "kernel/Kconfig.hz"
 
-config KEXEC
-   bool "kexec system call"
-   select KEXEC_CORE
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.   And like a reboot
- you can start any kernel with it, not just Linux.
-
- The name comes from the similarity to the exec system call.
-
- It is an ongoing process to be certain the hardware in a machine
- is properly shutdown, so do not be surprised if this code does not
- initially work for you.  As of this writing the exact hardware
- interface is strongly in flux, so no good recommendation can be
- made.
-
-config KEXEC_FILE
-   bool "kexec file based system call"
-   select KEXEC_CORE
-   select HAVE_IMA_KEXEC if IMA
-   depends on X86_64
-   depends on CRYPTO=y
-   depends on CRYPTO_SHA256=y
-   help
- This is new version of kexec system call. This system call is
- file based and takes file descriptors as system call argument
- for kernel and initramfs as opposed to list of segments as
- accepted by previous system call.
+config ARCH_SUPPORTS_KEXEC
+   def_bool y
 
-config ARCH_HAS_KEXEC_PURGATORY
-   def_bool KEXEC_FILE
+config ARCH_SUPPORTS_KEXEC_FILE
+   def_bool X86_64 && CRYPTO && CRYPTO_SHA256
 
-config KEXEC_SIG
-   bool "Verify kernel signature during kexec_file_load() syscall"
+config ARCH_SELECTS_KEXEC_FILE
+   def_bool y
depends on KEXEC_FILE
-   help
-
- This option makes the kexec_file_load() syscall check for a valid
- signature of the kernel image.  The image can still be loaded without
- a valid signature unless you also enable KEXEC_SIG_FORCE, though if
- there's a signature that we can check, then it must be valid.
-
- In addition to this option, you need to enable signature
- verification for the corresponding kernel image type being
- loaded in order for this to work.
-
-config KEXEC_SIG_FORCE
-   bool "Require a valid signature in kexec_file_load() syscall"
-   depends on KEXEC_SIG
-   help
- This option makes kernel signature verification mandatory for
- the kexec_file_load() syscall.
+   select HAVE_IMA_KEXEC if IMA
 
-config KEXEC_BZIMAGE_VERIFY_SIG
-   bool "Enable bzImage signature verification support"
-   depends on KEXEC_SIG
-   depends on SIGNED_PE_FILE_VERIFICATION
-   select SYSTEM_TRUSTED_KEYRING
-   help
- Enable bzImage signature verification support.
+config ARCH_HAS_KEXEC_PURGATORY
+   def_bool KEXEC_FILE
 
-config CRASH_DUMP
-   bool "kernel crash dumps"
-   depends on X86_64 || (X86_32 && HIGHMEM)
-   help
- Generate crash dump after being started by kexec.
- This should be normally only set in special crash dump kernels
- which are loaded in the main kernel with kexec-tools into
- a specially reserved region and then later executed after
- a crash by kdump/kexec. The crash dump kernel must be compiled
- to a memory address not used by the main kernel or BIOS using
- PHYSICAL_START, or it must be built as a relocatable image
- (CONFIG_RELOCATABLE=y).
- For more details see Documentation/admin-guide/kdump/kdump.rst
+config ARCH_SUPPORTS_KEXEC_JUMP
+   def_bool y
 
-config KEXEC_JUMP
-   bool "kexec jump"
-   depends on KEXEC && HIBERNATION
-   help
- Jump between original kernel and kexeced kernel and invoke
- code in physical address mode via KEXEC
+config ARCH_SUPPORTS_CRASH_DUMP
+   def_bool X86_64 || (X86_32 && HIGHMEM)
 
 config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || 
CRASH_DUMP)
-- 
2.31.1



[PATCH v2 01/13] kexec: consolidate kexec and crash options into kernel/Kconfig.kexec

2023-06-19 Thread Eric DeVolder
The config options for kexec and crash features are consolidated
into new file kernel/Kconfig.kexec. Under the "General Setup" submenu
is a new submenu "Kexec and crash handling" where all the kexec and
crash options that were once in the arch-dependent submenu "Processor
type and features" are now consolidated.

The following options are impacted:

 - KEXEC
 - KEXEC_FILE
 - KEXEC_SIG
 - KEXEC_SIG_FORCE
 - KEXEC_BZIMAGE_VERIFY_SIG
 - KEXEC_JUMP
 - CRASH_DUMP

The three main options are KEXEC, KEXEC_FILE and CRASH_DUMP.

Architectures specify support of certain KEXEC and CRASH features with
similarly named new ARCH_SUPPORTS_<option> config options.

Architectures can utilize the new ARCH_SELECTS_<option> config
options to specify additional components when <option> is enabled.

To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
enabled, and the ARCH_SELECTS_<option> handles side effects (i.e.
select statements).
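
For a hypothetical architecture, the opt-in side would look roughly like
this (a sketch modeled on the series' x86 conversion; the exact conditions
are per-arch and illustrative only):

config ARCH_SUPPORTS_KEXEC
	def_bool y

config ARCH_SUPPORTS_KEXEC_FILE
	def_bool CRYPTO && CRYPTO_SHA256

config ARCH_SELECTS_KEXEC_FILE
	def_bool y
	depends on KEXEC_FILE
	select HAVE_IMA_KEXEC if IMA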

Signed-off-by: Eric DeVolder 
---
 arch/Kconfig |  13 -
 init/Kconfig |   2 +
 kernel/Kconfig.kexec | 110 +++
 3 files changed, 112 insertions(+), 13 deletions(-)
 create mode 100644 kernel/Kconfig.kexec

diff --git a/arch/Kconfig b/arch/Kconfig
index 205fd23e0cad..a37730679730 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -11,19 +11,6 @@ source "arch/$(SRCARCH)/Kconfig"
 
 menu "General architecture-dependent options"
 
-config CRASH_CORE
-   bool
-
-config KEXEC_CORE
-   select CRASH_CORE
-   bool
-
-config KEXEC_ELF
-   bool
-
-config HAVE_IMA_KEXEC
-   bool
-
 config ARCH_HAS_SUBPAGE_FAULTS
bool
help
diff --git a/init/Kconfig b/init/Kconfig
index 32c24950c4ce..4424447e23a5 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1917,6 +1917,8 @@ config BINDGEN_VERSION_TEXT
 config TRACEPOINTS
bool
 
+source "kernel/Kconfig.kexec"
+
 endmenu	# General setup
 
 source "arch/Kconfig"
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
new file mode 100644
index ..5d576ddfd999
--- /dev/null
+++ b/kernel/Kconfig.kexec
@@ -0,0 +1,110 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menu "Kexec and crash features"
+
+config CRASH_CORE
+   bool
+
+config KEXEC_CORE
+   select CRASH_CORE
+   bool
+
+config KEXEC_ELF
+   bool
+
+config HAVE_IMA_KEXEC
+   bool
+
+config KEXEC
+   bool "Enable kexec system call"
+   default ARCH_DEFAULT_KEXEC
+   depends on ARCH_SUPPORTS_KEXEC
+   select KEXEC_CORE
+   help
+ kexec is a system call that implements the ability to shutdown your
+ current kernel, and to start another kernel. It is like a reboot
+ but it is independent of the system firmware. And like a reboot
+ you can start any kernel with it, not just Linux.
+
+ The name comes from the similarity to the exec system call.
+
+ It is an ongoing process to be certain the hardware in a machine
+ is properly shutdown, so do not be surprised if this code does not
+ initially work for you. As of this writing the exact hardware
+ interface is strongly in flux, so no good recommendation can be
+ made.
+
+config KEXEC_FILE
+   bool "Enable kexec file based system call"
+   depends on ARCH_SUPPORTS_KEXEC_FILE
+   select KEXEC_CORE
+   help
+ This is new version of kexec system call. This system call is
+ file based and takes file descriptors as system call argument
+ for kernel and initramfs as opposed to list of segments as
+ accepted by kexec system call.
+
+config KEXEC_SIG
+   bool "Verify kernel signature during kexec_file_load() syscall"
+   depends on KEXEC_FILE
+   help
+ This option makes the kexec_file_load() syscall check for a valid
+ signature of the kernel image. The image can still be loaded without
+ a valid signature unless you also enable KEXEC_SIG_FORCE, though if
+ there's a signature that we can check, then it must be valid.
+
+ In addition to this option, you need to enable signature
+ verification for the corresponding kernel image type being
+ loaded in order for this to work.
+
+config KEXEC_SIG_FORCE
+   bool "Require a valid signature in kexec_file_load() syscall"
+   depends on KEXEC_SIG
+   help
+ This option makes kernel signature verification mandatory for
+ the kexec_file_load() syscall.
+
+config KEXEC_IMAGE_VERIFY_SIG
+   bool "Enable Image signature verification support"
+   default ARCH_DEFAULT_KEXEC_IMAGE_VERIFY_SIG
+   depends on KEXEC_SIG
+   depends on EFI && SIGNED_PE_FILE_VERIFICATION
+   help
+ Enable Image signature verification support.
+
+config KEXEC_BZIMAGE_VERIFY_SIG
+   bool "Enable bzImage signature verification support"
+   depends on KEXEC_SIG
+   depends on SIGNED_PE_FILE_VERIFICATION
+   select SYSTEM_TRUSTED_KEYRING
+   help
+ Enable bzImage signature 

Re: [PATCH] ASoC: imx-audmix: check return value of devm_kasprintf()

2023-06-19 Thread Mark Brown
On Wed, 14 Jun 2023 15:15:09 +0300, Claudiu Beznea wrote:
> devm_kasprintf() returns a pointer to dynamically allocated memory.
> Pointer could be NULL in case allocation fails. Check pointer validity.
> Identified with coccinelle (kmerr.cocci script).
> 
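
The fix follows the usual devm_kasprintf() error-handling pattern, roughly
(a generic sketch, not the actual hunk; 'dev', 'base' and 'i' are
placeholders):

	name = devm_kasprintf(dev, GFP_KERNEL, "%s-%d", base, i);
	if (!name)
		return -ENOMEM;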
> 

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: imx-audmix: check return value of devm_kasprintf()
  commit: 2f76e1d6ca524a888d29aafe29f2ad2003857971

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark



Re: [PATCH v2 02/12] mm: introduce execmem_text_alloc() and jit_text_alloc()

2023-06-19 Thread Nadav Amit



> On Jun 19, 2023, at 10:09 AM, Andy Lutomirski  wrote:
> 
> But jit_text_alloc() can't do this, because the order of operations doesn't 
> match.  With jit_text_alloc(), the executable mapping shows up before the 
> text is populated, so there is no atomic change from not-there to 
> populated-and-executable.  Which means that there is an opportunity for CPUs, 
> speculatively or otherwise, to start filling various caches with intermediate 
> states of the text, which means that various architectures (even x86!) may 
> need serialization.
> 
> For eBPF- and module- like use cases, where JITting/code gen is quite 
> coarse-grained, perhaps something vaguely like:
> 
> jit_text_alloc() -> returns a handle and an executable virtual address, but 
> does *not* map it there
> jit_text_write() -> write to that handle
> jit_text_map() -> map it and synchronize if needed (no sync needed on x86, I 
> think)

Andy, would you mind explaining why you think a sync is not needed? I mean I 
have a “feeling” that perhaps TSO can guarantee something based on the order of 
write and page-table update. Is that the argument?
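
As an illustration only, one way to read that TSO argument (a sketch of the
reasoning, not a claim about what the architecture guarantees):

	/*
	 * CPU0 (patcher)               CPU1 (executor)
	 *   store text via alias         load PTE / walk page table
	 *   store PTE                     fetch and execute text
	 *
	 * TSO keeps CPU0's two stores ordered, so a CPU1 *data* load that
	 * observes the PTE would also observe the text. Whether instruction
	 * fetch behaves like that data load is exactly the open question.
	 */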

On this regard, one thing that I clearly do not understand is why *today* it is 
ok for users of bpf_arch_text_copy() not to call text_poke_sync(). Am I missing 
something?



Re: [PATCH v2 02/12] mm: introduce execmem_text_alloc() and jit_text_alloc()

2023-06-19 Thread Andy Lutomirski



On Sun, Jun 18, 2023, at 1:00 AM, Mike Rapoport wrote:
> On Sat, Jun 17, 2023 at 01:38:29PM -0700, Andy Lutomirski wrote:
>> On Fri, Jun 16, 2023, at 1:50 AM, Mike Rapoport wrote:
>> > From: "Mike Rapoport (IBM)" 
>> >
>> > module_alloc() is used everywhere as a mean to allocate memory for code.
>> >
>> > Beside being semantically wrong, this unnecessarily ties all subsystems
>> > that need to allocate code, such as ftrace, kprobes and BPF to modules
>> > and puts the burden of code allocation to the modules code.
>> >
>> > Several architectures override module_alloc() because of various
>> > constraints where the executable memory can be located and this causes
>> > additional obstacles for improvements of code allocation.
>> >
>> > Start splitting code allocation from modules by introducing
>> > execmem_text_alloc(), execmem_free(), jit_text_alloc(), jit_free() APIs.
>> >
>> > Initially, execmem_text_alloc() and jit_text_alloc() are wrappers for
>> > module_alloc() and execmem_free() and jit_free() are replacements of
>> > module_memfree() to allow updating all call sites to use the new APIs.
>> >
>> > The intention semantics for new allocation APIs:
>> >
>> > * execmem_text_alloc() should be used to allocate memory that must reside
>> >   close to the kernel image, like loadable kernel modules and generated
>> >   code that is restricted by relative addressing.
>> >
>> > * jit_text_alloc() should be used to allocate memory for generated code
>> >   when there are no restrictions for the code placement. For
>> >   architectures that require that any code is within certain distance
>> >   from the kernel image, jit_text_alloc() will be essentially aliased to
>> >   execmem_text_alloc().
>> >
>> 
>> Is there anything in this series to help users do the appropriate
>> synchronization when they actually populate the allocated memory with
>> code?  See here, for example:
>
> This series only factors out the executable allocations from modules and
> puts them in a central place.
> Anything else would go on top after this lands.

Hmm.

On the one hand, there's nothing wrong with factoring out common code. On the 
other hand, this is probably the right time to at least start thinking about 
synchronization, at least to the extent that it might make us want to change 
this API.  (I'm not at all saying that this series should require changes -- 
I'm just saying that this is a good time to think about how this should work.)

The current APIs, *and* the proposed jit_text_alloc() API, don't actually look 
like the one thing in the Linux ecosystem that actually intelligently and
efficiently maps new text into an address space: mmap().

On x86, you can mmap() an existing file full of executable code PROT_EXEC and 
jump to it with minimal synchronization (just the standard implicit ordering in 
the kernel that populates the pages before setting up the PTEs and whatever 
user synchronization is needed to avoid jumping into the mapping before mmap() 
finishes).  It works across CPUs, and the only possible way userspace can screw 
it up (for a read-only mapping of read-only text, anyway) is to jump to the 
mapping too early, in which case userspace gets a page fault.  Incoherence is 
impossible, and no one needs to "serialize" (in the SDM sense).
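
Concretely, the userspace sequence being described is just (a minimal
sketch, assuming a hypothetical file "code.bin" of position-independent
machine code ending in a return):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
	int fd = open("code.bin", O_RDONLY);	/* hypothetical code file */
	struct stat st;
	void *p;

	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;

	/* The kernel populates the pages before the mapping becomes
	 * visible, so no explicit serialization is needed here. */
	p = mmap(NULL, st.st_size, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/* Jumping in before mmap() returns is the only way to lose. */
	void (*fn)(void) = (void (*)(void))p;
	fn();
	return 0;
}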

I think the same sequence (from userspace's perspective) works on other 
architectures, too, although I think more cache management is needed on the 
kernel's end.  As far as I know, no Linux SMP architecture needs an IPI to map 
executable text into usermode, but I could easily be wrong.  (IIRC RISC-V has 
very developer-unfriendly icache management, but I don't remember the details.)

Of course, using ptrace or any other FOLL_FORCE to modify text on x86 is rather
fraught, and I bet many things do it wrong when userspace is multithreaded.
(But not in production, because it's mostly not used in production.)

But jit_text_alloc() can't do this, because the order of operations doesn't 
match.  With jit_text_alloc(), the executable mapping shows up before the text 
is populated, so there is no atomic change from not-there to 
populated-and-executable.  Which means that there is an opportunity for CPUs, 
speculatively or otherwise, to start filling various caches with intermediate 
states of the text, which means that various architectures (even x86!) may need 
serialization.

For eBPF- and module- like use cases, where JITting/code gen is quite 
coarse-grained, perhaps something vaguely like:

jit_text_alloc() -> returns a handle and an executable virtual address, but 
does *not* map it there
jit_text_write() -> write to that handle
jit_text_map() -> map it and synchronize if needed (no sync needed on x86, I 
think)

could be more efficient and/or safer.
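
A caller of that hypothetical three-step API might then look like this
(names and signatures are assumptions sketched from the proposal above,
not an existing kernel interface; 'insns' and 'len' are placeholders):

	struct jit_handle *h;
	void *vaddr;

	h = jit_text_alloc(len, &vaddr);	/* reserve VA; nothing mapped yet */
	if (!h)
		return -ENOMEM;
	jit_text_write(h, 0, insns, len);	/* fill in the generated code */
	jit_text_map(h);			/* create the executable mapping,
						   serializing if the arch needs it */
	/* only now may any CPU jump to vaddr */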

(Modules could use this too.  Getting alternatives right might take some 
fiddling, because off the top of my head, this doesn't match how it works now.)

To make alternatives easier, this could work, maybe (haven't fully 

Re: [PATCH v2 2/2] powerpc/mm: Add memory_block_size as a kernel parameter

2023-06-19 Thread David Hildenbrand

On 19.06.23 18:17, Aneesh Kumar K.V wrote:
> David Hildenbrand  writes:
>
>> On 09.06.23 08:08, Aneesh Kumar K.V wrote:
>>> Certain devices can possess non-standard memory capacities, not constrained
>>> to multiples of 1GB. Provide a kernel parameter so that we can map the
>>> device memory completely on memory hotplug.
>>
>> So, the unfortunate thing is that these devices would have worked out of
>> the box before the memory block size was increased from 256 MiB to 1 GiB
>> in these setups. Now, one has to fine-tune the memory block size. The
>> only other arch that I know, which supports setting the memory block
>> size, is x86 for special (large) UV systems -- and at least in the past
>> 128 MiB vs. 2 GiB memory blocks made a performance difference during
>> boot (maybe no longer today, who knows).
>>
>> Obviously, less tunable and getting stuff simply working out of the box
>> is preferable.
>>
>> Two questions:
>>
>> 1) Isn't there a way to improve auto-detection to fallback to 256 MiB in
>> these setups, to avoid specifying these parameters?
>
> The patch does try to detect as much as possible by looking at device tree
> nodes and aperture window size. But there are still cases where we find
> a memory aperture of size X GB and device driver hotplug X.YGB memory.

Okay, and I assume we can't detect that case easily.

Which interface is that device driver using to hotplug memory? It's
quite surprising I have to say ...

>> 2) Is the 256 MiB -> 1 GiB memory block size switch really worth it? On
>> x86-64, experiments (with direct map fragmentation) showed that the
>> effective performance boost is pretty insignificant, so I wonder how big
>> the 1 GiB direct map performance improvement is.
>
> Tarun is running some tests to evaluate the impact. We used to use 1GiB
> mapping always. This was later switched to use memory block size to fix
> issues with memory unplug
> commit af9d00e93a4f ("powerpc/mm/radix: Create separate mappings for
> hot-plugged memory")
> explains some details related to that change.

IIUC, that commit (conditionally) increased the memory block size to
avoid the splitting, correct? By that, it broke the device driver use case.

>> I guess the only real issue with 256 MiB memory blocks and 1 GiB direct
>> mapping is memory unplug of boot memory: when unplugging a 256 MiB
>> block, one would have to remap the 1 GiB range using 2 MiB ranges.
>>
>> ... I was wondering what would happen if you simply leave the direct
>> mapping in this corner case in place instead of doing this remapping.
>> IOW, remove the memory but keep the direct map pointing at the removed
>> memory. Nobody should be touching it, or are there any cases where that
>> could hurt?
>>
>> Or is there any other reason why we really want 1 GiB memory blocks
>> instead of defaulting to 256 MiB the way it used to be?
>
> The idea we are working towards is to keep the memory block size small

That would be preferable, yes ...

> but map the boot memory using 1G. An unplug request can split that 1G
> mapping later. We could look at the possibility of leaving that mapping
> without splitting. But not sure why we would want to do that if we can
> correctly split things. Right now there is no splitting support in powerpc.

If splitting over-complicates the matter (and well, it will even consume
more memory), it might at least be worth looking into that. Yes, it's
cleaner.

I think there is also the option to fail memory offlining (and therefore
unplug) if we have a 1 GiB mapping and don't want to split. For
hotplugged memory it would always work to unplug again. aarch64 blocks
any boot memory from getting unplugged.

But I guess that might break existing use cases (unplug boot memory) on
ppc64 that rely on ZONE_MOVABLE to have it working with guarantees,
right? Could be optimized but not sure if that's the best approach.



--
Cheers,

David / dhildenb



Re: [PATCH v2 2/2] powerpc/mm: Add memory_block_size as a kernel parameter

2023-06-19 Thread Aneesh Kumar K.V
David Hildenbrand  writes:

> On 09.06.23 08:08, Aneesh Kumar K.V wrote:
>> Certain devices can possess non-standard memory capacities, not constrained
>> to multiples of 1GB. Provide a kernel parameter so that we can map the
>> device memory completely on memory hotplug.
>
> So, the unfortunate thing is that these devices would have worked out of 
> the box before the memory block size was increased from 256 MiB to 1 GiB 
> in these setups. Now, one has to fine-tune the memory block size. The 
> only other arch that I know, which supports setting the memory block 
> size, is x86 for special (large) UV systems -- and at least in the past 
> 128 MiB vs. 2 GiB memory blocks made a performance difference during 
> boot (maybe no longer today, who knows).
>
>
> Obviously, less tunable and getting stuff simply working out of the box 
> is preferable.
>
> Two questions:
>
> 1) Isn't there a way to improve auto-detection to fallback to 256 MiB in 
> these setups, to avoid specifying these parameters?

The patch does try to detect as much as possible by looking at device tree
nodes and aperture window size. But there are still cases where we find
a memory aperture of size X GB and device driver hotplug X.YGB memory.
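
With the proposed parameter, such a setup is tuned at boot time, e.g.:

	memory_block_size=256M

One thing worth noting about the patch: parse_mem_block_size() rounds a
non-power-of-2 value down to the previous power of 2 via fls64(), so e.g.
384M is silently treated as 256M rather than rejected.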

>
> 2) Is the 256 MiB -> 1 GiB memory block size switch really worth it? On 
> x86-64, experiments (with direct map fragmentation) showed that the 
> effective performance boost is pretty insignificant, so I wonder how big 
> the 1 GiB direct map performance improvement is.


Tarun is running some tests to evaluate the impact. We used to use 1GiB
mapping always. This was later switched to use memory block size to fix
issues with memory unplug
commit af9d00e93a4f ("powerpc/mm/radix: Create separate mappings for 
hot-plugged memory")
explains some details related to that change.


>
>
> I guess the only real issue with 256 MiB memory blocks and 1 GiB direct 
> mapping is memory unplug of boot memory: when unplugging a 256 MiB 
> block, one would have to remap the 1 GiB range using 2 MiB ranges.

>
> ... I was wondering what would happen if you simply leave the direct 
> mapping in this corner case in place instead of doing this remapping. 
> IOW, remove the memory but keep the direct map pointing at the removed 
> memory. Nobody should be touching it, or are there any cases where that 
> could hurt?
>
>
> Or is there any other reason why we really want 1 GiB memory blocks 
> instead of defaulting to 256 MiB the way it used to be?
>

The idea we are working towards is to keep the memory block size small
but map the boot memory using 1G. An unplug request can split that 1G
mapping later. We could look at the possibility of leaving that mapping
without splitting. But not sure why we would want to do that if we can
correctly split things. Right now there is no splitting support in powerpc.

-aneesh


Re: [PATCH v2 06/12] mm/execmem: introduce execmem_data_alloc()

2023-06-19 Thread Mike Rapoport
On Mon, Jun 19, 2023 at 12:32:55AM +0200, Thomas Gleixner wrote:
> Mike!
> 
> Sorry for being late on this ...
> 
> On Fri, Jun 16 2023 at 11:50, Mike Rapoport wrote:
> 
> The fact that my suggestions had a 'mod_' namespace prefix does not make
> any of my points moot.

The prefix does not matter. What matters is what we are trying to abstract.
Your suggestion is based on the memory used by modules. I'm abstracting
address spaces for different types of executable and related memory. They
are similar, yes, but they are not the same.

The TEXT, INIT_TEXT and *_DATA types do not match what we have from the arch
POV. Architectures have modules with text, rw data, ro data and ro-after-init
data, plus the memory for generated code. The memory for modules and the
memory for other users have different placement restrictions, so using a
single TEXT type for them is semantically wrong. BPF and kprobes do not
necessarily have to be in the same address range as modules, and init text
does not differ from normal text.

> Song did an extremly good job in abstracting things out, but you decided
> to ditch his ground work instead of building on it and keeping the good
> parts. That's beyond sad.

Actually not. The core idea, describing the address ranges suitable for code
allocations with a structure that arch code initializes once at boot, is the
same. But I don't think vmalloc parameters belong there; they should be
completely encapsulated in the allocator. Having the fallback range named
explicitly is IMO clearer than an array of address spaces.

I accept your point that the structures describing ranges for different
types should be unified and I've got carried away with making the wrappers
to convert that structure to parameters to the core allocation function.

I've chosen to define ranges as fields in the containing structure rather
than enum with types and an array because I strongly feel that the callers
should not care about these parameters. These parameters are defined by
architecture and the callers should not need to know how each and every
arch defines restrictions suitable for modules, bpf or kprobes.

That's also the reason to have different names for API calls, exactly to
avoid having alloc(KPROBES, ...), alloc(BPF, ...), alloc(MODULES, ...) and so
on.

All in all, if I filter all the ranting, this boils down to having a
unified structure for all the address ranges and passing this structure
from the wrappers to the core alloc as is rather than translating it to
separate parameters, with which I agree.
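
Concretely, the kind of unified descriptor being converged on might look
like this (a sketch of the direction only; field names are illustrative,
not code from the series):

	struct execmem_range {
		unsigned long	start;
		unsigned long	end;
		unsigned long	fallback_start;	/* optional, may be 0 */
		unsigned long	fallback_end;
		pgprot_t	pgprot;
		unsigned int	alignment;
	};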

> Thanks,
> 
> tglx

-- 
Sincerely yours,
Mike.


Re: [PATCH v2 02/12] mm: introduce execmem_text_alloc() and jit_text_alloc()

2023-06-19 Thread Kent Overstreet
On Sat, Jun 17, 2023 at 01:38:29PM -0700, Andy Lutomirski wrote:
> On Fri, Jun 16, 2023, at 1:50 AM, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" 
> >
> > module_alloc() is used everywhere as a mean to allocate memory for code.
> >
> > Beside being semantically wrong, this unnecessarily ties all subsystems
> > that need to allocate code, such as ftrace, kprobes and BPF to modules
> > and puts the burden of code allocation to the modules code.
> >
> > Several architectures override module_alloc() because of various
> > constraints where the executable memory can be located and this causes
> > additional obstacles for improvements of code allocation.
> >
> > Start splitting code allocation from modules by introducing
> > execmem_text_alloc(), execmem_free(), jit_text_alloc(), jit_free() APIs.
> >
> > Initially, execmem_text_alloc() and jit_text_alloc() are wrappers for
> > module_alloc() and execmem_free() and jit_free() are replacements of
> > module_memfree() to allow updating all call sites to use the new APIs.
> >
> > The intention semantics for new allocation APIs:
> >
> > * execmem_text_alloc() should be used to allocate memory that must reside
> >   close to the kernel image, like loadable kernel modules and generated
> >   code that is restricted by relative addressing.
> >
> > * jit_text_alloc() should be used to allocate memory for generated code
> >   when there are no restrictions for the code placement. For
> >   architectures that require that any code is within certain distance
> >   from the kernel image, jit_text_alloc() will be essentially aliased to
> >   execmem_text_alloc().
> >
> 
> Is there anything in this series to help users do the appropriate 
> synchronization when they actually populate the allocated memory with code?  
> See here, for example:
> 
> https://lore.kernel.org/linux-fsdevel/cb6533c6-cea0-4f04-95cf-b8240c6ab...@app.fastmail.com/T/#u

We're still in need of an arch independent text_poke() api.
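
For reference, the x86 primitive that a generic API would need to subsume
has this shape (this is the existing x86 signature; the arch-independent
counterpart is the missing piece):

	void *text_poke(void *addr, const void *opcode, size_t len);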


Re: [PATCH v2 2/2] powerpc/mm: Add memory_block_size as a kernel parameter

2023-06-19 Thread David Hildenbrand

On 09.06.23 08:08, Aneesh Kumar K.V wrote:

Certain devices can possess non-standard memory capacities, not constrained
to multiples of 1GB. Provide a kernel parameter so that we can map the
device memory completely on memory hotplug.


So, the unfortunate thing is that these devices would have worked out of 
the box before the memory block size was increased from 256 MiB to 1 GiB 
in these setups. Now, one has to fine-tune the memory block size. The 
only other arch that I know, which supports setting the memory block 
size, is x86 for special (large) UV systems -- and at least in the past 
128 MiB vs. 2 GiB memory blocks made a performance difference during 
boot (maybe no longer today, who knows).



Obviously, less tunable and getting stuff simply working out of the box 
is preferable.


Two questions:

1) Isn't there a way to improve auto-detection to fallback to 256 MiB in 
these setups, to avoid specifying these parameters?


2) Is the 256 MiB -> 1 GiB memory block size switch really worth it? On 
x86-64, experiments (with direct map fragmentation) showed that the 
effective performance boost is pretty insignificant, so I wonder how big 
the 1 GiB direct map performance improvement is.



I guess the only real issue with 256 MiB memory blocks and 1 GiB direct 
mapping is memory unplug of boot memory: when unplugging a 256 MiB 
block, one would have to remap the 1 GiB range using 2 MiB ranges.


... I was wondering what would happen if you simply leave the direct 
mapping in this corner case in place instead of doing this remapping. 
IOW, remove the memory but keep the direct map pointing at the removed 
memory. Nobody should be touching it, or are there any cases where that 
could hurt?



Or is there any other reason why we really want 1 GiB memory blocks 
instead of defaulting to 256 MiB the way it used to be?


Thanks!



Restrict memory_block_size value to a power of 2 value similar to LMB size.
The memory block size should also be more than the section size.

Signed-off-by: Aneesh Kumar K.V 
---
  .../admin-guide/kernel-parameters.txt |  3 +++
  arch/powerpc/kernel/setup_64.c| 23 +++
  arch/powerpc/mm/init_64.c | 17 ++
  3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..833b8c5b4b4c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3190,6 +3190,9 @@
Note that even when enabled, there are a few cases where
the feature is not effective.
  
+	memory_block_size=size [PPC]
+			Use this parameter to configure the memory block
+			size value.
+
memtest=[KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
Format: 
default : 0 
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 246201d0d879..cbdb924462c7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -892,6 +892,29 @@ unsigned long memory_block_size_bytes(void)
 
 	return MIN_MEMORY_BLOCK_SIZE;
 }
+
+/*
+ * Restrict to a power of 2 value for memblock which is larger than
+ * section size
+ */
+static int __init parse_mem_block_size(char *ptr)
+{
+   unsigned int order;
+   unsigned long size = memparse(ptr, NULL);
+
+   order = fls64(size);
+   if (!order)
+   return 0;
+
+   order--;
+   if (order < SECTION_SIZE_BITS)
+   return 0;
+
+   memory_block_size = 1UL << order;
+
+   return 0;
+}
+early_param("memory_block_size", parse_mem_block_size);
  #endif
  
  #if defined(CONFIG_PPC_INDIRECT_PIO) || defined(CONFIG_PPC_INDIRECT_MMIO)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 97a9163f1280..5e6dde593ea3 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -549,13 +549,20 @@ static int __init probe_memory_block_size(unsigned long 
node, const char *uname,
return 0;
  }
  
-/*
- * start with 1G memory block size. Early init will
- * fix this with correct value.
- */
-unsigned long memory_block_size __ro_after_init = 1UL << 30;
+unsigned long memory_block_size __ro_after_init;
  static void __init early_init_memory_block_size(void)
  {
+   /*
+* if it is set via early param just return.
+*/
+   if (memory_block_size)
+   return;
+
+   /*
+* start with 1G memory block size. update_memory_block_size()
+* will derive the right value based on device tree details.
+*/
+   memory_block_size = 1UL << 30;
/*
 * We need to do memory_block_size probe early so that
 * radix__early_init_mmu() can use this as limit for


--
Cheers,

David / dhildenb



Re: [PATCH v2 4/6] watchdog/hardlockup: Make HAVE_NMI_WATCHDOG sparc64-specific

2023-06-19 Thread Petr Mladek
On Fri 2023-06-16 09:48:06, Doug Anderson wrote:
> Hi,
> 
> On Fri, Jun 16, 2023 at 8:07 AM Petr Mladek  wrote:
> >
> > There are several hardlockup detector implementations and several Kconfig
> > values which allow selection and build of the preferred one.
[...]
> > Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
> > and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
> > HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR
> > and it is no longer enabled when HAVE_NMI_WATCHDOG is set.
> >
> > Signed-off-by: Petr Mladek 
> >
> > watchdog/sparc64: Rename HAVE_NMI_WATCHDOG to 
> > HAVE_HARDLOCKUP_WATCHDOG_SPARC64
[...]
> > Also the variable is set only on sparc64. Move the definition
> > from arch/Kconfig to arch/sparc/Kconfig.debug.
> >
> > Signed-off-by: Petr Mladek 
> 
> I think you goofed up when squashing the patches. You've now got a
> second patch subject after your first Signed-off-by and then a second
> Signed-off-by... I assume everything after the first Signed-off-by
> should be dropped?

Ah, you are right. It seems that Andrew has fixed this when taking
the patch.

Thank you both,
Petr


[PATCH 04/17] powerpc/ftrace: Simplify function_graph support in ftrace.c

2023-06-19 Thread Naveen N Rao
Since we now support DYNAMIC_FTRACE_WITH_ARGS across ppc32 and ppc64
ELFv2, we can simplify function_graph tracer support code in ftrace.c

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 64 --
 1 file changed, 7 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 81a121b56c4d7f..f117124c30325f 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -790,44 +790,10 @@ int __init ftrace_dyn_arch_init(void)
 #endif
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-
-extern void ftrace_graph_call(void);
-extern void ftrace_graph_stub(void);
-
-static int ftrace_modify_ftrace_graph_caller(bool enable)
+void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
+  struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
-   unsigned long ip = (unsigned long)(&ftrace_graph_call);
-   unsigned long addr = (unsigned long)(&ftrace_graph_caller);
-   unsigned long stub = (unsigned long)(&ftrace_graph_stub);
-   ppc_inst_t old, new;
-
-   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_ARGS))
-   return 0;
-
-   old = ftrace_call_replace(ip, enable ? stub : addr, 0);
-   new = ftrace_call_replace(ip, enable ? addr : stub, 0);
-
-   return ftrace_modify_code(ip, old, new);
-}
-
-int ftrace_enable_ftrace_graph_caller(void)
-{
-   return ftrace_modify_ftrace_graph_caller(true);
-}
-
-int ftrace_disable_ftrace_graph_caller(void)
-{
-   return ftrace_modify_ftrace_graph_caller(false);
-}
-
-/*
- * Hook the return address and push it in the stack of return addrs
- * in current thread info. Return the address we want to divert to.
- */
-static unsigned long
-__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long 
sp)
-{
-   unsigned long return_hooker;
+   unsigned long sp = fregs->regs.gpr[1];
int bit;
 
if (unlikely(ftrace_graph_is_dead()))
@@ -836,31 +802,15 @@ __prepare_ftrace_return(unsigned long parent, unsigned 
long ip, unsigned long sp
	if (unlikely(atomic_read(&current->tracing_graph_pause)))
goto out;
 
-   bit = ftrace_test_recursion_trylock(ip, parent);
+   bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
goto out;
 
-   return_hooker = ppc_function_entry(return_to_handler);
-
-   if (!function_graph_enter(parent, ip, 0, (unsigned long *)sp))
-   parent = return_hooker;
+   if (!function_graph_enter(parent_ip, ip, 0, (unsigned long *)sp))
+   parent_ip = ppc_function_entry(return_to_handler);
 
ftrace_test_recursion_unlock(bit);
 out:
-   return parent;
+   fregs->regs.link = parent_ip;
 }
-
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
-void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
-  struct ftrace_ops *op, struct ftrace_regs *fregs)
-{
-   fregs->regs.link = __prepare_ftrace_return(parent_ip, ip, 
fregs->regs.gpr[1]);
-}
-#else
-unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
-   unsigned long sp)
-{
-   return __prepare_ftrace_return(parent, ip, sp);
-}
-#endif
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
-- 
2.40.1



[PATCH 13/17] powerpc/ftrace: Simplify ftrace_modify_call()

2023-06-19 Thread Naveen N Rao
Now that we validate the ftrace location during initialization in
ftrace_init_nop(), we can simplify ftrace_modify_call() to patch-in the
updated branch instruction without worrying about the instructions
surrounding the ftrace location. Note that we continue to ensure we
have the expected branch instruction at the ftrace location before
patching it with the updated branch destination.
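
In outline, the simplified in-range path becomes (a sketch, not the
verbatim hunk):

	/* verify the old branch, then patch in the new destination */
	old = ftrace_create_branch_inst(ip, ppc_function_entry((void *)old_addr), 1);
	new = ftrace_create_branch_inst(ip, ppc_function_entry((void *)addr), 1);
	ret = ftrace_modify_code(ip, old, new);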

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 161 -
 1 file changed, 21 insertions(+), 140 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 6ea8b90246a540..c37e22c6c26521 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -89,33 +89,11 @@ static inline int ftrace_modify_code(unsigned long ip, 
ppc_inst_t old, ppc_inst_
return ret;
 }
 
-/*
- * Helper functions that are the same for both PPC64 and PPC32.
- */
-static int test_24bit_addr(unsigned long ip, unsigned long addr)
-{
-   addr = ppc_function_entry((void *)addr);
-
-   return is_offset_in_branch_range(addr - ip);
-}
-
 static int is_bl_op(ppc_inst_t op)
 {
return (ppc_inst_val(op) & ~PPC_LI_MASK) == PPC_RAW_BL(0);
 }
 
-static unsigned long find_bl_target(unsigned long ip, ppc_inst_t op)
-{
-   int offset;
-
-   offset = PPC_LI(ppc_inst_val(op));
-   /* make it signed */
-   if (offset & 0x0200)
-   offset |= 0xfe00;
-
-   return ip + (long)offset;
-}
-
 static unsigned long find_ftrace_tramp(unsigned long ip)
 {
int i;
@@ -130,115 +108,16 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
 }
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
-#ifdef CONFIG_MODULES
-static int
-__ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
-   unsigned long addr)
+int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, 
unsigned long addr)
 {
-   ppc_inst_t op;
-   unsigned long ip = rec->ip;
-   unsigned long entry, ptr, tramp;
-   struct module *mod = rec->arch.mod;
-
-   /* If we never set up ftrace trampolines, then bail */
-   if (!mod->arch.tramp || !mod->arch.tramp_regs) {
-   pr_err("No ftrace trampoline\n");
-   return -EINVAL;
-   }
-
-   /* read where this goes */
-   if (copy_inst_from_kernel_nofault(&op, (void *)ip)) {
-   pr_err("Fetching opcode failed.\n");
-   return -EFAULT;
-   }
-
-   /* Make sure that this is still a 24bit jump */
-   if (!is_bl_op(op)) {
-   pr_err("Not expected bl: opcode is %08lx\n", 
ppc_inst_as_ulong(op));
-   return -EINVAL;
-   }
-
-   /* lets find where the pointer goes */
-   tramp = find_bl_target(ip, op);
-   entry = ppc_global_function_entry((void *)old_addr);
-
-   pr_devel("ip:%lx jumps to %lx", ip, tramp);
-
-   if (tramp != entry) {
-   /* old_addr is not within range, so we must have used a 
trampoline */
-   if (module_trampoline_target(mod, tramp, &ptr)) {
-   pr_err("Failed to get trampoline target\n");
-   return -EFAULT;
-   }
-
-   pr_devel("trampoline target %lx", ptr);
-
-   /* This should match what was called */
-   if (ptr != entry) {
-   pr_err("addr %lx does not match expected %lx\n", ptr, 
entry);
-   return -EINVAL;
-   }
-   }
-
-   /* The new target may be within range */
-   if (test_24bit_addr(ip, addr)) {
-   /* within range */
-   if (patch_branch((u32 *)ip, addr, BRANCH_SET_LINK)) {
-   pr_err("REL24 out of range!\n");
-   return -EINVAL;
-   }
-
-   return 0;
-   }
-
-   if (rec->flags & FTRACE_FL_REGS)
-   tramp = mod->arch.tramp_regs;
-   else
-   tramp = mod->arch.tramp;
-
-   if (module_trampoline_target(mod, tramp, &ptr)) {
-   pr_err("Failed to get trampoline target\n");
-   return -EFAULT;
-   }
-
-   pr_devel("trampoline target %lx", ptr);
-
-   entry = ppc_global_function_entry((void *)addr);
-   /* This should match what was called */
-   if (ptr != entry) {
-   pr_err("addr %lx does not match expected %lx\n", ptr, entry);
-   return -EINVAL;
-   }
-
-   if (patch_branch((u32 *)ip, tramp, BRANCH_SET_LINK)) {
-   pr_err("REL24 out of range!\n");
-   return -EINVAL;
-   }
-
-   return 0;
-}
-#else
-static int __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long 
old_addr, unsigned long addr)
-{
-   return 0;
-}
-#endif
-
-int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
-   unsigned long addr)
-{
-   unsigned long ip = 

[PATCH 12/17] powerpc/ftrace: Simplify ftrace_make_call()

2023-06-19 Thread Naveen N Rao
Now that we validate the ftrace location during initialization in
ftrace_init_nop(), we can simplify ftrace_make_call() to replace the nop
without worrying about the instructions surrounding the ftrace location.
Note that we continue to ensure that we have a nop at the ftrace
location before patching it.
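
In outline, the in-range path reduces to (a sketch, not the verbatim hunk):

	/* confirm the nop placed at init time, then branch to the trampoline */
	ret = ftrace_validate_inst(ip, ppc_inst(PPC_RAW_NOP()));
	if (!ret)
		ret = patch_branch((u32 *)ip, tramp, BRANCH_SET_LINK);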

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 187 +
 1 file changed, 31 insertions(+), 156 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 05153a1038fdff..6ea8b90246a540 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -129,162 +129,6 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
return 0;
 }
 
-#ifdef CONFIG_MODULES
-/*
- * Examine the existing instructions for __ftrace_make_call.
- * They should effectively be a NOP, and follow formal constraints,
- * depending on the ABI. Return false if they don't.
- */
-static bool expected_nop_sequence(void *ip, ppc_inst_t op0, ppc_inst_t op1)
-{
-   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
-   return ppc_inst_equal(op0, ppc_inst(PPC_RAW_NOP()));
-   else
-   return ppc_inst_equal(op0, ppc_inst(PPC_RAW_BRANCH(8))) &&
-  ppc_inst_equal(op1, ppc_inst(PPC_INST_LD_TOC));
-}
-
-static int
-__ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
-{
-   ppc_inst_t op[2];
-   void *ip = (void *)rec->ip;
-   unsigned long entry, ptr, tramp;
-   struct module *mod = rec->arch.mod;
-
-   /* read where this goes */
-   if (copy_inst_from_kernel_nofault(op, ip))
-   return -EFAULT;
-
-   if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
-   copy_inst_from_kernel_nofault(op + 1, ip + 4))
-   return -EFAULT;
-
-   if (!expected_nop_sequence(ip, op[0], op[1])) {
-   pr_err("Unexpected call sequence at %p: %08lx %08lx\n", ip,
-  ppc_inst_as_ulong(op[0]), ppc_inst_as_ulong(op[1]));
-   return -EINVAL;
-   }
-
-   /* If we never set up ftrace trampoline(s), then bail */
-   if (!mod->arch.tramp ||
-   (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) && 
!mod->arch.tramp_regs)) {
-   pr_err("No ftrace trampoline\n");
-   return -EINVAL;
-   }
-
-   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) && rec->flags & 
FTRACE_FL_REGS)
-   tramp = mod->arch.tramp_regs;
-   else
-   tramp = mod->arch.tramp;
-
-   if (module_trampoline_target(mod, tramp, &ptr)) {
-   pr_err("Failed to get trampoline target\n");
-   return -EFAULT;
-   }
-
-   pr_devel("trampoline target %lx", ptr);
-
-   entry = ppc_global_function_entry((void *)addr);
-   /* This should match what was called */
-   if (ptr != entry) {
-   pr_err("addr %lx does not match expected %lx\n", ptr, entry);
-   return -EINVAL;
-   }
-
-   if (patch_branch(ip, tramp, BRANCH_SET_LINK)) {
-   pr_err("REL24 out of range!\n");
-   return -EINVAL;
-   }
-
-   return 0;
-}
-#else
-static int __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
-{
-   return 0;
-}
-#endif /* CONFIG_MODULES */
-
-static int __ftrace_make_call_kernel(struct dyn_ftrace *rec, unsigned long 
addr)
-{
-   ppc_inst_t op;
-   void *ip = (void *)rec->ip;
-   unsigned long tramp, entry, ptr;
-
-   /* Make sure we're being asked to patch branch to a known ftrace addr */
-   entry = ppc_global_function_entry((void *)ftrace_caller);
-   ptr = ppc_global_function_entry((void *)addr);
-
-   if (ptr != entry && IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
-   entry = ppc_global_function_entry((void *)ftrace_regs_caller);
-
-   if (ptr != entry) {
-   pr_err("Unknown ftrace addr to patch: %ps\n", (void *)ptr);
-   return -EINVAL;
-   }
-
-   /* Make sure we have a nop */
-   if (copy_inst_from_kernel_nofault(&op, ip)) {
-   pr_err("Unable to read ftrace location %p\n", ip);
-   return -EFAULT;
-   }
-
-   if (!ppc_inst_equal(op, ppc_inst(PPC_RAW_NOP()))) {
-   pr_err("Unexpected call sequence at %p: %08lx\n",
-  ip, ppc_inst_as_ulong(op));
-   return -EINVAL;
-   }
-
-   tramp = find_ftrace_tramp((unsigned long)ip);
-   if (!tramp) {
-   pr_err("No ftrace trampolines reachable from %ps\n", ip);
-   return -EINVAL;
-   }
-
-   if (patch_branch(ip, tramp, BRANCH_SET_LINK)) {
-   pr_err("Error patching branch to ftrace tramp!\n");
-   return -EINVAL;
-   }
-
-   return 0;
-}
-
-int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
-{
-   unsigned long ip = rec->ip;
-   ppc_inst_t old, new;
-
-   

[PATCH 11/17] powerpc/ftrace: Simplify ftrace_make_nop()

2023-06-19 Thread Naveen N Rao
Now that we validate the ftrace location during initialization in
ftrace_init_nop(), we can simplify ftrace_make_nop() to patch-in the nop
without worrying about the instructions surrounding the ftrace location.
Note that we continue to ensure that we have a bl to
ftrace_[regs_]caller at the ftrace location before nop-ing it out.
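
In outline, the simplified flow is (a sketch, not the verbatim hunk):

	/* expect the 'bl ftrace_[regs_]caller' installed earlier, then nop it */
	ret = ftrace_read_inst(ip, &old);
	if (!ret && !is_bl_op(old))
		ret = -EINVAL;
	if (!ret)
		ret = patch_instruction((u32 *)ip, ppc_inst(PPC_RAW_NOP()));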

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 220 +
 1 file changed, 32 insertions(+), 188 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 98bd099c428ee0..05153a1038fdff 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -116,112 +116,6 @@ static unsigned long find_bl_target(unsigned long ip, 
ppc_inst_t op)
return ip + (long)offset;
 }
 
-#ifdef CONFIG_MODULES
-static int
-__ftrace_make_nop(struct module *mod,
- struct dyn_ftrace *rec, unsigned long addr)
-{
-   unsigned long entry, ptr, tramp;
-   unsigned long ip = rec->ip;
-   ppc_inst_t op, pop;
-
-   /* read where this goes */
-   if (copy_inst_from_kernel_nofault(&op, (void *)ip)) {
-   pr_err("Fetching opcode failed.\n");
-   return -EFAULT;
-   }
-
-   /* Make sure that this is still a 24bit jump */
-   if (!is_bl_op(op)) {
-   pr_err("Not expected bl: opcode is %08lx\n", 
ppc_inst_as_ulong(op));
-   return -EINVAL;
-   }
-
-   /* lets find where the pointer goes */
-   tramp = find_bl_target(ip, op);
-
-   pr_devel("ip:%lx jumps to %lx", ip, tramp);
-
-   if (module_trampoline_target(mod, tramp, &ptr)) {
-   pr_err("Failed to get trampoline target\n");
-   return -EFAULT;
-   }
-
-   pr_devel("trampoline target %lx", ptr);
-
-   entry = ppc_global_function_entry((void *)addr);
-   /* This should match what was called */
-   if (ptr != entry) {
-   pr_err("addr %lx does not match expected %lx\n", ptr, entry);
-   return -EINVAL;
-   }
-
-   if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
-   if (copy_inst_from_kernel_nofault(&op, (void *)(ip - 4))) {
-   pr_err("Fetching instruction at %lx failed.\n", ip - 4);
-   return -EFAULT;
-   }
-
-   /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
-   if (!ppc_inst_equal(op, ppc_inst(PPC_RAW_MFLR(_R0))) &&
-   !ppc_inst_equal(op, ppc_inst(PPC_INST_STD_LR))) {
-   pr_err("Unexpected instruction %08lx around bl 
_mcount\n",
-  ppc_inst_as_ulong(op));
-   return -EINVAL;
-   }
-   } else if (IS_ENABLED(CONFIG_PPC64)) {
-   /*
-* Check what is in the next instruction. We can see ld 
r2,40(r1), but
-* on first pass after boot we will see mflr r0.
-*/
-   if (copy_inst_from_kernel_nofault(&op, (void *)(ip + 4))) {
-   pr_err("Fetching op failed.\n");
-   return -EFAULT;
-   }
-
-   if (!ppc_inst_equal(op,  ppc_inst(PPC_INST_LD_TOC))) {
-   pr_err("Expected %08lx found %08lx\n", PPC_INST_LD_TOC,
-  ppc_inst_as_ulong(op));
-   return -EINVAL;
-   }
-   }
-
-   /*
-* When using -mprofile-kernel or PPC32 there is no load to jump over.
-*
-* Otherwise our original call site looks like:
-*
-* bl 
-* ld r2,XX(r1)
-*
-* Milton Miller pointed out that we can not simply nop the branch.
-* If a task was preempted when calling a trace function, the nops
-* will remove the way to restore the TOC in r2 and the r2 TOC will
-* get corrupted.
-*
-* Use a b +8 to jump over the load.
-* XXX: could make PCREL depend on MPROFILE_KERNEL
-* XXX: check PCREL && MPROFILE_KERNEL calling sequence
-*/
-   if (IS_ENABLED(CONFIG_MPROFILE_KERNEL) || IS_ENABLED(CONFIG_PPC32))
-   pop = ppc_inst(PPC_RAW_NOP());
-   else
-   pop = ppc_inst(PPC_RAW_BRANCH(8));  /* b +8 */
-
-   if (patch_instruction((u32 *)ip, pop)) {
-   pr_err("Patching NOP failed.\n");
-   return -EPERM;
-   }
-
-   return 0;
-}
-#else
-static int __ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, 
unsigned long addr)
-{
-   return 0;
-}
-#endif /* CONFIG_MODULES */
-
 static unsigned long find_ftrace_tramp(unsigned long ip)
 {
int i;
@@ -235,88 +129,6 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
return 0;
 }
 
-static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
-{
-   unsigned long tramp, ip = rec->ip;
-   ppc_inst_t op;
-
-   

[PATCH 10/17] powerpc/ftrace: Add separate ftrace_init_nop() with additional validation

2023-06-19 Thread Naveen N Rao
Currently, we validate instructions around the ftrace location every
time we have to enable/disable ftrace. Introduce ftrace_init_nop() to
instead perform all the validation during ftrace initialization. This
allows us to simply patch the necessary instructions during
enabling/disabling ftrace.
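
For reference, the prologue sequences that ftrace_init_nop() validates (as
spelled out in the hunk below) are:

	ppc32:
		mflr	r0
		stw	r0,4(r1)
		bl	_mcount

	-mprofile-kernel (ppc64le):
		mflr	r0
		std	r0,16(r1)	# optional
		bl	_mcount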

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ftrace.h  |  6 +++
 arch/powerpc/kernel/trace/ftrace.c | 71 ++
 2 files changed, 77 insertions(+)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 702aaf2efa966c..ef9f0b97670d1c 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -29,11 +29,17 @@ static inline unsigned long ftrace_call_adjust(unsigned 
long addr)
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
unsigned long sp);
 
+struct module;
+struct dyn_ftrace;
 struct dyn_arch_ftrace {
struct module *mod;
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
+#define ftrace_need_init_nop() (true)
+int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
+#define ftrace_init_nop ftrace_init_nop
+
 struct ftrace_regs {
struct pt_regs regs;
 };
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 278bf8e52b6e89..98bd099c428ee0 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -31,6 +31,16 @@
 #defineNUM_FTRACE_TRAMPS   2
 static unsigned long ftrace_tramps[NUM_FTRACE_TRAMPS];
 
+static ppc_inst_t ftrace_create_branch_inst(unsigned long ip, unsigned long 
addr, int link)
+{
+   ppc_inst_t op;
+
+   WARN_ON(!is_offset_in_branch_range(addr - ip));
+   create_branch(&op, (u32 *)ip, addr, link ? BRANCH_SET_LINK : 0);
+
+   return op;
+}
+
 static ppc_inst_t
 ftrace_call_replace(unsigned long ip, unsigned long addr, int link)
 {
@@ -597,6 +607,67 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned 
long old_addr,
 }
 #endif
 
+int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
+{
+   unsigned long addr, ip = rec->ip;
+   ppc_inst_t old, new;
+   int ret = 0;
+
+   /* Verify instructions surrounding the ftrace location */
+   if (IS_ENABLED(CONFIG_PPC32)) {
+   /* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
+   ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
+   if (!ret)
+   ret = ftrace_validate_inst(ip - 4, 
ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+   } else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
+   /* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl 
_mcount' */
+   ret = ftrace_read_inst(ip - 4, &old);
+   if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
+   ret = ftrace_validate_inst(ip - 8, 
ppc_inst(PPC_RAW_MFLR(_R0)));
+   ret |= ftrace_validate_inst(ip - 4, 
ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+   }
+   } else {
+   return -EINVAL;
+   }
+
+   if (ret)
+   return ret;
+
+   if (!core_kernel_text(ip)) {
+   if (!mod) {
+   pr_err("0x%lx: No module provided for non-kernel 
address\n", ip);
+   return -EFAULT;
+   }
+   rec->arch.mod = mod;
+   }
+
+   /* Nop-out the ftrace location */
+   new = ppc_inst(PPC_RAW_NOP());
+   addr = MCOUNT_ADDR;
+   if (is_offset_in_branch_range(addr - ip)) {
+   /* Within range */
+   old = ftrace_create_branch_inst(ip, addr, 1);
+   ret = ftrace_modify_code(ip, old, new);
+   } else if (core_kernel_text(ip) || (IS_ENABLED(CONFIG_MODULES) && mod)) 
{
+   /*
+* We would be branching to a linker-generated stub, or to the 
module _mcount
+* stub. Let's just confirm we have a 'bl' here.
+*/
+   ret = ftrace_read_inst(ip, &old);
+   if (ret)
+   return ret;
+   if (!is_bl_op(old)) {
+   pr_err("0x%lx: expected (bl) != found (%08lx)\n", ip, 
ppc_inst_as_ulong(old));
+   return -EINVAL;
+   }
+   ret = patch_instruction((u32 *)ip, new);
+   } else {
+   return -EINVAL;
+   }
+
+   return ret;
+}
+
 int ftrace_update_ftrace_func(ftrace_func_t func)
 {
	unsigned long ip = (unsigned long)(&ftrace_call);
-- 
2.40.1



[PATCH 09/17] powerpc/ftrace: Stop re-purposing linker generated long branches for ftrace

2023-06-19 Thread Naveen N Rao
Commit 67361cf8071286 ("powerpc/ftrace: Handle large kernel configs")
added ftrace support for ppc64 kernel images with a text section larger
than 32MB. The patch did two things:
1. Add stubs at the end of .text to branch into ftrace_[regs_]caller for
   functions that were out of branch range.
2. Re-purpose linker-generated long branches to _mcount to instead branch
   to ftrace_[regs_]caller.

Before that, we only supported kernel .text up to ~32MB. With the above,
we now support up to ~96MB:
- The first 32MB of kernel text can branch directly into
  ftrace_[regs_]caller since that symbol is usually at the beginning.
- The modified long_branch from (2) above is used by the next 32MB of
  kernel text.
- The next 32MB of kernel text can use the stub at the end of text to
  branch back to ftrace_[regs_]caller.

While re-purposing the long branch works in practice, it still restricts
ftrace to kernel text up to ~96MB. The stub at the end of kernel text
from (1) already enables us to extend ftrace support for kernel text
up to 64MB, which fulfils the original requirement. Further, once we
switch to -fpatchable-function-entry, there will not be a long branch
that we can use.

Stop re-purposing the linker-generated long branches for ftrace to
simplify the code. If there are good reasons to support ftrace on
kernels beyond 64MB, we can consider adding support by using
-fpatchable-function-entry.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 110 +
 1 file changed, 17 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index ef4e49c2c37781..278bf8e52b6e89 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -28,13 +28,7 @@
 #include 
 #include 
 
-/*
- * We generally only have a single long_branch tramp and at most 2 or 3 plt
- * tramps generated. But, we don't use the plt tramps currently. We also allot
- * 2 tramps after .text and .init.text. So, we only end up with around 3 usable
- * tramps in total. Set aside 8 just to be sure.
- */
-#defineNUM_FTRACE_TRAMPS   8
+#defineNUM_FTRACE_TRAMPS   2
 static unsigned long ftrace_tramps[NUM_FTRACE_TRAMPS];
 
 static ppc_inst_t
@@ -100,11 +94,6 @@ static int is_bl_op(ppc_inst_t op)
return (ppc_inst_val(op) & ~PPC_LI_MASK) == PPC_RAW_BL(0);
 }
 
-static int is_b_op(ppc_inst_t op)
-{
-   return (ppc_inst_val(op) & ~PPC_LI_MASK) == PPC_RAW_BRANCH(0);
-}
-
 static unsigned long find_bl_target(unsigned long ip, ppc_inst_t op)
 {
int offset;
@@ -227,11 +216,7 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
 {
int i;
 
-   /*
-* We have the compiler generated long_branch tramps at the end
-* and we prefer those
-*/
-   for (i = NUM_FTRACE_TRAMPS - 1; i >= 0; i--)
+   for (i = 0; i < NUM_FTRACE_TRAMPS; i++)
if (!ftrace_tramps[i])
continue;
else if (is_offset_in_branch_range(ftrace_tramps[i] - ip))
@@ -240,75 +225,6 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
return 0;
 }
 
-static int add_ftrace_tramp(unsigned long tramp)
-{
-   int i;
-
-   for (i = 0; i < NUM_FTRACE_TRAMPS; i++)
-   if (!ftrace_tramps[i]) {
-   ftrace_tramps[i] = tramp;
-   return 0;
-   }
-
-   return -1;
-}
-
-/*
- * If this is a compiler generated long_branch trampoline (essentially, a
- * trampoline that has a branch to _mcount()), we re-write the branch to
- * instead go to ftrace_[regs_]caller() and note down the location of this
- * trampoline.
- */
-static int setup_mcount_compiler_tramp(unsigned long tramp)
-{
-   int i;
-   ppc_inst_t op;
-   unsigned long ptr;
-
-   /* Is this a known long jump tramp? */
-   for (i = 0; i < NUM_FTRACE_TRAMPS; i++)
-   if (ftrace_tramps[i] == tramp)
-   return 0;
-
-   /* New trampoline -- read where this goes */
-   if (copy_inst_from_kernel_nofault(&op, (void *)tramp)) {
-   pr_debug("Fetching opcode failed.\n");
-   return -1;
-   }
-
-   /* Is this a 24 bit branch? */
-   if (!is_b_op(op)) {
-   pr_debug("Trampoline is not a long branch tramp.\n");
-   return -1;
-   }
-
-   /* lets find where the pointer goes */
-   ptr = find_bl_target(tramp, op);
-
-   if (ptr != ppc_global_function_entry((void *)_mcount)) {
-   pr_debug("Trampoline target %p is not _mcount\n", (void *)ptr);
-   return -1;
-   }
-
-   /* Let's re-write the tramp to go to ftrace_[regs_]caller */
-   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
-   ptr = ppc_global_function_entry((void *)ftrace_regs_caller);
-   else
-   ptr = ppc_global_function_entry((void *)ftrace_caller);
-
-   if 

[PATCH 08/17] powerpc/ftrace: Refactor ftrace_modify_code()

2023-06-19 Thread Naveen N Rao
Split up ftrace_modify_code() into a few helpers for future use. Also
update error messages accordingly.
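
After the split, callers can use the helpers directly, e.g. (sketch):

	/* check that a nop is present at 'ip' without modifying it */
	ret = ftrace_validate_inst(ip, ppc_inst(PPC_RAW_NOP()));

	/* or verify the old instruction and patch in the new one */
	ret = ftrace_modify_code(ip, old, new);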

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 51 +-
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 913c7aa63d3fa3..ef4e49c2c37781 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -50,32 +50,39 @@ ftrace_call_replace(unsigned long ip, unsigned long addr, 
int link)
return op;
 }
 
-static inline int
-ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_t new)
+static inline int ftrace_read_inst(unsigned long ip, ppc_inst_t *op)
 {
-   ppc_inst_t replaced;
-
-   /*
-* Note:
-* We are paranoid about modifying text, as if a bug was to happen, it
-* could cause us to read or write to someplace that could cause harm.
-* Carefully read and modify the code with probe_kernel_*(), and make
-* sure what we read is what we expected it to be before modifying it.
-*/
-
-   /* read the text we want to modify */
-   if (copy_inst_from_kernel_nofault(&replaced, (void *)ip))
+   if (copy_inst_from_kernel_nofault(op, (void *)ip)) {
+   pr_err("0x%lx: fetching instruction failed\n", ip);
return -EFAULT;
-
-   /* Make sure it is what we expect it to be */
-   if (!ppc_inst_equal(replaced, old)) {
-   pr_err("%p: replaced (%08lx) != old (%08lx)", (void *)ip,
-  ppc_inst_as_ulong(replaced), ppc_inst_as_ulong(old));
-   return -EINVAL;
}
 
-   /* replace the text with the new text */
-   return patch_instruction((u32 *)ip, new);
+   return 0;
+}
+
+static inline int ftrace_validate_inst(unsigned long ip, ppc_inst_t inst)
+{
+   ppc_inst_t op;
+   int ret;
+
+   ret = ftrace_read_inst(ip, &op);
+   if (!ret && !ppc_inst_equal(op, inst)) {
+   pr_err("0x%lx: expected (%08lx) != found (%08lx)\n",
+  ip, ppc_inst_as_ulong(inst), ppc_inst_as_ulong(op));
+   ret = -EINVAL;
+   }
+
+   return ret;
+}
+
+static inline int ftrace_modify_code(unsigned long ip, ppc_inst_t old, 
ppc_inst_t new)
+{
+   int ret = ftrace_validate_inst(ip, old);
+
+   if (!ret)
+   ret = patch_instruction((u32 *)ip, new);
+
+   return ret;
 }
 
 /*
-- 
2.40.1



[PATCH 07/17] powerpc/ftrace: Consolidate ftrace support into fewer files

2023-06-19 Thread Naveen N Rao
ftrace_low.S has just the _mcount stub and return_to_handler(). Merge
this back into ftrace_mprofile.S and ftrace_64_pg.S to keep all ftrace
code together, and to allow those to evolve independently.

ftrace_mprofile.S is also not an entirely accurate name since this also
holds ppc32 code. This will be all the more incorrect once support for
-fpatchable-function-entry is added. Rename files here to more
accurately describe the code:
- ftrace_mprofile.S is renamed to ftrace_entry.S
- ftrace_pg.c is renamed to ftrace_64_pg.c
- ftrace_64_pg.S is renamed to ftrace_64_pg_entry.S

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/Makefile| 17 +++--
 arch/powerpc/kernel/trace/ftrace_64_pg.S  | 67 ---
 .../trace/{ftrace_pg.c => ftrace_64_pg.c} |  0
 .../{ftrace_low.S => ftrace_64_pg_entry.S}| 58 +++-
 .../{ftrace_mprofile.S => ftrace_entry.S} | 65 ++
 5 files changed, 130 insertions(+), 77 deletions(-)
 delete mode 100644 arch/powerpc/kernel/trace/ftrace_64_pg.S
 rename arch/powerpc/kernel/trace/{ftrace_pg.c => ftrace_64_pg.c} (100%)
 rename arch/powerpc/kernel/trace/{ftrace_low.S => ftrace_64_pg_entry.S} (55%)
 rename arch/powerpc/kernel/trace/{ftrace_mprofile.S => ftrace_entry.S} (83%)

diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
index 342a2d1ae86cd0..125f4ca588b98a 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -6,16 +6,15 @@
 ifdef CONFIG_FUNCTION_TRACER
 # do not trace tracer code
 CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
-CFLAGS_REMOVE_ftrace_pg.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_ftrace_64_pg.o = $(CC_FLAGS_FTRACE)
 endif
 
-obj32-$(CONFIG_FUNCTION_TRACER)+= ftrace_mprofile.o ftrace.o
+obj32-$(CONFIG_FUNCTION_TRACER)+= ftrace.o ftrace_entry.o
 ifdef CONFIG_MPROFILE_KERNEL
-obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_mprofile.o ftrace.o
+obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace.o ftrace_entry.o
 else
-obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_64_pg.o ftrace_pg.o
+obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_64_pg.o ftrace_64_pg_entry.o
 endif
-obj-$(CONFIG_FUNCTION_TRACER)  += ftrace_low.o
 obj-$(CONFIG_TRACING)  += trace_clock.o
 
 obj-$(CONFIG_PPC64)+= $(obj64-y)
@@ -26,7 +25,7 @@ GCOV_PROFILE_ftrace.o := n
 KCOV_INSTRUMENT_ftrace.o := n
 KCSAN_SANITIZE_ftrace.o := n
 UBSAN_SANITIZE_ftrace.o := n
-GCOV_PROFILE_ftrace_pg.o := n
-KCOV_INSTRUMENT_ftrace_pg.o := n
-KCSAN_SANITIZE_ftrace_pg.o := n
-UBSAN_SANITIZE_ftrace_pg.o := n
+GCOV_PROFILE_ftrace_64_pg.o := n
+KCOV_INSTRUMENT_ftrace_64_pg.o := n
+KCSAN_SANITIZE_ftrace_64_pg.o := n
+UBSAN_SANITIZE_ftrace_64_pg.o := n
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.S b/arch/powerpc/kernel/trace/ftrace_64_pg.S
deleted file mode 100644
index 6708e24db0aba8..00
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.S
+++ /dev/null
@@ -1,67 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * Split from ftrace_64.S
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-_GLOBAL_TOC(ftrace_caller)
-   lbz r3, PACA_FTRACE_ENABLED(r13)
-   cmpdi   r3, 0
-   beqlr
-
-   /* Taken from output of objdump from lib64/glibc */
-   mflr    r3
-   ld  r11, 0(r1)
-   stdu    r1, -112(r1)
-   std r3, 128(r1)
-   ld  r4, 16(r11)
-   subi    r3, r3, MCOUNT_INSN_SIZE
-.globl ftrace_call
-ftrace_call:
-   bl  ftrace_stub
-   nop
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-.globl ftrace_graph_call
-ftrace_graph_call:
-   b   ftrace_graph_stub
-_GLOBAL(ftrace_graph_stub)
-#endif
-   ld  r0, 128(r1)
-   mtlr    r0
-   addi    r1, r1, 112
-
-_GLOBAL(ftrace_stub)
-   blr
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-_GLOBAL(ftrace_graph_caller)
-   addi    r5, r1, 112
-   /* load r4 with local address */
-   ld  r4, 128(r1)
-   subi    r4, r4, MCOUNT_INSN_SIZE
-
-   /* Grab the LR out of the caller stack frame */
-   ld  r11, 112(r1)
-   ld  r3, 16(r11)
-
-   bl  prepare_ftrace_return
-   nop
-
-   /*
-* prepare_ftrace_return gives us the address we divert to.
-* Change the LR in the callers stack frame to this.
-*/
-   ld  r11, 112(r1)
-   std r3, 16(r11)
-
-   ld  r0, 128(r1)
-   mtlr    r0
-   addi    r1, r1, 112
-   blr
-#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/powerpc/kernel/trace/ftrace_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c
similarity index 100%
rename from arch/powerpc/kernel/trace/ftrace_pg.c
rename to arch/powerpc/kernel/trace/ftrace_64_pg.c
diff --git a/arch/powerpc/kernel/trace/ftrace_low.S b/arch/powerpc/kernel/trace/ftrace_64_pg_entry.S
similarity index 55%
rename from arch/powerpc/kernel/trace/ftrace_low.S

[PATCH 06/17] powerpc/ftrace: Extend ftrace support for large kernels to ppc32

2023-06-19 Thread Naveen N Rao
Commit 67361cf8071286 ("powerpc/ftrace: Handle large kernel configs")
added ftrace support for ppc64 kernel images with a text section larger
than 32MB. The approach itself isn't specific to ppc64, so extend the
same to also work on ppc32.

While at it, reduce the space reserved for the stub from 64 bytes to 32
bytes since the different stub variants are all less than 8
instructions.

To reduce use of #ifdef, a stub implementation is provided for
kernel_toc_addr(), and -SZ_2G is cast to 'long long' to prevent build
errors on ppc32.
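
The 'long long' cast sidesteps a 32-bit overflow; a minimal sketch of
the issue being avoided (illustrative, not part of the patch):

	#include <linux/sizes.h>	/* SZ_2G is 0x80000000 */

	/* On ppc32, (long)SZ_2G is LONG_MIN, so -(long)SZ_2G overflows.
	 * Widening first keeps the negation in 64-bit arithmetic:
	 */
	long long lo_bound = -(long long)SZ_2G;	/* -2147483648 */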

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ftrace.h  | 10 +--
 arch/powerpc/include/asm/sections.h|  2 ++
 arch/powerpc/kernel/trace/ftrace.c | 39 ++
 arch/powerpc/kernel/trace/ftrace_low.S |  6 ++--
 arch/powerpc/kernel/vmlinux.lds.S  |  4 ---
 5 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 2edc6269b1a357..702aaf2efa966c 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -124,15 +124,19 @@ static inline u8 this_cpu_get_ftrace_enabled(void)
 {
return get_paca()->ftrace_enabled;
 }
-
-void ftrace_free_init_tramp(void);
 #else /* CONFIG_PPC64 */
 static inline void this_cpu_disable_ftrace(void) { }
 static inline void this_cpu_enable_ftrace(void) { }
 static inline void this_cpu_set_ftrace_enabled(u8 ftrace_enabled) { }
 static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
-static inline void ftrace_free_init_tramp(void) { }
 #endif /* CONFIG_PPC64 */
+
+#ifdef CONFIG_FUNCTION_TRACER
+extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
+void ftrace_free_init_tramp(void);
+#else
+static inline void ftrace_free_init_tramp(void) { }
+#endif
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_FTRACE */
diff --git a/arch/powerpc/include/asm/sections.h b/arch/powerpc/include/asm/sections.h
index 4e1f548c8d373d..ea26665f82cfc8 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -74,6 +74,8 @@ static inline int overlaps_kernel_text(unsigned long start, unsigned long end)
(unsigned long)_stext < end;
 }
 
+#else
+static inline unsigned long kernel_toc_addr(void) { BUILD_BUG(); return -1UL; }
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 5aa36272617a03..913c7aa63d3fa3 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -707,11 +707,6 @@ void arch_ftrace_update_code(int command)
ftrace_modify_all_code(command);
 }
 
-#ifdef CONFIG_PPC64
-#define PACATOC offsetof(struct paca_struct, kernel_toc)
-
-extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
-
 void ftrace_free_init_tramp(void)
 {
int i;
@@ -725,28 +720,30 @@ void ftrace_free_init_tramp(void)
 
 int __init ftrace_dyn_arch_init(void)
 {
-   int i;
unsigned int *tramp[] = { ftrace_tramp_text, ftrace_tramp_init };
-#ifdef CONFIG_PPC_KERNEL_PCREL
+   unsigned long addr = FTRACE_REGS_ADDR;
+   long reladdr;
+   int i;
u32 stub_insns[] = {
+#ifdef CONFIG_PPC_KERNEL_PCREL
/* pla r12,addr */
PPC_PREFIX_MLS | __PPC_PRFX_R(1),
PPC_INST_PADDI | ___PPC_RT(_R12),
PPC_RAW_MTCTR(_R12),
PPC_RAW_BCTR()
-   };
-#else
-   u32 stub_insns[] = {
-   PPC_RAW_LD(_R12, _R13, PACATOC),
+#elif defined(CONFIG_PPC64)
+   PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, kernel_toc)),
PPC_RAW_ADDIS(_R12, _R12, 0),
PPC_RAW_ADDI(_R12, _R12, 0),
PPC_RAW_MTCTR(_R12),
PPC_RAW_BCTR()
-   };
+#else
+   PPC_RAW_LIS(_R12, 0),
+   PPC_RAW_ADDI(_R12, _R12, 0),
+   PPC_RAW_MTCTR(_R12),
+   PPC_RAW_BCTR()
 #endif
-
-   unsigned long addr = FTRACE_REGS_ADDR;
-   long reladdr;
+   };
 
if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
for (i = 0; i < 2; i++) {
@@ -763,10 +760,10 @@ int __init ftrace_dyn_arch_init(void)
tramp[i][1] |= IMM_L(reladdr);
add_ftrace_tramp((unsigned long)tramp[i]);
}
-   } else {
+   } else if (IS_ENABLED(CONFIG_PPC64)) {
reladdr = addr - kernel_toc_addr();
 
-   if (reladdr >= (long)SZ_2G || reladdr < -(long)SZ_2G) {
+   if (reladdr >= (long)SZ_2G || reladdr < -(long long)SZ_2G) {
pr_err("Address of %ps out of range of kernel_toc.\n",
(void *)addr);
return -1;
@@ -778,11 +775,17 @@ int __init ftrace_dyn_arch_init(void)
tramp[i][2] |= PPC_LO(reladdr);
add_ftrace_tramp((unsigned long)tramp[i]);

[PATCH 05/17] powerpc/ftrace: Use FTRACE_REGS_ADDR to identify the correct ftrace trampoline

2023-06-19 Thread Naveen N Rao
Instead of keying off DYNAMIC_FTRACE_WITH_REGS, use FTRACE_REGS_ADDR to
identify the proper ftrace trampoline address to use.
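
For reference, FTRACE_REGS_ADDR already encodes this selection in
generic code; its definition in include/linux/ftrace.h is roughly:

	#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
	# define FTRACE_REGS_ADDR	((unsigned long)ftrace_regs_caller)
	#else
	# define FTRACE_REGS_ADDR	FTRACE_ADDR	/* (unsigned long)ftrace_caller */
	#endif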

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index f117124c30325f..5aa36272617a03 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -745,14 +745,9 @@ int __init ftrace_dyn_arch_init(void)
};
 #endif
 
-   unsigned long addr;
+   unsigned long addr = FTRACE_REGS_ADDR;
long reladdr;
 
-   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
-   addr = ppc_global_function_entry((void *)ftrace_regs_caller);
-   else
-   addr = ppc_global_function_entry((void *)ftrace_caller);
-
if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
for (i = 0; i < 2; i++) {
reladdr = addr - (unsigned long)tramp[i];
-- 
2.40.1



[PATCH 17/17] powerpc/ftrace: Create a dummy stackframe to fix stack unwind

2023-06-19 Thread Naveen N Rao
With ppc64 -mprofile-kernel and ppc32 -pg, profiling instructions to
call into ftrace are emitted right at function entry. The instruction
sequence used is minimal to reduce overhead. Crucially, a stackframe is
not created for the function being traced. This breaks stack unwinding
since the function being traced does not have a stackframe for itself.
As such, it never shows up in the backtrace:

/sys/kernel/debug/tracing # echo 1 > /proc/sys/kernel/stack_tracer_enabled
/sys/kernel/debug/tracing # cat stack_trace
Depth    Size   Location    (17 entries)
-----    ----   --------
  0) 4144  32   ftrace_call+0x4/0x44
  1) 4112 432   get_page_from_freelist+0x26c/0x1ad0
  2) 3680 496   __alloc_pages+0x290/0x1280
  3) 3184 336   __folio_alloc+0x34/0x90
  4) 2848 176   vma_alloc_folio+0xd8/0x540
  5) 2672 272   __handle_mm_fault+0x700/0x1cc0
  6) 2400 208   handle_mm_fault+0xf0/0x3f0
  7) 2192  80   ___do_page_fault+0x3e4/0xbe0
  8) 2112 160   do_page_fault+0x30/0xc0
  9) 1952 256   data_access_common_virt+0x210/0x220
 10) 1696 400   0xcf16b100
 11) 1296 384   load_elf_binary+0x804/0x1b80
 12)  912 208   bprm_execve+0x2d8/0x7e0
 13)  704  64   do_execveat_common+0x1d0/0x2f0
 14)  640 160   sys_execve+0x54/0x70
 15)  480  64   system_call_exception+0x138/0x350
 16)  416 416   system_call_common+0x160/0x2c4

Fix this by having ftrace create a dummy stackframe for the function
being traced. Since this is only relevant when ftrace is active, we nop
out the instruction to store LR in the LR save area in the profiling
instruction sequence on ppc32 (and in ppc64 with older gcc versions).
Instead, in ftrace, we store LR in the LR save area of the previous
stackframe, and create a minimal stackframe to represent the function
being traced. With this, backtraces now capture the function being
traced:

/sys/kernel/debug/tracing # cat stack_trace
Depth    Size   Location    (17 entries)
-----    ----   --------
  0) 3888  32   _raw_spin_trylock+0x8/0x70
  1) 3856 576   get_page_from_freelist+0x26c/0x1ad0
  2) 3280  64   __alloc_pages+0x290/0x1280
  3) 3216 336   __folio_alloc+0x34/0x90
  4) 2880 176   vma_alloc_folio+0xd8/0x540
  5) 2704 416   __handle_mm_fault+0x700/0x1cc0
  6) 2288  96   handle_mm_fault+0xf0/0x3f0
  7) 2192  48   ___do_page_fault+0x3e4/0xbe0
  8) 2144 192   do_page_fault+0x30/0xc0
  9) 1952 608   data_access_common_virt+0x210/0x220
 10) 1344  16   0xc000334bbb50
 11) 1328 416   load_elf_binary+0x804/0x1b80
 12)  912  64   bprm_execve+0x2d8/0x7e0
 13)  848 176   do_execveat_common+0x1d0/0x2f0
 14)  672 192   sys_execve+0x54/0x70
 15)  480  64   system_call_exception+0x138/0x350
 16)  416 416   system_call_common+0x160/0x2c4

This results in two additional stores in the ftrace entry code, but
produces reliable backtraces. Note that this change now aligns with
other architectures (arm64, s390, x86).

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c   |  6 --
 arch/powerpc/kernel/trace/ftrace_entry.S | 11 ---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 82010629cf887c..2956196c98ffdc 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -229,13 +229,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
if (!ret)
-   ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+   ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
+                            ppc_inst(PPC_RAW_NOP()));
} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
ret = ftrace_read_inst(ip - 4, &old);
if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
-   ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+   ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+                             ppc_inst(PPC_RAW_NOP()));
}
} else {
return -EINVAL;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index bab3ab1368a33f..05e981fb526c2e 100644
--- 

[PATCH 16/17] powerpc/ftrace: Add support for -fpatchable-function-entry

2023-06-19 Thread Naveen N Rao
GCC v13.1 updated support for -fpatchable-function-entry on ppc64le to
emit nops after the local entry point, rather than before it. This
allows us to use this in the kernel for ftrace purposes. A new script is
added under arch/powerpc/tools/ to help detect if nops are emitted after
the function local entry point, or before the global entry point.

With -fpatchable-function-entry, we no longer have the profiling
instructions generated at function entry, so we only need to validate
the presence of two nops at the ftrace location in ftrace_init_nop(). We
patch the preceding instruction with 'mflr r0' to match the
-mprofile-kernel ABI for subsequent ftrace use.

This changes the profiling instructions used on ppc32. The default -pg
option emits an additional 'stw' instruction after 'mflr r0' and before
the branch to _mcount ('bl _mcount'). This is very similar to the
original -mprofile-kernel implementation on ppc64le, where an additional
'std' instruction was used to save LR to its save location in the
caller's stackframe. That additional store was later removed in newer
compiler versions for performance reasons. The same reasoning applies to
ppc32, so we only patch in a 'mflr r0'.
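
With two nops at the ftrace site, the init-time validation reduces to
roughly the following (a sketch using the helpers from patch 08;
rec->ip pointing at the second nop is an assumption here):

	/* two consecutive nops at function entry; ip is the 'bl' site */
	ret = ftrace_validate_inst(ip, ppc_inst(PPC_RAW_NOP()));
	if (!ret)
		ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_NOP()),
					 ppc_inst(PPC_RAW_MFLR(_R0)));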

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig  | 14 +++---
 arch/powerpc/Makefile |  5 
 arch/powerpc/include/asm/ftrace.h |  6 +++--
 arch/powerpc/include/asm/vermagic.h   |  4 ++-
 arch/powerpc/kernel/module_64.c   |  2 +-
 arch/powerpc/kernel/trace/ftrace.c| 14 --
 arch/powerpc/kernel/trace/ftrace_entry.S  |  2 ++
 .../gcc-check-fpatchable-function-entry.sh| 26 +++
 8 files changed, 64 insertions(+), 9 deletions(-)
 create mode 100755 arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bff5820b7cda14..9352d8e68152e1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -187,6 +187,7 @@ config PPC
select DYNAMIC_FTRACE   if FUNCTION_TRACER
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
+   select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
select GENERIC_ATOMIC64 if PPC32
select GENERIC_CLOCKEVENTS_BROADCASTif SMP
select GENERIC_CMOS_UPDATE
@@ -227,8 +228,8 @@ config PPC
select HAVE_DEBUG_KMEMLEAK
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_DYNAMIC_FTRACE
-   select HAVE_DYNAMIC_FTRACE_WITH_ARGS if MPROFILE_KERNEL || PPC32
-   select HAVE_DYNAMIC_FTRACE_WITH_REGS if MPROFILE_KERNEL || PPC32
+   select HAVE_DYNAMIC_FTRACE_WITH_ARGS if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FAST_GUP
@@ -256,7 +257,7 @@ config PPC
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S)
select HAVE_OPTPROBES
-   select HAVE_OBJTOOL if PPC32 || MPROFILE_KERNEL
+   select HAVE_OBJTOOL if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_OBJTOOL_MCOUNT  if HAVE_OBJTOOL
select HAVE_PERF_EVENTS
select HAVE_PERF_EVENTS_NMI if PPC64
@@ -550,6 +551,13 @@ config MPROFILE_KERNEL
depends on PPC64 && CPU_LITTLE_ENDIAN && FUNCTION_TRACER
def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-mprofile-kernel.sh $(CC) -I$(srctree)/include -D__KERNEL__)
 
+config ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+   depends on FUNCTION_TRACER && (PPC32 || PPC64_ELF_ABI_V2)
+   depends on $(cc-option,-fpatchable-function-entry=2)
+   def_bool y if PPC32
+   def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mlittle-endian) if PPC64 && CPU_LITTLE_ENDIAN
+   def_bool $(success,$(srctree)/arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh $(CC) -mbig-endian) if PPC64 && CPU_BIG_ENDIAN
+
 config HOTPLUG_CPU
bool "Support for enabling/disabling CPUs"
depends on SMP && (PPC_PSERIES || \
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index dca73f673d7046..de39478b1c9e9f 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -148,11 +148,16 @@ CFLAGS-$(CONFIG_PPC32)+= $(call cc-option, $(MULTIPLEWORD))
 CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata)
 
 ifdef CONFIG_FUNCTION_TRACER
+ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
 

[PATCH 15/17] powerpc/ftrace: Implement ftrace_replace_code()

2023-06-19 Thread Naveen N Rao
Implement ftrace_replace_code() to consolidate logic from the different
ftrace patching routines: ftrace_make_nop(), ftrace_make_call() and
ftrace_modify_call(). Note that ftrace_make_call() is still required
primarily to handle patching modules during their load time. The other
two routines should no longer be called.

This lays the groundwork to enable better control in patching ftrace
locations, including the ability to nop-out preceding profiling
instructions when ftrace is disabled.
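
For the overall shape of the consolidated routine, a minimal sketch
following the generic record-iterator pattern (assuming the
ftrace_get_call_inst() helper added by this patch; the
FTRACE_UPDATE_MODIFY_CALL case is elided):

	void ftrace_replace_code(int enable)
	{
		ppc_inst_t old, new, nop_inst = ppc_inst(PPC_RAW_NOP());
		struct ftrace_rec_iter *iter;
		int ret;

		for (iter = ftrace_rec_iter_start(); iter; iter = ftrace_rec_iter_next(iter)) {
			struct dyn_ftrace *rec = ftrace_rec_iter_record(iter);

			switch (ftrace_test_record(rec, enable)) {
			case FTRACE_UPDATE_IGNORE:
				continue;
			case FTRACE_UPDATE_MAKE_CALL:
				ret = ftrace_get_call_inst(rec, ftrace_get_addr_new(rec), &new);
				old = nop_inst;
				break;
			case FTRACE_UPDATE_MAKE_NOP:
				ret = ftrace_get_call_inst(rec, ftrace_get_addr_curr(rec), &old);
				new = nop_inst;
				break;
			default:	/* FTRACE_UPDATE_MODIFY_CALL: elided in this sketch */
				continue;
			}

			if (!ret)
				ret = ftrace_modify_code(rec->ip, old, new);
			if (ret) {
				ftrace_bug(ret, rec);
				return;
			}

			/* commit the state change for this record */
			ftrace_update_record(rec, enable);
		}
	}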

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 173 -
 1 file changed, 96 insertions(+), 77 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 422dd760fbe013..cf9dce77527920 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -94,104 +94,123 @@ static unsigned long find_ftrace_tramp(unsigned long ip)
return 0;
 }
 
+static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_inst_t *call_inst)
+{
+   unsigned long ip = rec->ip;
+   unsigned long stub;
+
+   if (is_offset_in_branch_range(addr - ip)) {
+   /* Within range */
+   stub = addr;
+#ifdef CONFIG_MODULES
+   } else if (rec->arch.mod) {
+   /* Module code would be going to one of the module stubs */
+   stub = (addr == (unsigned long)ftrace_caller ? rec->arch.mod->arch.tramp :
+                                                  rec->arch.mod->arch.tramp_regs);
+#endif
+   } else if (core_kernel_text(ip)) {
+   /* We would be branching to one of our ftrace stubs */
+   stub = find_ftrace_tramp(ip);
+   if (!stub) {
+   pr_err("0x%lx: No ftrace stubs reachable\n", ip);
+   return -EINVAL;
+   }
+   } else {
+   return -EINVAL;
+   }
+
+   *call_inst = ftrace_create_branch_inst(ip, stub, 1);
+   return 0;
+}
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
 {
-   unsigned long tramp, tramp_old, ip = rec->ip;
-   ppc_inst_t old, new;
-   struct module *mod;
-
-   if (is_offset_in_branch_range(old_addr - ip) && is_offset_in_branch_range(addr - ip)) {
-   /* Within range */
-   old = ftrace_create_branch_inst(ip, old_addr, 1);
-   new = ftrace_create_branch_inst(ip, addr, 1);
-   return ftrace_modify_code(ip, old, new);
-   } else if (core_kernel_text(ip)) {
-   /*
-* We always patch out of range locations to go to the regs
-* variant, so there is nothing to do here
-*/
-   return 0;
-   } else if (IS_ENABLED(CONFIG_MODULES)) {
-   /* Module code would be going to one of the module stubs */
-   mod = rec->arch.mod;
-   if (addr == (unsigned long)ftrace_caller) {
-   tramp_old = mod->arch.tramp_regs;
-   tramp = mod->arch.tramp;
-   } else {
-   tramp_old = mod->arch.tramp;
-   tramp = mod->arch.tramp_regs;
-   }
-   old = ftrace_create_branch_inst(ip, tramp_old, 1);
-   new = ftrace_create_branch_inst(ip, tramp, 1);
-   return ftrace_modify_code(ip, old, new);
-   }
-
+   /* This should never be called since we override ftrace_replace_code() */
+   WARN_ON(1);
return -EINVAL;
 }
 #endif
 
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
-   unsigned long tramp, ip = rec->ip;
ppc_inst_t old, new;
-   struct module *mod;
+   int ret;
+
+   /* This can only ever be called during module load */
+   if (WARN_ON(!IS_ENABLED(CONFIG_MODULES) || core_kernel_text(rec->ip)))
+   return -EINVAL;
 
old = ppc_inst(PPC_RAW_NOP());
-   if (is_offset_in_branch_range(addr - ip)) {
-   /* Within range */
-   new = ftrace_create_branch_inst(ip, addr, 1);
-   return ftrace_modify_code(ip, old, new);
-   } else if (core_kernel_text(ip)) {
-   /* We would be branching to one of our ftrace tramps */
-   tramp = find_ftrace_tramp(ip);
-   if (!tramp) {
-   pr_err("0x%lx: No ftrace trampolines reachable\n", ip);
-   return -EINVAL;
-   }
-   new = ftrace_create_branch_inst(ip, tramp, 1);
-   return ftrace_modify_code(ip, old, new);
-   } else if (IS_ENABLED(CONFIG_MODULES)) {
-   /* Module code would be going to one of the module stubs */
-   mod = rec->arch.mod;
-   tramp = (addr == (unsigned long)ftrace_caller ? mod->arch.tramp 
: 

[PATCH 14/17] powerpc/ftrace: Replace use of ftrace_call_replace() with ftrace_create_branch_inst()

2023-06-19 Thread Naveen N Rao
ftrace_create_branch_inst() is clearer about its intent than
ftrace_call_replace().

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index c37e22c6c26521..422dd760fbe013 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -41,19 +41,6 @@ static ppc_inst_t ftrace_create_branch_inst(unsigned long ip, unsigned long addr
return op;
 }
 
-static ppc_inst_t
-ftrace_call_replace(unsigned long ip, unsigned long addr, int link)
-{
-   ppc_inst_t op;
-
-   addr = ppc_function_entry((void *)addr);
-
-   /* if (link) set op to 'bl' else 'b' */
-   create_branch(, (u32 *)ip, addr, link ? BRANCH_SET_LINK : 0);
-
-   return op;
-}
-
 static inline int ftrace_read_inst(unsigned long ip, ppc_inst_t *op)
 {
if (copy_inst_from_kernel_nofault(op, (void *)ip)) {
@@ -275,14 +262,14 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
int ret;
 
old = ppc_inst_read((u32 *)&ftrace_call);
-   new = ftrace_call_replace(ip, (unsigned long)func, 1);
+   new = ftrace_create_branch_inst(ip, ppc_function_entry(func), 1);
ret = ftrace_modify_code(ip, old, new);
 
/* Also update the regs callback function */
if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) && !ret) {
ip = (unsigned long)(&ftrace_regs_call);
old = ppc_inst_read((u32 *)&ftrace_regs_call);
-   new = ftrace_call_replace(ip, (unsigned long)func, 1);
+   new = ftrace_create_branch_inst(ip, ppc_function_entry(func), 1);
ret = ftrace_modify_code(ip, old, new);
}
 
-- 
2.40.1



[PATCH 01/17] powerpc/ftrace: Fix dropping weak symbols with older toolchains

2023-06-19 Thread Naveen N Rao
The minimum level of gcc supported for building the kernel is v5.1.
v5.x releases of gcc emitted a three instruction sequence for
-mprofile-kernel:
mflrr0
std r0, 16(r1)
bl  _mcount

It is only with the v6.x releases that gcc started emitting the two
instruction sequence for -mprofile-kernel, omitting the second store
instruction.

With the older three instruction sequence, the actual ftrace location
can be the 5th instruction into a function: the two global entry point
instructions are followed by 'mflr r0' and 'std r0,16(r1)' before the
'bl _mcount', putting it at a byte offset of 16. Update the allowed
offset for the ftrace location from 12 to 16 to accommodate this.
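
For context, a rough sketch of the generic check this limit feeds into:
kernel/trace/ftrace.c compares each record's offset into its symbol
against FTRACE_MCOUNT_MAX_OFFSET (variable names here are illustrative):

	unsigned long offset;
	char str[KSYM_NAME_LEN];

	/* records too far into their symbol are treated as mcount calls
	 * left behind by dropped weak functions, and are skipped
	 */
	if (!kallsyms_lookup(rec->ip, NULL, &offset, NULL, str) ||
	    offset > FTRACE_MCOUNT_MAX_OFFSET)
		return -EINVAL;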

Cc: sta...@vger.kernel.org
Fixes: 7af82ff90a2b06 ("powerpc/ftrace: Ignore weak functions")
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ftrace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 91c049d51d0e10..2edc6269b1a357 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -12,7 +12,7 @@
 
 /* Ignore unused weak functions which will have larger offsets */
 #ifdef CONFIG_MPROFILE_KERNEL
-#define FTRACE_MCOUNT_MAX_OFFSET   12
+#define FTRACE_MCOUNT_MAX_OFFSET   16
 #elif defined(CONFIG_PPC32)
 #define FTRACE_MCOUNT_MAX_OFFSET   8
 #endif
-- 
2.40.1



[PATCH 03/17] powerpc64/ftrace: Move ELFv1 and -pg support code into a separate file

2023-06-19 Thread Naveen N Rao
ELFv1 support is deprecated and on the way out. Pre -mprofile-kernel
ftrace support (-pg only) is very limited and is retained primarily for
clang builds. It won't be necessary once clang lands support for
-fpatchable-function-entry.

Copy the existing ftrace code supporting these into ftrace_pg.c.
ftrace.c can then be refactored and enhanced with a focus on ppc32 and
ppc64 ELFv2.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/Makefile|  13 +-
 arch/powerpc/kernel/trace/ftrace.c|  10 -
 arch/powerpc/kernel/trace/ftrace_pg.c | 846 ++
 3 files changed, 855 insertions(+), 14 deletions(-)
 create mode 100644 arch/powerpc/kernel/trace/ftrace_pg.c

diff --git a/arch/powerpc/kernel/trace/Makefile b/arch/powerpc/kernel/trace/Makefile
index b16a9f9c0b35f2..342a2d1ae86cd0 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -6,15 +6,16 @@
 ifdef CONFIG_FUNCTION_TRACER
 # do not trace tracer code
 CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_ftrace_pg.o = $(CC_FLAGS_FTRACE)
 endif
 
-obj32-$(CONFIG_FUNCTION_TRACER)+= ftrace_mprofile.o
+obj32-$(CONFIG_FUNCTION_TRACER)+= ftrace_mprofile.o ftrace.o
 ifdef CONFIG_MPROFILE_KERNEL
-obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_mprofile.o
+obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_mprofile.o ftrace.o
 else
-obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_64_pg.o
+obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_64_pg.o ftrace_pg.o
 endif
-obj-$(CONFIG_FUNCTION_TRACER)  += ftrace_low.o ftrace.o
+obj-$(CONFIG_FUNCTION_TRACER)  += ftrace_low.o
 obj-$(CONFIG_TRACING)  += trace_clock.o
 
 obj-$(CONFIG_PPC64)+= $(obj64-y)
@@ -25,3 +26,7 @@ GCOV_PROFILE_ftrace.o := n
 KCOV_INSTRUMENT_ftrace.o := n
 KCSAN_SANITIZE_ftrace.o := n
 UBSAN_SANITIZE_ftrace.o := n
+GCOV_PROFILE_ftrace_pg.o := n
+KCOV_INSTRUMENT_ftrace_pg.o := n
+KCSAN_SANITIZE_ftrace_pg.o := n
+UBSAN_SANITIZE_ftrace_pg.o := n
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index a47f303734233b..81a121b56c4d7f 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -864,13 +864,3 @@ unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
 }
 #endif
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
-
-#ifdef CONFIG_PPC64_ELF_ABI_V1
-char *arch_ftrace_match_adjust(char *str, const char *search)
-{
-   if (str[0] == '.' && search[0] != '.')
-   return str + 1;
-   else
-   return str;
-}
-#endif /* CONFIG_PPC64_ELF_ABI_V1 */
diff --git a/arch/powerpc/kernel/trace/ftrace_pg.c b/arch/powerpc/kernel/trace/ftrace_pg.c
new file mode 100644
index 00..7b85c3b460a3c0
--- /dev/null
+++ b/arch/powerpc/kernel/trace/ftrace_pg.c
@@ -0,0 +1,846 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Code for replacing ftrace calls with jumps.
+ *
+ * Copyright (C) 2007-2008 Steven Rostedt 
+ *
+ * Thanks goes out to P.A. Semi, Inc for supplying me with a PPC64 box.
+ *
+ * Added function graph tracer code, taken from x86 that was written
+ * by Frederic Weisbecker, and ported to PPC by Steven Rostedt.
+ *
+ */
+
+#define pr_fmt(fmt) "ftrace-powerpc: " fmt
+
+#include <linux/spinlock.h>
+#include <linux/hardirq.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+#include <linux/percpu.h>
+#include <linux/init.h>
+#include <linux/list.h>
+
+#include <asm/cacheflush.h>
+#include <asm/code-patching.h>
+#include <asm/ftrace.h>
+#include <asm/syscall.h>
+#include <asm/inst.h>
+
+/*
+ * We generally only have a single long_branch tramp and at most 2 or 3 plt
+ * tramps generated. But, we don't use the plt tramps currently. We also allot
+ * 2 tramps after .text and .init.text. So, we only end up with around 3 usable
+ * tramps in total. Set aside 8 just to be sure.
+ */
+#define NUM_FTRACE_TRAMPS   8
+static unsigned long ftrace_tramps[NUM_FTRACE_TRAMPS];
+
+static ppc_inst_t
+ftrace_call_replace(unsigned long ip, unsigned long addr, int link)
+{
+   ppc_inst_t op;
+
+   addr = ppc_function_entry((void *)addr);
+
+   /* if (link) set op to 'bl' else 'b' */
+   create_branch(, (u32 *)ip, addr, link ? BRANCH_SET_LINK : 0);
+
+   return op;
+}
+
+static inline int
+ftrace_modify_code(unsigned long ip, ppc_inst_t old, ppc_inst_t new)
+{
+   ppc_inst_t replaced;
+
+   /*
+* Note:
+* We are paranoid about modifying text, as if a bug was to happen, it
+* could cause us to read or write to someplace that could cause harm.
+* Carefully read and modify the code with probe_kernel_*(), and make
+* sure what we read is what we expected it to be before modifying it.
+*/
+
+   /* read the text we want to modify */
+   if (copy_inst_from_kernel_nofault(&replaced, (void *)ip))
+   return -EFAULT;
+
+   /* Make sure it is what we expect it to be */
+   if (!ppc_inst_equal(replaced, old)) {
+   pr_err("%p: replaced (%08lx) != old (%08lx)", (void *)ip,
+  

[PATCH 02/17] powerpc/module: Remove unused .ftrace.tramp section

2023-06-19 Thread Naveen N Rao
.ftrace.tramp section is not used for any purpose. This code was added
all the way back in the original commit introducing support for dynamic
ftrace on ppc64 modules. Remove it.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/module.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
index ac53606c259430..a8e2e8339fb7f4 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -75,10 +75,6 @@ struct mod_arch_specific {
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-#ifdef MODULE
-   asm(".section .ftrace.tramp,\"ax\",@nobits; .align 3; .previous");
-#endif /* MODULE */
-
 int module_trampoline_target(struct module *mod, unsigned long trampoline,
 unsigned long *target);
 int module_finalize_ftrace(struct module *mod, const Elf_Shdr *sechdrs);
-- 
2.40.1



[PATCH 00/17] powerpc/ftrace: refactor and add support for -fpatchable-function-entry

2023-06-19 Thread Naveen N Rao
Since RFC (*):
- Patches 1 and 17 have been included in this series due to 
  dependencies. Both had been posted out separately.
- Patch 10 has a small change to not throw errors when checking the
  instruction sequence generated by older toolchains.

This has had more testing since and this looks good to me. Christophe 
mentioned that this results in a slowdown with ftrace [de-]activation on 
ppc32, but that isn't performance critical and we can address that 
separately.


(*) http://lore.kernel.org/cover.1686151854.git.nav...@kernel.org


- Naveen


Naveen N Rao (17):
  powerpc/ftrace: Fix dropping weak symbols with older toolchains
  powerpc/module: Remove unused .ftrace.tramp section
  powerpc64/ftrace: Move ELFv1 and -pg support code into a separate file
  powerpc/ftrace: Simplify function_graph support in ftrace.c
  powerpc/ftrace: Use FTRACE_REGS_ADDR to identify the correct ftrace
trampoline
  powerpc/ftrace: Extend ftrace support for large kernels to ppc32
  powerpc/ftrace: Consolidate ftrace support into fewer files
  powerpc/ftrace: Refactor ftrace_modify_code()
  powerpc/ftrace: Stop re-purposing linker generated long branches for
ftrace
  powerpc/ftrace: Add separate ftrace_init_nop() with additional
validation
  powerpc/ftrace: Simplify ftrace_make_nop()
  powerpc/ftrace: Simplify ftrace_make_call()
  powerpc/ftrace: Simplify ftrace_modify_call()
  powerpc/ftrace: Replace use of ftrace_call_replace() with
ftrace_create_branch_inst()
  powerpc/ftrace: Implement ftrace_replace_code()
  powerpc/ftrace: Add support for -fpatchable-function-entry
  powerpc/ftrace: Create a dummy stackframe to fix stack unwind

 arch/powerpc/Kconfig  |  14 +-
 arch/powerpc/Makefile |   5 +
 arch/powerpc/include/asm/ftrace.h |  24 +-
 arch/powerpc/include/asm/module.h |   4 -
 arch/powerpc/include/asm/sections.h   |   2 +
 arch/powerpc/include/asm/vermagic.h   |   4 +-
 arch/powerpc/kernel/module_64.c   |   2 +-
 arch/powerpc/kernel/trace/Makefile|  12 +-
 arch/powerpc/kernel/trace/ftrace.c| 910 +-
 arch/powerpc/kernel/trace/ftrace_64_pg.S  |  67 --
 arch/powerpc/kernel/trace/ftrace_64_pg.c  | 846 
 .../{ftrace_low.S => ftrace_64_pg_entry.S}|  64 +-
 .../{ftrace_mprofile.S => ftrace_entry.S} |  78 +-
 arch/powerpc/kernel/vmlinux.lds.S |   4 -
 .../gcc-check-fpatchable-function-entry.sh|  26 +
 15 files changed, 1288 insertions(+), 774 deletions(-)
 delete mode 100644 arch/powerpc/kernel/trace/ftrace_64_pg.S
 create mode 100644 arch/powerpc/kernel/trace/ftrace_64_pg.c
 rename arch/powerpc/kernel/trace/{ftrace_low.S => ftrace_64_pg_entry.S} (54%)
 rename arch/powerpc/kernel/trace/{ftrace_mprofile.S => ftrace_entry.S} (79%)
 create mode 100755 arch/powerpc/tools/gcc-check-fpatchable-function-entry.sh


base-commit: 12ffddc6444780aec83fa5086673ec005c0bace4
-- 
2.40.1



next: arch/powerpc/kernel/stacktrace.c:171:9: error: implicit declaration of function 'nmi_cpu_backtrace'

2023-06-19 Thread Naresh Kamboju
The following build regressions were noticed on Linux next-20230619.

Reported-by: Linux Kernel Functional Testing 


Regressions found on powerpc:

 - build/clang-nightly-maple_defconfig
 - build/gcc-8-maple_defconfig
 - build/gcc-12-maple_defconfig
 - build/clang-nightly-cell_defconfig
 - build/gcc-12-cell_defconfig
 - build/gcc-8-cell_defconfig
 - build/clang-16-cell_defconfig
 - build/clang-16-maple_defconfig

Build log:
arch/powerpc/kernel/stacktrace.c: In function 'handle_backtrace_ipi':
arch/powerpc/kernel/stacktrace.c:171:9: error: implicit declaration of
function 'nmi_cpu_backtrace' [-Werror=implicit-function-declaration]
  171 | nmi_cpu_backtrace(regs);
  | ^
arch/powerpc/kernel/stacktrace.c: In function 'arch_trigger_cpumask_backtrace':
arch/powerpc/kernel/stacktrace.c:226:9: error: implicit declaration of
function 'nmi_trigger_cpumask_backtrace'; did you mean
'arch_trigger_cpumask_backtrace'?
[-Werror=implicit-function-declaration]
  226 | nmi_trigger_cpumask_backtrace(mask, exclude_self,
raise_backtrace_ipi);
  | ^
  | arch_trigger_cpumask_backtrace
cc1: all warnings being treated as errors
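
Both nmi_cpu_backtrace() and nmi_trigger_cpumask_backtrace() are
declared in include/linux/nmi.h, so the errors point at a missing
include; a minimal (untested) fix sketch:

	/* arch/powerpc/kernel/stacktrace.c */
	#include <linux/nmi.h>	/* nmi_cpu_backtrace(), nmi_trigger_cpumask_backtrace() */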


Links:
 - 
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230619/testrun/17629288/suite/build/test/gcc-12-cell_defconfig/log
 - 
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230619/testrun/17629288/suite/build/test/gcc-12-cell_defconfig/history/
 - 
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230619/testrun/17629288/suite/build/test/gcc-12-cell_defconfig/details/

Steps to reproduce:
# To install tuxmake to your home directory at ~/.local/bin:
# pip3 install -U --user tuxmake
#
# Or install a deb/rpm depending on the running distribution
# See https://tuxmake.org/install-deb/ or
# https://tuxmake.org/install-rpm/
#
# See https://docs.tuxmake.org/ for complete documentation.


tuxmake --runtime podman --target-arch powerpc --toolchain gcc-12
--kconfig cell_defconfig

--
Linaro LKFT
https://lkft.linaro.org


[PATCH v3 1/1] powerpc: update ppc_save_regs to save current r1 in pt_regs

2023-06-19 Thread Aditya Gupta
ppc_save_regs() skips one stack frame while saving the CPU register states.
Instead of saving the current r1, it pulls the previous stack frame pointer.

When vmcores caused by a direct panic call (such as `echo c >
/proc/sysrq-trigger`) are debugged with gdb, gdb fails to show the
backtrace correctly. On further analysis, this turned out to be due to a
mismatch between r1 and NIP.

GDB uses NIP to get the current function symbol and uses the corresponding
debug info of that function to unwind previous frames. Due to the
mismatching r1 and NIP, the unwinding does not work: gdb fails to
unwind to the 2nd frame and hence does not show the backtrace.
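
For reference, the part of the ppc64 stack frame layout involved here,
sketched as a C struct (offsets per the ELF ABI; illustrative only):

	struct ppc64_frame_head {
		unsigned long back_chain;	/* 0(r1): caller's r1 */
		unsigned long cr_save;		/* 8(r1) */
		unsigned long lr_save;		/* 16(r1): LR, saved by the callee
						 * into the caller's frame (LRSAVE) */
	};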

GDB backtrace with vmcore of kernel without this patch:

-
(gdb) bt
 #0  0xc02a53e8 in crash_setup_regs (oldregs=,
newregs=0xc4f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69
 #1  __crash_kexec (regs=) at kernel/kexec_core.c:974
 #2  0x0063 in ?? ()
 #3  0xc3579320 in ?? ()
-

Further analysis revealed that the mismatch occurred because
ppc_save_regs() was saving the previous stack frame's SP instead of the
current r1. This patch fixes this by storing the current r1 in the saved
pt_regs.

GDB backtrace with vmcore of patched kernel:


(gdb) bt
 #0  0xc02a53e8 in crash_setup_regs (oldregs=0x0, 
newregs=0xc670b8d8)
at ./arch/powerpc/include/asm/kexec.h:69
 #1  __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974
 #2  0xc0168918 in panic (fmt=fmt@entry=0xc1654a60 "sysrq 
triggered crash\n")
at kernel/panic.c:358
 #3  0xc0b735f8 in sysrq_handle_crash (key=) at 
drivers/tty/sysrq.c:155
 #4  0xc0b742cc in __handle_sysrq (key=key@entry=99, 
check_mask=check_mask@entry=false)
at drivers/tty/sysrq.c:602
 #5  0xc0b7506c in write_sysrq_trigger (file=, 
buf=,
count=2, ppos=) at drivers/tty/sysrq.c:1163
 #6  0xc069a7bc in pde_write (ppos=, count=,
buf=, file=, pde=0xc362cb40) at 
fs/proc/inode.c:340
 #7  proc_reg_write (file=, buf=, 
count=,
ppos=) at fs/proc/inode.c:352
 #8  0xc05b3bbc in vfs_write (file=file@entry=0xc6aa6b00,
buf=buf@entry=0x61f498b4f60 ,
count=count@entry=2, pos=pos@entry=0xc670bda0) at 
fs/read_write.c:582
 #9  0xc05b4264 in ksys_write (fd=,
buf=0x61f498b4f60 , 
count=2)
at fs/read_write.c:637
 #10 0xc002ea2c in system_call_exception (regs=0xc670be80, 
r0=)
at arch/powerpc/kernel/syscall.c:171
 #11 0xc000c270 in system_call_vectored_common ()
at arch/powerpc/kernel/interrupt_64.S:192


Fixes: d16a58f8854b1 ("powerpc: Improve ppc_save_regs()")
Reviewed-by: Nicholas Piggin 
Signed-off-by: Aditya Gupta 
---

More information:

This problem with gdb backtrace was discovered while working on a crash
tool enhancement to improve crash analysis using gdb passthrough to be
able print function arguments and local variables inside crash tool. gdb
passthrough simply asks gdb to handle the backtrace printing, where it
was noticed that it could not print correct backtrace in some vmcores.

The changes introduced here have an implication for xmon: it might show
one extra `xmon` frame in the backtrace. Looking at older commits, the
ppc_save_regs function was originally introduced as xmon_save_regs().
It has since been renamed to ppc_save_regs() and is used in a few other
places as well.

Tested this patch with multiple ways of crashing:
1. direct panic call (`echo c > /proc/sysrq-trigger`)
2. null dereference/oops path (the earlier implementation of 
`sysrq_handle_crash`)
3. sys reset
4. sys reset inside qemu

Changelog
V3:
  - resend as normal patch with Reviewed-by and Fixes tag
RFC V2:
  
https://lore.kernel.org/linuxppc-dev/334e6694-e4e4-dce4-9443-2aaccdb86...@linux.ibm.com/T/#mb7c9e73b4ba771f6eba74c7d8e13dbb118619ad2
  - fixed bogus LR by storing the LR from the caller's LR save area, as
  pointed out by Naveen and Nick

---
 arch/powerpc/kernel/ppc_save_regs.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/ppc_save_regs.S b/arch/powerpc/kernel/ppc_save_regs.S
index 49813f982468..a9b9c32d0c1f 100644
--- a/arch/powerpc/kernel/ppc_save_regs.S
+++ b/arch/powerpc/kernel/ppc_save_regs.S
@@ -31,10 +31,10 @@ _GLOBAL(ppc_save_regs)
lbz r0,PACAIRQSOFTMASK(r13)
PPC_STL r0,SOFTE(r3)
 #endif
-   /* go up one stack frame for SP */
-   PPC_LL  r4,0(r1)
-   PPC_STL r4,GPR1(r3)
+   /* store current SP */
+   PPC_STL r1,GPR1(r3)
/* get caller's LR */
+   PPC_LL  r4,0(r1)
PPC_LL  r0,LRSAVE(r4)
PPC_STL r0,_LINK(r3)
mflr    r0
-- 
2.40.1