[PATCH 00/23] powerpc: Syscall wrapper and register clearing

2022-09-15 Thread Rohan McLure
V4 available here:

Link: https://lore.kernel.org/all/20220824020548.62625-1-rmcl...@linux.ibm.com/

Implement a syscall wrapper, causing arguments to handlers to be passed
via a struct pt_regs on the stack. The syscall wrapper is implemented
for all platforms other than the Cell processor, whose SPUs expect
the ability to directly call syscall handler symbols with the regular
in-register calling convention.

Adopting syscall wrappers requires redefinition of architecture-specific
syscalls and compatibility syscalls to use the SYSCALL_DEFINE and
COMPAT_SYSCALL_DEFINE macros, as well as removal of direct references to
the emitted syscall-handler symbols from within the kernel. This work
led to the following modernisations of powerpc's syscall handlers:

 - Replace syscall 82 semantics with sys_old_select and remove
   ppc_select handler, which features direct calls to both sys_old_select
   and sys_select.
 - Use a generic fallocate compatibility syscall

Replace asm implementation of syscall table with C implementation for
more compile-time checks.

Many compatibility syscalls are candidates to be removed in favour of
generically defined handlers, but exhibit different parameter orderings
and numberings due to 32-bit ABI support for 64-bit parameters. The
parameter reorderings are however consistent with arm. A future patch
series will serve to modernise syscalls by providing generic
implementations featuring these reorderings.

The design of this syscall wrapper is very similar to the s390, x86 and
arm64 implementations. See also commit 4378a7d4be30 ("arm64: implement
syscall wrappers").
The motivation for this change is that it allows for the clearing of
register state when entering the kernel via interrupt handlers on
64-bit servers. This serves to reduce the influence of values in
registers carried over from the interrupted process, e.g. syscall
parameters from user space, or user state at the site of a page fault.
All values in registers are saved and zeroized at the entry to an
interrupt handler and restored afterward. While this may sound like a
heavy-weight mitigation, many GPRs are already saved and restored when
handling an interrupt, and the mmap_bench benchmark on a Power 9 guest,
which repeatedly invokes the page fault handler, suggests at most a
~0.8% regression in performance. Realistic workloads do not constantly
produce interrupts, and so this does not indicate a realistic slowdown.

Using wrapped syscalls yields a performance improvement of ~5.6% on
the null_syscall benchmark on pseries guests, by removing the need for
system_call_exception to allocate its own stack frame. This amortises
the additional costs of saving and restoring non-volatile registers
(register clearing is cheap on superscalar platforms), and so the
final mitigation actually yields a net performance improvement of ~0.6%
on the null_syscall benchmark.

The clearing of general purpose registers on interrupts other than
syscalls is enabled by default only on Book3E 64-bit systems (where the
mitigation is inexpensive), but available to other 64-bit systems via
the INTERRUPT_SANITIZE_REGISTERS Kconfig option. This mitigation is
optional, as the speculation influence of interrupts is likely less than
that of syscalls.

Patch Changelog:

 - Format orig_r3 handling as its own patch rather than just a revert.
 - Provide asm-generic BE implementation of long-long munging syscall
   compatibility arguments.
 - Syscall #82 now refers to generic sys_old_select or
   compat_sys_old_select.
 - Drop 'inline' on static helper functions for mmap, personality.
 - Remove arch-specific sys fallocate implementation that was meant to
   have been removed in V2.
 - Remove references to syscall wrapper until it is introduced.
 - Rearrange patch series so the last five patches are syscall wrapper >
   syscall register clears > interrupt register clears.
 - Whether non-syscall interrupts should clear registers is now
   configurable by INTERRUPT_SANITIZE_REGISTERS.

Rohan McLure (23):
  powerpc: Remove asmlinkage from syscall handler definitions
  powerpc: Save caller r3 prior to system_call_exception
  powerpc: Add ZEROIZE_GPRS macros for register clears
  powerpc/64s: Use {ZEROIZE,SAVE,REST}_GPRS macros in sc, scv 0 handlers
  powerpc/32: Clarify interrupt restores with REST_GPR macro in
entry_32.S
  powerpc/64e: Clarify register saves and clears with
{SAVE,ZEROIZE}_GPRS
  powerpc/64s: Fix comment on interrupt handler prologue
  powerpc: Fix fallocate and fadvise64_64 compat parameter combination
  asm-generic: compat: Support BE for long long args in 32-bit ABIs
  powerpc: Use generic fallocate compatibility syscall
  powerpc/32: Remove powerpc select specialisation
  powerpc: Remove direct call to personality syscall handler
  powerpc: Remove direct call to mmap2 syscall handlers
  powerpc: Provide do_ppc64_personality helper
  powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers
  powerpc: Include all arch-specific 

[PATCH 12/23] powerpc: Remove direct call to personality syscall handler

2022-09-15 Thread Rohan McLure
Syscall handlers should not be invoked internally by their symbol names,
as these symbols are defined by the architecture-specific SYSCALL_DEFINE
macro. Fortunately, in the case of ppc64_personality, its call to
sys_personality can be replaced with an invocation of the
equivalent ksys_personality inline helper in <linux/syscalls.h>.
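
A toy user-space sketch of the ksys_ pattern relied upon here: the
syscall entry point becomes a thin wrapper over a helper that in-kernel
callers use directly, so nothing needs to reference the emitted sys_*
symbol by name (all names below are illustrative, not the kernel's):

#include <stdio.h>

/* Helper usable by in-kernel callers such as ppc64_personality. */
static long ksys_personality_demo(unsigned long p)
{
	return (long)p; /* stands in for the real personality logic */
}

/* The syscall entry point is a thin wrapper over the helper. */
static long sys_personality_demo(unsigned long p)
{
	return ksys_personality_demo(p);
}

int main(void)
{
	/* An internal caller invokes the helper, not the entry point. */
	printf("%ld\n", sys_personality_demo(8));
	printf("%ld\n", ksys_personality_demo(8));
	return 0;
}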

Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
---
V1 -> V2: Use inline helper to deduplicate bodies in compat/regular
implementations.
V3 -> V4: Move to be applied before syscall wrapper.
---
 arch/powerpc/kernel/syscalls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index 34e1ae88e15b..a04c97faa21a 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -71,7 +71,7 @@ long ppc64_personality(unsigned long personality)
if (personality(current->personality) == PER_LINUX32
&& personality(personality) == PER_LINUX)
personality = (personality & ~PER_MASK) | PER_LINUX32;
-   ret = sys_personality(personality);
+   ret = ksys_personality(personality);
if (personality(ret) == PER_LINUX32)
ret = (ret & ~PER_MASK) | PER_LINUX;
return ret;
-- 
2.34.1



[PATCH 20/23] powerpc/64s: Clear/restore caller gprs in syscall interrupt/return

2022-09-15 Thread Rohan McLure
Clear user state in GPRs (assign to zero) to reduce the influence of user
registers on speculation within kernel syscall handlers. Clears occur
at the very beginning of the sc and scv 0 interrupt handlers, with
restores occurring following the execution of the syscall handler.

Signed-off-by: Rohan McLure 
---
V1 -> V2: Update summary
V2 -> V3: Remove erroneous summary paragraph on syscall_exit_prepare
V3 -> V4: Use ZEROIZE instead of NULLIFY. Clear r0 also.
V4 -> V5: Move to end of patch series.
---
 arch/powerpc/kernel/interrupt_64.S | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index 16a1b44088e7..40147558e1a6 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -70,7 +70,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
ld  r2,PACATOC(r13)
mfcr  r12
li  r11,0
-   /* Can we avoid saving r3-r8 in common case? */
+   /* Save syscall parameters in r3-r8 */
SAVE_GPRS(3, 8, r1)
/* Zero r9-r12, this should only be required when restoring all GPRs */
std r11,GPR9(r1)
@@ -110,6 +110,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 * Zero user registers to prevent influencing speculative execution
 * state of kernel code.
 */
+   ZEROIZE_GPR(0)
ZEROIZE_GPRS(5, 12)
ZEROIZE_NVGPRS()
bl  system_call_exception
@@ -140,6 +141,7 @@ BEGIN_FTR_SECTION
HMT_MEDIUM_LOW
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
+   REST_NVGPRS(r1)
cmpdi   r3,0
bne .Lsyscall_vectored_\name\()_restore_regs
 
@@ -243,7 +245,7 @@ END_BTB_FLUSH_SECTION
ld  r2,PACATOC(r13)
mfcr  r12
li  r11,0
-   /* Can we avoid saving r3-r8 in common case? */
+   /* Save syscall parameters in r3-r8 */
SAVE_GPRS(3, 8, r1)
/* Zero r9-r12, this should only be required when restoring all GPRs */
std r11,GPR9(r1)
@@ -295,6 +297,7 @@ END_BTB_FLUSH_SECTION
 * Zero user registers to prevent influencing speculative execution
 * state of kernel code.
 */
+   ZEROIZE_GPR(0)
ZEROIZE_GPRS(5, 12)
ZEROIZE_NVGPRS()
bl  system_call_exception
@@ -337,6 +340,7 @@ BEGIN_FTR_SECTION
stdcx.  r0,0,r1 /* to clear the reservation */
 END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 
+   REST_NVGPRS(r1)
cmpdi   r3,0
bne .Lsyscall_restore_regs
/* Zero volatile regs that may contain sensitive kernel data */
@@ -364,7 +368,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 .Lsyscall_restore_regs:
ld  r3,_CTR(r1)
ld  r4,_XER(r1)
-   REST_NVGPRS(r1)
mtctr   r3
mtspr   SPRN_XER,r4
REST_GPR(0, r1)
-- 
2.34.1



[PATCH 11/23] powerpc/32: Remove powerpc select specialisation

2022-09-15 Thread Rohan McLure
Syscall #82 has been implemented for 32-bit platforms in a unique way on
powerpc systems. This hack in effect guesses whether the caller is
expecting new select semantics or old select semantics, based off the
first parameter. In new select, this parameter
represents the length of a user-memory array of file descriptors, and in
old select this is a pointer to an arguments structure.

The heuristic simply interprets sufficiently large values of its first
parameter as being a call to old select. The following is a discussion
on how this syscall should be handled.

Link: https://lore.kernel.org/lkml/13737de5-0eb7-e881-9af0-163b0d29a...@csgroup.eu/
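
As a toy user-space illustration of the heuristic being removed (the
threshold is the one from the kernel code in the diff below; the sample
values are hypothetical):

#include <stdio.h>

/* Old select passes a single pointer to an argument block; new select
 * passes the fd-set size n first. User pointers sit well above 4096,
 * while realistic n values sit far below it, so the magnitude of the
 * first parameter distinguishes the two calling conventions. */
static int looks_like_old_select(unsigned long first_arg)
{
	return first_arg >= 4096;
}

int main(void)
{
	printf("%d\n", looks_like_old_select(64));         /* 0: new select */
	printf("%d\n", looks_like_old_select(0xbfff0000)); /* 1: old select */
	return 0;
}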

As discussed in this thread, the existence of such a hack suggests that for
whatever powerpc binaries may predate glibc, it is most likely that they
would have made use of the old select semantics. x86 and arm64 both
implement this syscall with old select semantics.

Remove the powerpc implementation, and update syscall.tbl to emit a
reference to sys_old_select and compat_sys_old_select
for 32-bit binaries, in keeping with how other architectures support
syscall #82.

Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
---
V1 -> V2: Remove arch-specific select handler
V2 -> V3: Remove ppc_old_select prototype in . Move to
earlier in patch series
V4 -> V5: Use compat_sys_old_select on 64-bit systems.
---
 arch/powerpc/include/asm/syscalls.h   |  2 --
 arch/powerpc/kernel/syscalls.c| 17 -
 arch/powerpc/kernel/syscalls/syscall.tbl  |  2 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  2 +-
 4 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h
index 960b3871db72..20cbd29b1228 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -30,8 +30,6 @@ long sys_mmap2(unsigned long addr, size_t len,
   unsigned long fd, unsigned long pgoff);
 long ppc64_personality(unsigned long personality);
 long sys_rtas(struct rtas_args __user *uargs);
-int ppc_select(int n, fd_set __user *inp, fd_set __user *outp,
-  fd_set __user *exp, struct __kernel_old_timeval __user *tvp);
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
  u32 len_high, u32 len_low);
 
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index abc3fbb3c490..34e1ae88e15b 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -63,23 +63,6 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
return do_mmap2(addr, len, prot, flags, fd, offset, PAGE_SHIFT);
 }
 
-#ifdef CONFIG_PPC32
-/*
- * Due to some executables calling the wrong select we sometimes
- * get wrong args.  This determines how the args are being passed
- * (a single ptr to them all args passed) then calls
- * sys_select() with the appropriate args. -- Cort
- */
-int
-ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, struct __kernel_old_timeval __user *tvp)
-{
-   if ((unsigned long)n >= 4096)
-   return sys_old_select((void __user *)n);
-
-   return sys_select(n, inp, outp, exp, tvp);
-}
-#endif
-
 #ifdef CONFIG_PPC64
 long ppc64_personality(unsigned long personality)
 {
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 2600b4237292..64f27cbbdd2c 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -110,7 +110,7 @@
 79	common	settimeofday		sys_settimeofday		compat_sys_settimeofday
 80	common	getgroups		sys_getgroups
 81	common	setgroups		sys_setgroups
-82	32	select			ppc_select			sys_ni_syscall
+82	32	select			sys_old_select			compat_sys_old_select
 82 64  select  sys_ni_syscall
 82 spu select  sys_ni_syscall
 83 common  symlink sys_symlink
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 2600b4237292..64f27cbbdd2c 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -110,7 +110,7 @@
 79	common	settimeofday		sys_settimeofday		compat_sys_settimeofday
 80	common	getgroups		sys_getgroups
 81	common	setgroups		sys_setgroups
-82	32	select			ppc_select			sys_ni_syscall
+82	32	select			sys_old_select			compat_sys_old_select
 82 64  select 

[PATCH 07/23] powerpc/64s: Fix comment on interrupt handler prologue

2022-09-15 Thread Rohan McLure
Interrupt handlers on 64s systems will often need to save register state
from the interrupted process to make space for loading special purpose
registers or for internal state.

Fix a comment documenting a common code path macro in the beginning of
interrupt handlers where r10 is saved to the PACA to afford space for
the value of the CFAR. The comment is currently written as if r10-r12 are
saved to PACA, but in fact only r10 is saved, with r11-r12 saved much
later. The distance in code between these saves has grown over the many
revisions of this macro. Fix this by signalling with a comment where
r11-r12 are saved to the PACA.

Signed-off-by: Rohan McLure 
Reported-by: Nicholas Piggin 
---
V1 -> V2: Given its own commit
V2 -> V3: Annotate r11-r12 save locations with comment.
---
 arch/powerpc/kernel/exceptions-64s.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 3d0dc133a9ae..a3b51441b039 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -281,7 +281,7 @@ BEGIN_FTR_SECTION
mfspr   r9,SPRN_PPR
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
HMT_MEDIUM
-   std r10,IAREA+EX_R10(r13)   /* save r10 - r12 */
+   std r10,IAREA+EX_R10(r13)   /* save r10 */
.if ICFAR
 BEGIN_FTR_SECTION
mfspr   r10,SPRN_CFAR
@@ -321,7 +321,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
mfctr   r10
std r10,IAREA+EX_CTR(r13)
mfcr  r9
-   std r11,IAREA+EX_R11(r13)
+   std r11,IAREA+EX_R11(r13)   /* save r11 - r12 */
std r12,IAREA+EX_R12(r13)
 
/*
-- 
2.34.1



[PATCH 15/23] powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers

2022-09-15 Thread Rohan McLure
Arch-specific implementations of syscall handlers are currently used
over generic implementations for the following reasons:

1. Semantics unique to powerpc
2. Compatibility syscalls require 'argument padding' to comply with
   the 64-bit argument convention in the ELF32 ABI.
3. Parameter types or order differ from those of other architectures.

These syscall handlers have been defined prior to this patch series
without invoking the SYSCALL_DEFINE or COMPAT_SYSCALL_DEFINE macros with
custom input and output types. We remove every such direct definition in
favour of the aforementioned macros.
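
As a rough user-space re-creation of the shape of these macros
(simplified; the kernel's real SYSCALL_DEFINE machinery also performs
argument type mangling and emits metadata):

#include <stdio.h>

/* The macro emits a public sys_##name symbol forwarding to an inner
 * __do_sys_##name function that holds the handler body. */
#define SYSCALL_DEFINE2(name, t1, a1, t2, a2)			\
	static long __do_sys_##name(t1 a1, t2 a2);		\
	long sys_##name(t1 a1, t2 a2)				\
	{							\
		return __do_sys_##name(a1, a2);			\
	}							\
	static long __do_sys_##name(t1 a1, t2 a2)

SYSCALL_DEFINE2(add, long, a, long, b)
{
	return a + b;
}

int main(void)
{
	printf("%ld\n", sys_add(2, 3)); /* prints 5 */
	return 0;
}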

Also update syscall.tbl in order to refer to the symbol names generated
by each of these macros. Since ppc64_personality can be called by both
64-bit and 32-bit binaries through compatibility, we must generate
both compat_sys_ and sys_ symbols for this handler.

As an aside:
A number of architectures including arm and powerpc agree on an
alternative argument order and numbering for most of these arch-specific
handlers. A future patch series may allow for asm/unistd.h to signal
through its defines that a generic implementation of these syscall
handlers with the correct calling convention be emitted, through the
__ARCH_WANT_COMPAT_SYS_... convention.

Signed-off-by: Rohan McLure 
---
V1 -> V2: All syscall handlers wrapped by this macro.
V2 -> V3: Move creation of do_ppc64_personality helper to prior patch.
V3 -> V4: Fix parenthesis alignment. Don't emit sys_*** symbols.
V4 -> V5: Use 'aside' in the asm-generic rant in commit message.
---
 arch/powerpc/include/asm/syscalls.h  | 10 ++---
 arch/powerpc/kernel/sys_ppc32.c  | 38 +++---
 arch/powerpc/kernel/syscalls.c   | 17 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl | 22 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl  | 22 +-
 5 files changed, 64 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h
index 20cbd29b1228..525d2aa0c8ca 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -28,10 +28,10 @@ long sys_mmap(unsigned long addr, size_t len,
 long sys_mmap2(unsigned long addr, size_t len,
   unsigned long prot, unsigned long flags,
   unsigned long fd, unsigned long pgoff);
-long ppc64_personality(unsigned long personality);
+long sys_ppc64_personality(unsigned long personality);
 long sys_rtas(struct rtas_args __user *uargs);
-long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
- u32 len_high, u32 len_low);
+long sys_ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
+ u32 len_high, u32 len_low);
 
 #ifdef CONFIG_COMPAT
 unsigned long compat_sys_mmap2(unsigned long addr, size_t len,
@@ -52,8 +52,8 @@ int compat_sys_truncate64(const char __user *path, u32 reg4,
 int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1,
   unsigned long len2);
 
-long ppc32_fadvise64(int fd, u32 unused, u32 offset1, u32 offset2,
-size_t len, int advice);
+long compat_sys_ppc32_fadvise64(int fd, u32 unused, u32 offset1, u32 offset2,
+   size_t len, int advice);
 
 long compat_sys_sync_file_range2(int fd, unsigned int flags,
 unsigned int offset1, unsigned int offset2,
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 776ae7565fc5..dcc3c9fd4cfd 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -47,45 +47,55 @@
 #include 
 #include 
 
-compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_size_t count,
-u32 reg6, u32 pos1, u32 pos2)
+COMPAT_SYSCALL_DEFINE6(ppc_pread64,
+  unsigned int, fd,
+  char __user *, ubuf, compat_size_t, count,
+  u32, reg6, u32, pos1, u32, pos2)
 {
return ksys_pread64(fd, ubuf, count, merge_64(pos1, pos2));
 }
 
-compat_ssize_t compat_sys_pwrite64(unsigned int fd, const char __user *ubuf, compat_size_t count,
- u32 reg6, u32 pos1, u32 pos2)
+COMPAT_SYSCALL_DEFINE6(ppc_pwrite64,
+  unsigned int, fd,
+  const char __user *, ubuf, compat_size_t, count,
+  u32, reg6, u32, pos1, u32, pos2)
 {
return ksys_pwrite64(fd, ubuf, count, merge_64(pos1, pos2));
 }
 
-compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offset1, u32 offset2, u32 count)
+COMPAT_SYSCALL_DEFINE5(ppc_readahead,
+  int, fd, u32, r4,
+  u32, offset1, u32, offset2, u32, count)
 {
return ksys_readahead(fd, merge_64(offset1, offset2), count);
 }
 
-int compat_sys_truncate64(const char __user * path, u32 reg4,
-   unsigned long len1, unsigned long 

[PATCH 08/23] powerpc: Fix fallocate and fadvise64_64 compat parameter combination

2022-09-15 Thread Rohan McLure
As reported[1] by Arnd, the arch-specific fadvise64_64 and fallocate
compatibility handlers assume parameters are passed with 32-bit
big-endian ABI. This affects the assignment of odd-even parameter pairs
to the high or low words of a 64-bit syscall parameter.

Fix the fadvise64_64 and fallocate compat handlers to correctly swap
upper/lower 32 bits conditioned on endianness.
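
A minimal user-space sketch of how the two merge_64 variants in the
hunk below recombine the same register pair under each endianness
(sample values are hypothetical):

#include <stdint.h>
#include <stdio.h>

/* On LE the first register of the pair holds the low word ... */
static uint64_t merge_64_le(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | low;
}

/* ... while on BE it holds the high word. */
static uint64_t merge_64_be(uint32_t high, uint32_t low)
{
	return ((uint64_t)high << 32) | low;
}

int main(void)
{
	/* A 64-bit offset 0x100000002 split across two 32-bit registers. */
	printf("%llx\n", (unsigned long long)merge_64_le(0x2, 0x1));
	printf("%llx\n", (unsigned long long)merge_64_be(0x1, 0x2));
	return 0;
}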

A future patch will replace the arch-specific compat fallocate with an
asm-generic implementation. This patch is intended for ease of
back-port.

[1]: https://lore.kernel.org/all/be29926f-226e-48dc-871a-e29a54e80...@www.fastmail.com/

Fixes: 57f48b4b74e7 ("powerpc/compat_sys: swap hi/lo parts of 64-bit syscall args in LE mode")
Reported-by: Arnd Bergmann 
Signed-off-by: Rohan McLure 
---
V4 -> V5: New patch.
---
 arch/powerpc/include/asm/syscalls.h | 12 
 arch/powerpc/kernel/sys_ppc32.c | 14 +-
 arch/powerpc/kernel/syscalls.c  |  4 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h
index 21c2faaa2957..16b668515d15 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -8,6 +8,18 @@
 #include 
 #include 
 
+/*
+ * long long munging:
+ * The 32 bit ABI passes long longs in an odd even register pair.
+ * High and low parts are swapped depending on endian mode,
+ * so define a macro (similar to mips linux32) to handle that.
+ */
+#ifdef __LITTLE_ENDIAN__
+#define merge_64(low, high) ((u64)high << 32) | low
+#else
+#define merge_64(high, low) ((u64)high << 32) | low
+#endif
+
 struct rtas_args;
 
 long sys_mmap(unsigned long addr, size_t len,
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index f4edcc9489fb..ba363328da2b 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -56,18 +56,6 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t len,
return sys_mmap(addr, len, prot, flags, fd, pgoff << 12);
 }
 
-/* 
- * long long munging:
- * The 32 bit ABI passes long longs in an odd even register pair.
- * High and low parts are swapped depending on endian mode,
- * so define a macro (similar to mips linux32) to handle that.
- */
-#ifdef __LITTLE_ENDIAN__
-#define merge_64(low, high) ((u64)high << 32) | low
-#else
-#define merge_64(high, low) ((u64)high << 32) | low
-#endif
-
compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_size_t count,
 u32 reg6, u32 pos1, u32 pos2)
 {
@@ -94,7 +82,7 @@ int compat_sys_truncate64(const char __user * path, u32 reg4,
 long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 offset2,
 u32 len1, u32 len2)
 {
-   return ksys_fallocate(fd, mode, ((loff_t)offset1 << 32) | offset2,
+   return ksys_fallocate(fd, mode, merge_64(offset1, offset2),
 merge_64(len1, len2));
 }
 
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index fc999140bc27..abc3fbb3c490 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -98,8 +98,8 @@ long ppc64_personality(unsigned long personality)
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
  u32 len_high, u32 len_low)
 {
-   return ksys_fadvise64_64(fd, (u64)offset_high << 32 | offset_low,
-(u64)len_high << 32 | len_low, advice);
+   return ksys_fadvise64_64(fd, merge_64(offset_high, offset_low),
+merge_64(len_high, len_low), advice);
 }
 
 SYSCALL_DEFINE0(switch_endian)
-- 
2.34.1



[PATCH 17/23] powerpc: Enable compile-time check for syscall handlers

2022-09-15 Thread Rohan McLure
The table of syscall handlers and registered compatibility syscall
handlers has in the past been produced using assembly, with function
references resolved at link time. This moves link-time errors to
compile-time, by rewriting systbl.S in C, and including the
linux/syscalls.h, linux/compat.h and asm/syscalls.h headers for
prototypes.
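
A toy user-space sketch of the designated-initializer scheme this patch
adopts (names are illustrative; the real tables are filled in from the
generated syscall headers):

#include <stdio.h>

typedef long (*syscall_fn)(long, long);

static long sys_foo(long a, long b) { return a + b; }
static long sys_ni_syscall(long a, long b) { return -38; /* -ENOSYS */ }

/* Each table slot is indexed by syscall number; the compiler now
 * checks that every referenced handler exists and has a prototype. */
#define __SYSCALL(nr, entry) [nr] = (syscall_fn)entry,

static const syscall_fn sys_call_table[] = {
	__SYSCALL(0, sys_foo)
	__SYSCALL(1, sys_ni_syscall)
};

int main(void)
{
	printf("%ld\n", sys_call_table[0](1, 2)); /* prints 3 */
	return 0;
}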

Reported-by: Arnd Bergmann 
Signed-off-by: Rohan McLure 
Reported-by: Nicholas Piggin 
---
V1 -> V2: New patch.
V4 -> V5: For this patch only, represent handler function pointers as
unsigned long. Remove reference to syscall wrappers. Use asm/syscalls.h
which implies asm/syscall.h
---
 arch/powerpc/kernel/{systbl.S => systbl.c} | 28 
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.c
similarity index 61%
rename from arch/powerpc/kernel/systbl.S
rename to arch/powerpc/kernel/systbl.c
index 6c1db3b6de2d..ce52bd2ec292 100644
--- a/arch/powerpc/kernel/systbl.S
+++ b/arch/powerpc/kernel/systbl.c
@@ -10,32 +10,26 @@
  * PPC64 updates by Dave Engebretsen (engeb...@us.ibm.com) 
  */
 
-#include <asm/ppc_asm.h>
+#include <linux/syscalls.h>
+#include <linux/compat.h>
+#include <asm/unistd.h>
+#include <asm/syscalls.h>
 
-.section .rodata,"a"
+#define __SYSCALL_WITH_COMPAT(nr, entry, compat) __SYSCALL(nr, entry)
+#define __SYSCALL(nr, entry) [nr] = (unsigned long) &entry,
 
-#ifdef CONFIG_PPC64
-   .p2align 3
-#define __SYSCALL(nr, entry)   .8byte entry
-#else
-   .p2align 2
-#define __SYSCALL(nr, entry)   .long entry
-#endif
-
-#define __SYSCALL_WITH_COMPAT(nr, native, compat)  __SYSCALL(nr, native)
-.globl sys_call_table
-sys_call_table:
+const unsigned long sys_call_table[] = {
 #ifdef CONFIG_PPC64
#include <asm/syscall_table_64.h>
 #else
#include <asm/syscall_table_32.h>
 #endif
+};
 
 #ifdef CONFIG_COMPAT
 #undef __SYSCALL_WITH_COMPAT
 #define __SYSCALL_WITH_COMPAT(nr, native, compat)  __SYSCALL(nr, compat)
-.globl compat_sys_call_table
-compat_sys_call_table:
-#define compat_sys_sigsuspend  sys_sigsuspend
+const unsigned long compat_sys_call_table[] = {
#include <asm/syscall_table_32.h>
-#endif
+};
+#endif /* CONFIG_COMPAT */
-- 
2.34.1



[PATCH 10/23] powerpc: Use generic fallocate compatibility syscall

2022-09-15 Thread Rohan McLure
The powerpc fallocate compat syscall handler is identical to the
generic implementation provided by commit 59c10c52f573f ("riscv:
compat: syscall: Add compat_sys_call_table implementation"), and as
such can be removed in favour of the generic implementation.

A future patch series will replace more architecture-defined syscall
handlers with generic implementations, dependent on introducing generic
implementations that are compatible with powerpc and arm's parameter
reorderings.
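
A toy sketch of the __ARCH_WANT_ opt-in pattern the hunk below relies
on (the _demo names are illustrative): generic code compiles the shared
handler only for architectures that define the macro.

#include <stdio.h>

/* In the kernel this define lives in the arch's asm/unistd.h. */
#define __ARCH_WANT_COMPAT_FALLOCATE_DEMO

#ifdef __ARCH_WANT_COMPAT_FALLOCATE_DEMO
/* Stand-in for the generic compat_sys_fallocate implementation. */
static long compat_sys_fallocate_demo(void)
{
	return 0;
}
#endif

int main(void)
{
	printf("%ld\n", compat_sys_fallocate_demo());
	return 0;
}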

Reported-by: Arnd Bergmann 
Signed-off-by: Rohan McLure 
---
V1 -> V2: Remove arch-specific fallocate handler.
V2 -> V3: Remove generic fallocate prototype. Move to beginning of
series.
V4 -> V5: Remove implementation as well which I somehow failed to do.
Replace local BE compat_arg_u64 with generic.
---
 arch/powerpc/include/asm/syscalls.h | 2 --
 arch/powerpc/include/asm/unistd.h   | 1 +
 arch/powerpc/kernel/sys_ppc32.c | 7 ---
 3 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h
index 16b668515d15..960b3871db72 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -51,8 +51,6 @@ compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offset1, u32 offset2, u3
 int compat_sys_truncate64(const char __user *path, u32 reg4,
  unsigned long len1, unsigned long len2);
 
-long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 offset2, u32 len1, u32 len2);
-
 int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1,
   unsigned long len2);
 
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index b1129b4ef57d..659a996c75aa 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -45,6 +45,7 @@
 #define __ARCH_WANT_SYS_UTIME
 #define __ARCH_WANT_SYS_NEWFSTATAT
 #define __ARCH_WANT_COMPAT_STAT
+#define __ARCH_WANT_COMPAT_FALLOCATE
 #define __ARCH_WANT_COMPAT_SYS_SENDFILE
 #endif
 #define __ARCH_WANT_SYS_FORK
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index ba363328da2b..d961634976d8 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -79,13 +79,6 @@ int compat_sys_truncate64(const char __user * path, u32 reg4,
return ksys_truncate(path, merge_64(len1, len2));
 }
 
-long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 offset2,
-u32 len1, u32 len2)
-{
-   return ksys_fallocate(fd, mode, merge_64(offset1, offset2),
-merge_64(len1, len2));
-}
-
 int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1,
 unsigned long len2)
 {
-- 
2.34.1



[PATCH 22/23] powerpc/64s: Clear gprs on interrupt routine entry in Book3S

2022-09-15 Thread Rohan McLure
Zero GPRs r0, r2-r11, r14-r31, on entry into the kernel for all
other interrupt sources to limit the influence of user-space values
in potential speculation gadgets. The remaining GPRs are overwritten by
entry macros to interrupt handlers, irrespective of whether or not a
given handler consumes these register values.

Prior to this commit, r14-r31 are restored on a per-interrupt basis at
exit, but now they are always restored. Remove explicit REST_NVGPRS
invocations as non-volatiles must now always be restored. 32-bit systems
do not clear user registers on interrupt, and continue to depend on the
return value of interrupt_exit_user_prepare to determine whether or not
to restore non-volatiles.

The mmap_bench benchmark in selftests should rapidly invoke page
faults. We see a ~0.8% performance regression with this mitigation, but
this indicates the worst-case performance due to heavier-weight
interrupt handlers. This mitigation is disabled by default, but can be
enabled with CONFIG_INTERRUPT_SANITIZE_REGISTERS.

Signed-off-by: Rohan McLure 
---
V1 -> V2: Add benchmark data
V2 -> V3: Use ZEROIZE_GPR{,S} macro renames, clarify
interrupt_exit_user_prepare changes in summary.
V4 -> V5: Configurable now with INTERRUPT_SANITIZE_REGISTERS. Zero r12
(containing MSR) from common macro on per-interrupt basis with IOPTION.
---
 arch/powerpc/kernel/exceptions-64s.S | 37 --
 arch/powerpc/kernel/interrupt_64.S   | 10 +++
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index a3b51441b039..be5e72caada1 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -111,6 +111,7 @@ name:
 #define ISTACK .L_ISTACK_\name\()  /* Set regular kernel stack */
 #define __ISTACK(name) .L_ISTACK_ ## name
 #define IKUAP  .L_IKUAP_\name\()   /* Do KUAP lock */
+#define IMSR_R12   .L_IMSR_R12_\name\()/* Assumes MSR saved to r12 */
 
 #define INT_DEFINE_BEGIN(n)\
 .macro int_define_ ## n name
@@ -176,6 +177,9 @@ do_define_int n
.ifndef IKUAP
IKUAP=1
.endif
+   .ifndef IMSR_R12
+   IMSR_R12=0
+   .endif
 .endm
 
 /*
@@ -502,6 +506,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real, text)
std r10,0(r1)   /* make stack chain pointer */
std r0,GPR0(r1) /* save r0 in stackframe*/
std r10,GPR1(r1)/* save r1 in stackframe*/
+   ZEROIZE_GPR(0)
 
/* Mark our [H]SRRs valid for return */
li  r10,1
@@ -544,8 +549,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
std r9,GPR11(r1)
std r10,GPR12(r1)
std r11,GPR13(r1)
+   .if !IMSR_R12
+   ZEROIZE_GPRS(9, 12)
+   .else
+   ZEROIZE_GPRS(9, 11)
+   .endif
 
SAVE_NVGPRS(r1)
+#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS
+   ZEROIZE_NVGPRS()
+#endif
 
.if IDAR
.if IISIDE
@@ -577,8 +590,8 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
ld  r10,IAREA+EX_CTR(r13)
std r10,_CTR(r1)
-   std r2,GPR2(r1) /* save r2 in stackframe*/
-   SAVE_GPRS(3, 8, r1) /* save r3 - r8 in stackframe   */
+   SAVE_GPRS(2, 8, r1) /* save r2 - r8 in stackframe   */
+   ZEROIZE_GPRS(2, 8)
mflr  r9  /* Get LR, later save to stack  */
ld  r2,PACATOC(r13) /* get kernel TOC into r2   */
std r9,_LINK(r1)
@@ -696,6 +709,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
mtlr  r9
ld  r9,_CCR(r1)
mtcr  r9
+#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS
+   REST_NVGPRS(r1)
+#endif
REST_GPRS(2, 13, r1)
REST_GPR(0, r1)
/* restore original r1. */
@@ -1368,11 +1384,13 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
b   interrupt_return_srr
 
 1: bl  do_break
+#ifndef CONFIG_INTERRUPT_SANITIZE_REGISTERS
/*
 * do_break() may have changed the NV GPRS while handling a breakpoint.
 * If so, we need to restore them with their updated values.
 */
REST_NVGPRS(r1)
+#endif
b   interrupt_return_srr
 
 
@@ -1598,7 +1616,9 @@ EXC_COMMON_BEGIN(alignment_common)
GEN_COMMON alignment
addi  r3,r1,STACK_FRAME_OVERHEAD
bl  alignment_exception
+#ifndef CONFIG_INTERRUPT_SANITIZE_REGISTERS
REST_NVGPRS(r1) /* instruction emulation may change GPRs */
+#endif
b   interrupt_return_srr
 
 
@@ -1708,7 +1728,9 @@ EXC_COMMON_BEGIN(program_check_common)
 .Ldo_program_check:
addi  r3,r1,STACK_FRAME_OVERHEAD
bl  program_check_exception
+#ifndef CONFIG_INTERRUPT_SANITIZE_REGISTERS
REST_NVGPRS(r1) /* instruction emulation may change GPRs */
+#endif
b   

[PATCH 21/23] powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig

2022-09-15 Thread Rohan McLure
Add Kconfig option for enabling clearing of registers on arrival in an
interrupt handler. This reduces the speculation influence of registers
on kernel internals. The option will be consumed by 64-bit systems that
feature speculation and wish to implement this mitigation.

This patch only introduces the Kconfig option, no actual mitigations.

The primary overhead of this mitigation lies in an increased number of
registers that must be saved and restored by interrupt handlers on
Book3S systems. Enable by default on Book3E systems, which prior to
this patch eagerly save and restore register state, meaning that the
mitigation when implemented will have minimal overhead.

Signed-off-by: Rohan McLure 
---
V4 -> V5: New patch
---
 arch/powerpc/Kconfig | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ef6c83e79c9b..a643ebd83349 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -528,6 +528,15 @@ config HOTPLUG_CPU
 
  Say N if you are unsure.
 
+config INTERRUPT_SANITIZE_REGISTERS
+   bool "Clear gprs on interrupt arrival"
+   depends on PPC64 && ARCH_HAS_SYSCALL_WRAPPER
+   default PPC_BOOK3E_64
+   help
+ Reduce the influence of user register state on interrupt handlers and
+ syscalls through clearing user state from registers before handling
+ the exception.
+
 config PPC_QUEUED_SPINLOCKS
bool "Queued spinlocks" if EXPERT
depends on SMP
-- 
2.34.1



[PATCH 05/23] powerpc/32: Clarify interrupt restores with REST_GPR macro in entry_32.S

2022-09-15 Thread Rohan McLure
Restoring the register state of the interrupted thread involves issuing
a large number of predictable loads to the kernel stack frame. Issue the
REST_GPR{,S} macros to clearly signal when this is happening, and bunch
together restores at the end of the interrupt handler where the saved
value is not consumed earlier in the handler code.

Signed-off-by: Rohan McLure 
Reported-by: Christophe Leroy 
---
V2 -> V3: New patch.
V3 -> V4: Minimise restores in the unrecoverable window between
restoring SRR0/1 and return from interrupt.
---
 arch/powerpc/kernel/entry_32.S | 33 +---
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 44dfce9a60c5..e4b694cebc44 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -68,7 +68,7 @@ prepare_transfer_to_handler:
lwz r9,_MSR(r11)/* if sleeping, clear MSR.EE */
rlwinm  r9,r9,0,~MSR_EE
lwz r12,_LINK(r11)  /* and return to address in LR */
-   lwz r2, GPR2(r11)
+   REST_GPR(2, r11)
b   fast_exception_return
 _ASM_NOKPROBE_SYMBOL(prepare_transfer_to_handler)
 #endif /* CONFIG_PPC_BOOK3S_32 || CONFIG_E500 */
@@ -144,7 +144,7 @@ ret_from_syscall:
lwz r7,_NIP(r1)
lwz r8,_MSR(r1)
cmpwi   r3,0
-   lwz r3,GPR3(r1)
+   REST_GPR(3, r1)
 syscall_exit_finish:
mtspr   SPRN_SRR0,r7
mtspr   SPRN_SRR1,r8
@@ -152,8 +152,8 @@ syscall_exit_finish:
bne 3f
mtcr  r5
 
-1: lwz r2,GPR2(r1)
-   lwz r1,GPR1(r1)
+1: REST_GPR(2, r1)
+   REST_GPR(1, r1)
rfi
 #ifdef CONFIG_40x
b . /* Prevent prefetch past rfi */
@@ -165,10 +165,8 @@ syscall_exit_finish:
REST_NVGPRS(r1)
mtctr   r4
mtxer   r5
-   lwz r0,GPR0(r1)
-   lwz r3,GPR3(r1)
-   REST_GPRS(4, 11, r1)
-   lwz r12,GPR12(r1)
+   REST_GPR(0, r1)
+   REST_GPRS(3, 12, r1)
b   1b
 
 #ifdef CONFIG_44x
@@ -260,9 +258,8 @@ fast_exception_return:
beq 3f  /* if not, we've got problems */
 #endif
 
-2: REST_GPRS(3, 6, r11)
-   lwz r10,_CCR(r11)
-   REST_GPRS(1, 2, r11)
+2: lwz r10,_CCR(r11)
+   REST_GPRS(1, 6, r11)
mtcr  r10
lwz r10,_LINK(r11)
mtlr  r10
@@ -277,7 +274,7 @@ fast_exception_return:
mtspr   SPRN_SRR0,r12
REST_GPR(9, r11)
REST_GPR(12, r11)
-   lwz r11,GPR11(r11)
+   REST_GPR(11, r11)
rfi
 #ifdef CONFIG_40x
b . /* Prevent prefetch past rfi */
@@ -454,9 +451,8 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return)
lwz r3,_MSR(r1);\
andi.   r3,r3,MSR_PR;   \
bne interrupt_return;   \
-   lwz r0,GPR0(r1);\
-   lwz r2,GPR2(r1);\
-   REST_GPRS(3, 8, r1);\
+   REST_GPR(0, r1);\
+   REST_GPRS(2, 8, r1);\
lwz r10,_XER(r1);   \
lwz r11,_CTR(r1);   \
mtspr   SPRN_XER,r10;   \
@@ -475,11 +471,8 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return)
lwz r12,_MSR(r1);   \
mtspr   exc_lvl_srr0,r11;   \
mtspr   exc_lvl_srr1,r12;   \
-   lwz r9,GPR9(r1);\
-   lwz r12,GPR12(r1);  \
-   lwz r10,GPR10(r1);  \
-   lwz r11,GPR11(r1);  \
-   lwz r1,GPR1(r1);\
+   REST_GPRS(9, 12, r1);   \
+   REST_GPR(1, r1);\
exc_lvl_rfi;\
b   .;  /* prevent prefetch past exc_lvl_rfi */
 
-- 
2.34.1



[PATCH 02/23] powerpc: Save caller r3 prior to system_call_exception

2022-09-15 Thread Rohan McLure
This reverts commit 8875f47b7681 ("powerpc/syscall: Save r3 in regs->orig_r3").

Save caller's original r3 state to the kernel stackframe before entering
system_call_exception. This allows for user registers to be cleared by
the time system_call_exception is entered, reducing the influence of
user registers on speculation within the kernel.

Prior to this commit, orig_r3 was saved at the beginning of
system_call_exception. Instead, save orig_r3 while the user value is
still live in r3.

Also replicate this early save in 32-bit. A similar save was removed in
commit 6f76a01173cc ("powerpc/syscall: implement system call entry/exit
logic in C for PPC32") when 32-bit adopted system_call_exception. Revert
its removal of orig_r3 saves.

Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
---
V2 -> V3: New commit.
V4 -> V5: New commit message, as we do more than just revert 8875f47b7681.
---
 arch/powerpc/kernel/entry_32.S | 1 +
 arch/powerpc/kernel/interrupt_64.S | 2 ++
 arch/powerpc/kernel/syscall.c  | 1 -
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1d599df6f169..44dfce9a60c5 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -101,6 +101,7 @@ __kuep_unlock:
 
.globl  transfer_to_syscall
 transfer_to_syscall:
+   stw r3, ORIG_GPR3(r1)
stw r11, GPR1(r1)
stw r11, 0(r1)
mflr  r12
diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index ce25b28cf418..71d2d9497283 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -91,6 +91,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
li  r11,\trapnr
std r11,_TRAP(r1)
std r12,_CCR(r1)
+   std r3,ORIG_GPR3(r1)
addi  r10,r1,STACK_FRAME_OVERHEAD
ld  r11,exception_marker@toc(r2)
std r11,-16(r10)/* "regshere" marker */
@@ -275,6 +276,7 @@ END_BTB_FLUSH_SECTION
std r10,_LINK(r1)
std r11,_TRAP(r1)
std r12,_CCR(r1)
+   std r3,ORIG_GPR3(r1)
addi  r10,r1,STACK_FRAME_OVERHEAD
ld  r11,exception_marker@toc(r2)
std r11,-16(r10)/* "regshere" marker */
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index 81ace9e8b72b..64102a64fd84 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -25,7 +25,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
kuap_lock();
 
add_random_kstack_offset();
-   regs->orig_gpr3 = r3;
 
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
-- 
2.34.1



[PATCH 04/23] powerpc/64s: Use {ZEROIZE,SAVE,REST}_GPRS macros in sc, scv 0 handlers

2022-09-15 Thread Rohan McLure
Use the convenience macros for saving/clearing/restoring gprs in keeping
with syscall calling conventions. The plural variants of these macros
can store a range of registers for concision.

This works well when the user gpr value we are hoping to save is still
live. In the syscall interrupt handlers, user register state is
sometimes juggled between registers. Hold off from issuing the SAVE_GPR
macro for applicable neighbouring lines to highlight the delicate
register save logic.

Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
---
V1 -> V2: Update summary
V2 -> V3: Update summary regarding exclusions for the SAVE_GPR macro.
Acknowledge new name for ZEROIZE_GPR{,S} macros.
V4 -> V5: Move to beginning of series
---
 arch/powerpc/kernel/interrupt_64.S | 43 ++--
 1 file changed, 9 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 71d2d9497283..7d92a7a54727 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -71,12 +71,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
mfcr  r12
li  r11,0
/* Can we avoid saving r3-r8 in common case? */
-   std r3,GPR3(r1)
-   std r4,GPR4(r1)
-   std r5,GPR5(r1)
-   std r6,GPR6(r1)
-   std r7,GPR7(r1)
-   std r8,GPR8(r1)
+   SAVE_GPRS(3, 8, r1)
/* Zero r9-r12, this should only be required when restoring all GPRs */
std r11,GPR9(r1)
std r11,GPR10(r1)
@@ -149,17 +144,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
/* Could zero these as per ABI, but we may consider a stricter ABI
 * which preserves these if libc implementations can benefit, so
 * restore them for now until further measurement is done. */
-   ld  r0,GPR0(r1)
-   ld  r4,GPR4(r1)
-   ld  r5,GPR5(r1)
-   ld  r6,GPR6(r1)
-   ld  r7,GPR7(r1)
-   ld  r8,GPR8(r1)
+   REST_GPR(0, r1)
+   REST_GPRS(4, 8, r1)
/* Zero volatile regs that may contain sensitive kernel data */
-   li  r9,0
-   li  r10,0
-   li  r11,0
-   li  r12,0
+   ZEROIZE_GPRS(9, 12)
mtspr   SPRN_XER,r0
 
/*
@@ -182,7 +170,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld  r5,_XER(r1)
 
REST_NVGPRS(r1)
-   ld  r0,GPR0(r1)
+   REST_GPR(0, r1)
mtcr  r2
mtctr   r3
mtlr  r4
@@ -250,12 +238,7 @@ END_BTB_FLUSH_SECTION
mfcr  r12
li  r11,0
/* Can we avoid saving r3-r8 in common case? */
-   std r3,GPR3(r1)
-   std r4,GPR4(r1)
-   std r5,GPR5(r1)
-   std r6,GPR6(r1)
-   std r7,GPR7(r1)
-   std r8,GPR8(r1)
+   SAVE_GPRS(3, 8, r1)
/* Zero r9-r12, this should only be required when restoring all GPRs */
std r11,GPR9(r1)
std r11,GPR10(r1)
@@ -345,16 +328,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
cmpdi   r3,0
bne .Lsyscall_restore_regs
/* Zero volatile regs that may contain sensitive kernel data */
-   li  r0,0
-   li  r4,0
-   li  r5,0
-   li  r6,0
-   li  r7,0
-   li  r8,0
-   li  r9,0
-   li  r10,0
-   li  r11,0
-   li  r12,0
+   ZEROIZE_GPR(0)
+   ZEROIZE_GPRS(4, 12)
mtctr   r0
mtspr   SPRN_XER,r0
 .Lsyscall_restore_regs_cont:
@@ -380,7 +355,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
REST_NVGPRS(r1)
mtctr   r3
mtspr   SPRN_XER,r4
-   ld  r0,GPR0(r1)
+   REST_GPR(0, r1)
REST_GPRS(4, 12, r1)
b   .Lsyscall_restore_regs_cont
 .Lsyscall_rst_end:
-- 
2.34.1



[PATCH 18/23] powerpc: Use common syscall handler type

2022-09-15 Thread Rohan McLure
Cause syscall handlers to be typed as follows when called indirectly
throughout the kernel. This is to allow for better type checking.

typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
   unsigned long, unsigned long, unsigned long);

Since both 32 and 64-bit ABIs allow for at least the first six
machine-word length parameters to a function to be passed by registers,
even handlers which admit fewer than six parameters may be viewed as
having the above type.
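
A user-space sketch of this claim (the double cast mirrors the patch,
see below; calling through the six-argument type is not portable C, but
relies on exactly the ABI property described above):

#include <stdio.h>

typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
			   unsigned long, unsigned long, unsigned long);

/* A handler admitting only one parameter. */
static long sys_one_arg(unsigned long fd)
{
	return (long)fd;
}

/* Cast via void * to silence -Wcast-function-type, as in the patch. */
static const syscall_fn table[] = {
	(syscall_fn)(void *)sys_one_arg,
};

int main(void)
{
	/* The dispatcher always supplies six values; five are ignored. */
	printf("%ld\n", table[0](42, 0, 0, 0, 0, 0)); /* prints 42 */
	return 0;
}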

Coercing syscalls to syscall_fn requires a cast to void* to avoid
-Wcast-function-type.

Fix up comparisons in VDSO to avoid pointer-integer comparison. Introduce
explicit cast on systems with SPUs.

Signed-off-by: Rohan McLure 
---
V1 -> V2: New patch.
V2 -> V3: Remove unnecessary cast from const syscall_fn to syscall_fn
V4 -> V5: Update patch description.
---
 arch/powerpc/include/asm/syscall.h  | 7 +--
 arch/powerpc/include/asm/syscalls.h | 1 +
 arch/powerpc/kernel/systbl.c| 6 +++---
 arch/powerpc/kernel/vdso.c  | 4 ++--
 arch/powerpc/platforms/cell/spu_callbacks.c | 6 +++---
 5 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
index 25fc8ad9a27a..d2a8dfd5de33 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -14,9 +14,12 @@
 #include 
 #include 
 
+typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
+  unsigned long, unsigned long, unsigned long);
+
 /* ftrace syscalls requires exporting the sys_call_table */
-extern const unsigned long sys_call_table[];
-extern const unsigned long compat_sys_call_table[];
+extern const syscall_fn sys_call_table[];
+extern const syscall_fn compat_sys_call_table[];
 
static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
 {
diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h
index 5d106acf7906..cc87168d6ecb 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 
+#include 
 #ifdef CONFIG_PPC64
 #include 
 #endif
diff --git a/arch/powerpc/kernel/systbl.c b/arch/powerpc/kernel/systbl.c
index ce52bd2ec292..e5d419822b4e 100644
--- a/arch/powerpc/kernel/systbl.c
+++ b/arch/powerpc/kernel/systbl.c
@@ -16,9 +16,9 @@
 #include 
 
 #define __SYSCALL_WITH_COMPAT(nr, entry, compat) __SYSCALL(nr, entry)
-#define __SYSCALL(nr, entry) [nr] = (unsigned long) &entry,
+#define __SYSCALL(nr, entry) [nr] = (void *) entry,
 
-const unsigned long sys_call_table[] = {
+const syscall_fn sys_call_table[] = {
 #ifdef CONFIG_PPC64
 #include 
 #else
@@ -29,7 +29,7 @@ const unsigned long sys_call_table[] = {
 #ifdef CONFIG_COMPAT
 #undef __SYSCALL_WITH_COMPAT
 #define __SYSCALL_WITH_COMPAT(nr, native, compat)  __SYSCALL(nr, compat)
-const unsigned long compat_sys_call_table[] = {
+const syscall_fn compat_sys_call_table[] = {
 #include 
 };
 #endif /* CONFIG_COMPAT */
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index bf9574ec26ce..fcca06d200d3 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -304,10 +304,10 @@ static void __init vdso_setup_syscall_map(void)
unsigned int i;
 
for (i = 0; i < NR_syscalls; i++) {
-   if (sys_call_table[i] != (unsigned long)&sys_ni_syscall)
+   if (sys_call_table[i] != (void *)&sys_ni_syscall)
        vdso_data->syscall_map[i >> 5] |= 0x80000000UL >> (i & 0x1f);
    if (IS_ENABLED(CONFIG_COMPAT) &&
-       compat_sys_call_table[i] != (unsigned long)&sys_ni_syscall)
+       compat_sys_call_table[i] != (void *)&sys_ni_syscall)
        vdso_data->compat_syscall_map[i >> 5] |= 0x80000000UL >> (i & 0x1f);
}
 }
diff --git a/arch/powerpc/platforms/cell/spu_callbacks.c b/arch/powerpc/platforms/cell/spu_callbacks.c
index fe0d8797a00a..e780c14c5733 100644
--- a/arch/powerpc/platforms/cell/spu_callbacks.c
+++ b/arch/powerpc/platforms/cell/spu_callbacks.c
@@ -34,15 +34,15 @@
  * mbind, mq_open, ipc, ...
  */
 
-static void *spu_syscall_table[] = {
+static const syscall_fn spu_syscall_table[] = {
 #define __SYSCALL_WITH_COMPAT(nr, entry, compat) __SYSCALL(nr, entry)
-#define __SYSCALL(nr, entry) [nr] = entry,
+#define __SYSCALL(nr, entry) [nr] = (void *) entry,
 #include 
 };
 
 long spu_sys_callback(struct spu_syscall_block *s)
 {
-   long (*syscall)(u64 a1, u64 a2, u64 a3, u64 a4, u64 a5, u64 a6);
+   syscall_fn syscall;
 
if (s->nr_ret >= ARRAY_SIZE(spu_syscall_table)) {
pr_debug("%s: invalid syscall #%lld", __func__, s->nr_ret);
-- 
2.34.1



[PATCH 19/23] powerpc: Provide syscall wrapper

2022-09-15 Thread Rohan McLure
Implement a syscall wrapper as per s390, x86 and arm64. When enabled, it
causes handlers to accept parameters from a stack frame rather than from
user scratch register state. This allows user registers to be safely
cleared in order to reduce caller influence on speculation within syscall
routines. The wrapper is a macro that emits syscall handler symbols which
call into the target handler, obtaining its parameters from a struct
pt_regs on the stack.
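
Roughly, for a two-argument syscall the macro emits something of the
following shape (illustrative only; the names and details differ from the
real macro in syscall_wrapper.h, and gprs 3-8 carry the user arguments):

	long sys_example(const struct pt_regs *regs);
	long sys_example(const struct pt_regs *regs)
	{
		return __do_sys_example(regs->gpr[3], regs->gpr[4]);
	}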

As registers are already saved to the stack prior to calling
system_call_exception, this function executes more efficiently under the
new convention of passing a pointer to that frame than with parameters
passed by registers, since it no longer needs to allocate a stack frame
of its own. On a 32-bit system, we see a >20% performance increase on the
null_syscall microbenchmark, and on Power 8 the gains amortise the cost
of clearing and restoring registers implemented at the end of this
series, for a final ~5.6% performance improvement on null_syscall.

Syscalls are wrapped in this fashion on all platforms except for the
Cell processor, as this commit does not provide SPU support. This can be
quickly fixed in a subsequent patch, but requires spu_sys_callback to
allocate a pt_regs structure to satisfy the wrapped calling convention.

Co-developed-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 
Signed-off-by: Rohan McLure 
---
V1 -> V2: Generate prototypes for symbols produced by the wrapper.
V2 -> V3: Rebased to remove conflict with 1547db7d1f44
("powerpc: Move system_call_exception() to syscall.c"). Also remove copy
from gpr3 save slot on stackframe to orig_r3's slot. Fix whitespace with
preprocessor defines in system_call_exception.
V4 -> V5: Move systbl.c syscall wrapper support to this patch. Swap
calling convention for system_call_exception to be (, r0)
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/include/asm/interrupt.h   |  3 +-
 arch/powerpc/include/asm/syscall.h |  4 +
 arch/powerpc/include/asm/syscall_wrapper.h | 84 
 arch/powerpc/include/asm/syscalls.h| 30 ++-
 arch/powerpc/kernel/entry_32.S |  6 +-
 arch/powerpc/kernel/interrupt_64.S | 28 +--
 arch/powerpc/kernel/syscall.c  | 31 +++-
 arch/powerpc/kernel/systbl.c   |  8 ++
 arch/powerpc/kernel/vdso.c |  2 +
 10 files changed, 164 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4c466acdc70d..ef6c83e79c9b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -137,6 +137,7 @@ config PPC
select ARCH_HAS_STRICT_KERNEL_RWX   if (PPC_BOOK3S || PPC_8xx || 
40x) && !HIBERNATION
select ARCH_HAS_STRICT_KERNEL_RWX   if FSL_BOOKE && !HIBERNATION && 
!RANDOMIZE_BASE
select ARCH_HAS_STRICT_MODULE_RWX   if ARCH_HAS_STRICT_KERNEL_RWX
+   select ARCH_HAS_SYSCALL_WRAPPER if !SPU_BASE
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index 8069dbc4b8d1..48eec9cd1429 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -665,8 +665,7 @@ static inline void interrupt_cond_local_irq_enable(struct 
pt_regs *regs)
local_irq_enable();
 }
 
-long system_call_exception(long r3, long r4, long r5, long r6, long r7, long 
r8,
-  unsigned long r0, struct pt_regs *regs);
+long system_call_exception(struct pt_regs *regs, unsigned long r0);
 notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs 
*regs, long scv);
 notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs);
 notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/syscall.h 
b/arch/powerpc/include/asm/syscall.h
index d2a8dfd5de33..3dd36c5e334a 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -14,8 +14,12 @@
 #include 
 #include 
 
+#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
+typedef long (*syscall_fn)(const struct pt_regs *);
+#else
 typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
   unsigned long, unsigned long, unsigned long);
+#endif
 
 /* ftrace syscalls requires exporting the sys_call_table */
 extern const syscall_fn sys_call_table[];
diff --git a/arch/powerpc/include/asm/syscall_wrapper.h 
b/arch/powerpc/include/asm/syscall_wrapper.h
new file mode 100644
index ..91bcfa40f740
--- /dev/null
+++ b/arch/powerpc/include/asm/syscall_wrapper.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * syscall_wrapper.h - powerpc specific wrappers to syscall definitions
+ *
+ * Based on arch/{x86,arm64}/include/asm/syscall_wrapper.h
+ 

[PATCH 09/23] asm-generic: compat: Support BE for long long args in 32-bit ABIs

2022-09-15 Thread Rohan McLure
32-bit ABIs support passing 64-bit integers by registers via argument
translation. Commit 59c10c52f573 ("riscv: compat: syscall: Add
compat_sys_call_table implementation") implements the compat_arg_u64
macro for efficiently defining little-endian compatibility syscalls.

Architectures supporting big endianness may benefit from the reciprocal
argument translation, but are also welcome to implement their own.
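
For illustration, a compat handler defined with these helpers has the
following shape (the syscall name here is hypothetical):

	/* fd plus the hi/lo pair gives three register arguments in total;
	 * compat_arg_u64_dual() orders the pair correctly for the ABI. */
	COMPAT_SYSCALL_DEFINE3(example_truncate, unsigned int, fd,
			       compat_arg_u64_dual(length))
	{
		return ksys_ftruncate(fd, compat_arg_u64_glue(length));
	}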

Signed-off-by: Rohan McLure 
---
V4 -> V5: New patch.
---
 include/asm-generic/compat.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/compat.h b/include/asm-generic/compat.h
index d06308a2a7a8..aeb257ad3d1a 100644
--- a/include/asm-generic/compat.h
+++ b/include/asm-generic/compat.h
@@ -14,12 +14,17 @@
 #define COMPAT_OFF_T_MAX   0x7fff
 #endif
 
-#if !defined(compat_arg_u64) && !defined(CONFIG_CPU_BIG_ENDIAN)
+#ifndef compat_arg_u64
+#ifdef CONFIG_CPU_BIG_ENDIAN
 #define compat_arg_u64(name)   u32  name##_lo, u32  name##_hi
 #define compat_arg_u64_dual(name)  u32, name##_lo, u32, name##_hi
+#else
+#define compat_arg_u64(name)   u32  name##_hi, u32  name##_lo
+#define compat_arg_u64_dual(name)  u32, name##_hi, u32, name##_lo
+#endif
 #define compat_arg_u64_glue(name)  (((u64)name##_lo & 0xUL) | \
 ((u64)name##_hi << 32))
-#endif
+#endif /* compat_arg_u64 */
 
 /* These types are common across all compat ABIs */
 typedef u32 compat_size_t;
-- 
2.34.1



[PATCH 14/23] powerpc: Provide do_ppc64_personality helper

2022-09-15 Thread Rohan McLure
Avoid duplication in a future patch, which will define the
ppc64_personality syscall handler in terms of the SYSCALL_DEFINE and
COMPAT_SYSCALL_DEFINE macros, by extracting the common body of
ppc64_personality into a helper function.
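
Both entry points can then be defined in terms of the helper, roughly as
follows (a sketch of the later patch's shape, not its exact text):

	SYSCALL_DEFINE1(ppc64_personality, unsigned long, personality)
	{
		return do_ppc64_personality(personality);
	}

	#ifdef CONFIG_COMPAT
	COMPAT_SYSCALL_DEFINE1(ppc64_personality, unsigned long, personality)
	{
		return do_ppc64_personality(personality);
	}
	#endif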

Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
---
V2 -> V3: New commit.
V4 -> V5: Remove 'inline'.
---
 arch/powerpc/kernel/syscalls.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index 9830957498b0..135a0b9108d5 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -75,7 +75,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
 }
 
 #ifdef CONFIG_PPC64
-long ppc64_personality(unsigned long personality)
+static long do_ppc64_personality(unsigned long personality)
 {
long ret;
 
@@ -87,6 +87,10 @@ long ppc64_personality(unsigned long personality)
ret = (ret & ~PER_MASK) | PER_LINUX;
return ret;
 }
+long ppc64_personality(unsigned long personality)
+{
+   return do_ppc64_personality(personality);
+}
 #endif
 
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
-- 
2.34.1



[PATCH 06/23] powerpc/64e: Clarify register saves and clears with {SAVE,ZEROIZE}_GPRS

2022-09-15 Thread Rohan McLure
The common interrupt handler prologue macro and the bad_stack
trampolines include consecutive sequences of register saves, and some
register clears. Neaten such instances by expanding use of the SAVE_GPRS
macro and employing the ZEROIZE_GPR macro when appropriate.

Also simplify an invocation of SAVE_GPRS targeting all non-volatile
registers to SAVE_NVGPRS.

Signed-off-by: Rohan McLure 
Reported-by: Nicholas Piggin 
---
V3 -> V4: New commit.
---
 arch/powerpc/kernel/exceptions-64e.S | 27 +++---
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 67dc4e3179a0..48c640ca425d 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -216,17 +216,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
mtlrr10
mtcrr11
 
-   ld  r10,GPR10(r1)
-   ld  r11,GPR11(r1)
-   ld  r12,GPR12(r1)
+   REST_GPRS(10, 12, r1)
mtspr   \scratch,r0
 
std r10,\paca_ex+EX_R10(r13);
std r11,\paca_ex+EX_R11(r13);
ld  r10,_NIP(r1)
ld  r11,_MSR(r1)
-   ld  r0,GPR0(r1)
-   ld  r1,GPR1(r1)
+   REST_GPRS(0, 1, r1)
mtspr   \srr0,r10
mtspr   \srr1,r11
ld  r10,\paca_ex+EX_R10(r13)
@@ -372,16 +369,15 @@ ret_from_mc_except:
 /* Core exception code for all exceptions except TLB misses. */
 #define EXCEPTION_COMMON_LVL(n, scratch, excf) \
 exc_##n##_common:  \
-   std r0,GPR0(r1);/* save r0 in stackframe */ \
-   std r2,GPR2(r1);/* save r2 in stackframe */ \
-   SAVE_GPRS(3, 9, r1);/* save r3 - r9 in stackframe */\
+   SAVE_GPR(0, r1);/* save r0 in stackframe */ \
+   SAVE_GPRS(2, 9, r1);/* save r2 - r9 in stackframe */\
std r10,_NIP(r1);   /* save SRR0 to stackframe */   \
std r11,_MSR(r1);   /* save SRR1 to stackframe */   \
beq 2f; /* if from kernel mode */   \
 2: ld  r3,excf+EX_R10(r13);/* get back r10 */  \
ld  r4,excf+EX_R11(r13);/* get back r11 */  \
mfspr   r5,scratch; /* get back r13 */  \
-   std r12,GPR12(r1);  /* save r12 in stackframe */\
+   SAVE_GPR(12, r1);   /* save r12 in stackframe */\
ld  r2,PACATOC(r13);/* get kernel TOC into r2 */\
mflrr6; /* save LR in stackframe */ \
mfctr   r7; /* save CTR in stackframe */\
@@ -390,7 +386,7 @@ exc_##n##_common:   
\
lwz r10,excf+EX_CR(r13);/* load orig CR back from PACA  */  \
lbz r11,PACAIRQSOFTMASK(r13); /* get current IRQ softe */   \
ld  r12,exception_marker@toc(r2);   \
-   li  r0,0;   \
+   ZEROIZE_GPR(0); \
std r3,GPR10(r1);   /* save r10 to stackframe */\
std r4,GPR11(r1);   /* save r11 to stackframe */\
std r5,GPR13(r1);   /* save it to stackframe */ \
@@ -1056,15 +1052,14 @@ bad_stack_book3e:
mfspr   r11,SPRN_ESR
std r10,_DEAR(r1)
std r11,_ESR(r1)
-   std r0,GPR0(r1);/* save r0 in stackframe */ \
-   std r2,GPR2(r1);/* save r2 in stackframe */ \
-   SAVE_GPRS(3, 9, r1);/* save r3 - r9 in stackframe */\
+   SAVE_GPR(0, r1);/* save r0 in stackframe */ \
+   SAVE_GPRS(2, 9, r1);/* save r2 - r9 in stackframe */\
ld  r3,PACA_EXGEN+EX_R10(r13);/* get back r10 */\
ld  r4,PACA_EXGEN+EX_R11(r13);/* get back r11 */\
mfspr   r5,SPRN_SPRG_GEN_SCRATCH;/* get back r13 XXX can be wrong */ \
std r3,GPR10(r1);   /* save r10 to stackframe */\
std r4,GPR11(r1);   /* save r11 to stackframe */\
-   std r12,GPR12(r1);  /* save r12 in stackframe */\
+   SAVE_GPR(12, r1);   /* save r12 in stackframe */\
std r5,GPR13(r1);   /* save it to stackframe */ \
mflrr10
mfctr   r11
@@ -1072,12 +1067,12 @@ bad_stack_book3e:
std r10,_LINK(r1)
std r11,_CTR(r1)
std r12,_XER(r1)
-   SAVE_GPRS(14, 31, r1)
+   SAVE_NVGPRS(r1)
lhz r12,PACA_TRAP_SAVE(r13)

[PATCH 13/23] powerpc: Remove direct call to mmap2 syscall handlers

2022-09-15 Thread Rohan McLure
Syscall handlers should not be invoked internally by their symbol names,
as these symbols are defined by the architecture-specific SYSCALL_DEFINE
macro. Move the compatibility syscall definition for mmap2 to syscalls.c,
so that all mmap implementations can share a helper function.

Remove 'inline' on static mmap helper.

Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
---
V1 -> V2: Move mmap2 compat implementation to asm/kernel/syscalls.c.
V3 -> V4: Move to be applied before syscall wrapper introduced.
V4 -> V5: Remove 'inline' in helper.
---
 arch/powerpc/kernel/sys_ppc32.c |  9 -
 arch/powerpc/kernel/syscalls.c  | 17 ++---
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index d961634976d8..776ae7565fc5 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -48,14 +47,6 @@
 #include 
 #include 
 
-unsigned long compat_sys_mmap2(unsigned long addr, size_t len,
- unsigned long prot, unsigned long flags,
- unsigned long fd, unsigned long pgoff)
-{
-   /* This should remain 12 even if PAGE_SIZE changes */
-   return sys_mmap(addr, len, prot, flags, fd, pgoff << 12);
-}
-
 compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, 
compat_size_t count,
 u32 reg6, u32 pos1, u32 pos2)
 {
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index a04c97faa21a..9830957498b0 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -36,9 +36,9 @@
 #include 
 #include 
 
-static inline long do_mmap2(unsigned long addr, size_t len,
-   unsigned long prot, unsigned long flags,
-   unsigned long fd, unsigned long off, int shift)
+static long do_mmap2(unsigned long addr, size_t len,
+unsigned long prot, unsigned long flags,
+unsigned long fd, unsigned long off, int shift)
 {
if (!arch_validate_prot(prot, addr))
return -EINVAL;
@@ -56,6 +56,17 @@ SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
return do_mmap2(addr, len, prot, flags, fd, pgoff, PAGE_SHIFT-12);
 }
 
+#ifdef CONFIG_COMPAT
+COMPAT_SYSCALL_DEFINE6(mmap2,
+  unsigned long, addr, size_t, len,
+  unsigned long, prot, unsigned long, flags,
+  unsigned long, fd, unsigned long, pgoff)
+{
+   /* This should remain 12 even if PAGE_SIZE changes */
+   return do_mmap2(addr, len, prot, flags, fd, pgoff << 12, PAGE_SHIFT-12);
+}
+#endif
+
 SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, off_t, offset)
-- 
2.34.1



[PATCH 01/23] powerpc: Remove asmlinkage from syscall handler definitions

2022-09-15 Thread Rohan McLure
The asmlinkage macro has no special meaning in powerpc, and prior to
this patch it was used sporadically in some syscall handler definitions.
On architectures that do not define asmlinkage, it resolves to extern "C"
for C++ compilers and to a no-op otherwise. The current invocations of
asmlinkage provide far from complete support for C++ toolchains, and so
the macro serves no purpose in powerpc.
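
For reference, the generic fallback in include/linux/linkage.h is
approximately:

	#ifdef __cplusplus
	#define CPP_ASMLINKAGE extern "C"
	#else
	#define CPP_ASMLINKAGE
	#endif

	#ifndef asmlinkage
	#define asmlinkage CPP_ASMLINKAGE
	#endif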

Remove all invocations of asmlinkage in arch/powerpc. These incidentally
only occur in syscall definitions and prototypes.

Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
Reviewed-by: Andrew Donnellan 
---
V2 -> V3: new patch
---
 arch/powerpc/include/asm/syscalls.h | 16 
 arch/powerpc/kernel/sys_ppc32.c |  8 
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/syscalls.h 
b/arch/powerpc/include/asm/syscalls.h
index a2b13e55254f..21c2faaa2957 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -10,14 +10,14 @@
 
 struct rtas_args;
 
-asmlinkage long sys_mmap(unsigned long addr, size_t len,
-   unsigned long prot, unsigned long flags,
-   unsigned long fd, off_t offset);
-asmlinkage long sys_mmap2(unsigned long addr, size_t len,
-   unsigned long prot, unsigned long flags,
-   unsigned long fd, unsigned long pgoff);
-asmlinkage long ppc64_personality(unsigned long personality);
-asmlinkage long sys_rtas(struct rtas_args __user *uargs);
+long sys_mmap(unsigned long addr, size_t len,
+ unsigned long prot, unsigned long flags,
+ unsigned long fd, off_t offset);
+long sys_mmap2(unsigned long addr, size_t len,
+  unsigned long prot, unsigned long flags,
+  unsigned long fd, unsigned long pgoff);
+long ppc64_personality(unsigned long personality);
+long sys_rtas(struct rtas_args __user *uargs);
 int ppc_select(int n, fd_set __user *inp, fd_set __user *outp,
   fd_set __user *exp, struct __kernel_old_timeval __user *tvp);
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 16ff0399a257..f4edcc9489fb 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -85,20 +85,20 @@ compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 
offset1, u32 offset2, u3
return ksys_readahead(fd, merge_64(offset1, offset2), count);
 }
 
-asmlinkage int compat_sys_truncate64(const char __user * path, u32 reg4,
+int compat_sys_truncate64(const char __user * path, u32 reg4,
unsigned long len1, unsigned long len2)
 {
return ksys_truncate(path, merge_64(len1, len2));
 }
 
-asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 
offset2,
+long compat_sys_fallocate(int fd, int mode, u32 offset1, u32 offset2,
 u32 len1, u32 len2)
 {
return ksys_fallocate(fd, mode, ((loff_t)offset1 << 32) | offset2,
 merge_64(len1, len2));
 }
 
-asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long 
len1,
+int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1,
 unsigned long len2)
 {
return ksys_ftruncate(fd, merge_64(len1, len2));
@@ -111,7 +111,7 @@ long ppc32_fadvise64(int fd, u32 unused, u32 offset1, u32 
offset2,
 advice);
 }
 
-asmlinkage long compat_sys_sync_file_range2(int fd, unsigned int flags,
+long compat_sys_sync_file_range2(int fd, unsigned int flags,
   unsigned offset1, unsigned offset2,
   unsigned nbytes1, unsigned nbytes2)
 {
-- 
2.34.1



[PATCH 03/23] powerpc: Add ZEROIZE_GPRS macros for register clears

2022-09-15 Thread Rohan McLure
Provide register zeroing macros, following the same convention as the
existing register stack save/restore macros, to be used in a later change
to concisely zero a sequence of consecutive GPRs.

The resulting macros are called ZEROIZE_GPRS and ZEROIZE_NVGPRS, in
keeping with the naming of the accompanying restore and save macros, and
with the use of 'zeroize' to describe this operation elsewhere in the
kernel.

Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
---
V1 -> V2: Change 'ZERO' usage in naming to 'NULLIFY', a more obvious verb
V2 -> V3: Change 'NULLIFY' usage in naming to 'ZEROIZE', which has
precedent in kernel and explicitly specifies that we are zeroing.
V3 -> V4: Update commit message to use zeroize.
V4 -> V5: The reason for the patch is to add zeroize macros. Move that
to first paragraph in patch description.
---
 arch/powerpc/include/asm/ppc_asm.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 83c02f5a7f2a..b95689ada59c 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -33,6 +33,20 @@
.endr
 .endm
 
+/*
+ * This expands to a sequence of register clears for regs start to end
+ * inclusive, of the form:
+ *
+ *   li rN, 0
+ */
+.macro ZEROIZE_REGS start, end
+   .Lreg=\start
+   .rept (\end - \start + 1)
+   li  .Lreg, 0
+   .Lreg=.Lreg+1
+   .endr
+.endm
+
 /*
  * Macros for storing registers into and loading registers from
  * exception frames.
@@ -49,6 +63,14 @@
 #define REST_NVGPRS(base)  REST_GPRS(13, 31, base)
 #endif
 
+#defineZEROIZE_GPRS(start, end)ZEROIZE_REGS start, end
+#ifdef __powerpc64__
+#defineZEROIZE_NVGPRS()ZEROIZE_GPRS(14, 31)
+#else
+#defineZEROIZE_NVGPRS()ZEROIZE_GPRS(13, 31)
+#endif
+#defineZEROIZE_GPR(n)  ZEROIZE_GPRS(n, n)
+
 #define SAVE_GPR(n, base)  SAVE_GPRS(n, n, base)
 #define REST_GPR(n, base)  REST_GPRS(n, n, base)
 
-- 
2.34.1



[PATCH 16/23] powerpc: Include all arch-specific syscall prototypes

2022-09-15 Thread Rohan McLure
In asm/syscalls.h, forward declare all syscall handler prototypes for
which a generic prototype is not provided in either linux/syscalls.h or
linux/compat.h. This is required for the compile-time type-checking of
syscall handlers, which is implemented later in this series.

32-bit compatibility syscall handlers are expressed in terms of types in
ppc32.h. Expose this header globally.
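
As a worked example of the munging convention these prototypes encode,
the pread64 compat handler skips the unused alignment pad register and
merges the register pair (mirroring the existing sys_ppc32.c handler):

	long compat_sys_ppc_pread64(unsigned int fd, char __user *ubuf,
				    compat_size_t count, u32 reg6,
				    u32 pos1, u32 pos2)
	{
		/* The 64-bit pos arrives in an aligned register pair;
		 * reg6 is the pad introduced by that alignment. */
		return ksys_pread64(fd, ubuf, count, merge_64(pos1, pos2));
	}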

Signed-off-by: Rohan McLure 
---
V1 -> V2: Explicitly include prototypes.
V2 -> V3: Remove extraneous #include  and ppc_fallocate
prototype. Rename header.
V4 -> V5: Clean. Elaborate comment on long long munging. Remove
prototype hiding conditional on SYSCALL_WRAPPER.
---
 arch/powerpc/include/asm/syscalls.h  | 97 ++
 .../ppc32.h => include/asm/syscalls_32.h}|  0
 arch/powerpc/kernel/signal_32.c  |  2 +-
 arch/powerpc/perf/callchain_32.c |  2 +-
 4 files changed, 77 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/syscalls.h 
b/arch/powerpc/include/asm/syscalls.h
index 525d2aa0c8ca..5d106acf7906 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -8,6 +8,14 @@
 #include 
 #include 
 
+#ifdef CONFIG_PPC64
+#include 
+#endif
+#include 
+#include 
+
+struct rtas_args;
+
 /*
  * long long munging:
  * The 32 bit ABI passes long longs in an odd even register pair.
@@ -20,44 +28,89 @@
 #define merge_64(high, low) ((u64)high << 32) | low
 #endif
 
-struct rtas_args;
+long sys_ni_syscall(void);
+
+/*
+ * PowerPC architecture-specific syscalls
+ */
+
+long sys_rtas(struct rtas_args __user *uargs);
+
+#ifdef CONFIG_PPC64
+long sys_ppc64_personality(unsigned long personality);
+#ifdef CONFIG_COMPAT
+long compat_sys_ppc64_personality(unsigned long personality);
+#endif /* CONFIG_COMPAT */
+#endif /* CONFIG_PPC64 */
 
+long sys_swapcontext(struct ucontext __user *old_ctx,
+struct ucontext __user *new_ctx, long ctx_size);
 long sys_mmap(unsigned long addr, size_t len,
  unsigned long prot, unsigned long flags,
  unsigned long fd, off_t offset);
 long sys_mmap2(unsigned long addr, size_t len,
   unsigned long prot, unsigned long flags,
   unsigned long fd, unsigned long pgoff);
-long sys_ppc64_personality(unsigned long personality);
-long sys_rtas(struct rtas_args __user *uargs);
-long sys_ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
- u32 len_high, u32 len_low);
+long sys_switch_endian(void);
 
-#ifdef CONFIG_COMPAT
-unsigned long compat_sys_mmap2(unsigned long addr, size_t len,
-  unsigned long prot, unsigned long flags,
-  unsigned long fd, unsigned long pgoff);
-
-compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, 
compat_size_t count,
- u32 reg6, u32 pos1, u32 pos2);
+#ifdef CONFIG_PPC32
+long sys_sigreturn(void);
+long sys_debug_setcontext(struct ucontext __user *ctx, int ndbg,
+ struct sig_dbg_op __user *dbg);
+#endif
 
-compat_ssize_t compat_sys_pwrite64(unsigned int fd, const char __user *ubuf, 
compat_size_t count,
-  u32 reg6, u32 pos1, u32 pos2);
+long sys_rt_sigreturn(void);
 
-compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offset1, u32 offset2, 
u32 count);
+long sys_subpage_prot(unsigned long addr,
+ unsigned long len, u32 __user *map);
 
-int compat_sys_truncate64(const char __user *path, u32 reg4,
- unsigned long len1, unsigned long len2);
+#ifdef CONFIG_COMPAT
+long compat_sys_swapcontext(struct ucontext32 __user *old_ctx,
+   struct ucontext32 __user *new_ctx,
+   int ctx_size);
+long compat_sys_old_getrlimit(unsigned int resource,
+ struct compat_rlimit __user *rlim);
+long compat_sys_sigreturn(void);
+long compat_sys_rt_sigreturn(void);
+#endif /* CONFIG_COMPAT */
 
-int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long len1,
-  unsigned long len2);
+/*
+ * Architecture specific signatures required by long long munging:
+ * The 32 bit ABI passes long longs in an odd even register pair.
+ * The following signatures provide a machine long parameter for
+ * each register that will be supplied. The implementation is
+ * responsible for combining parameter pairs.
+ */
 
+#ifdef CONFIG_COMPAT
+long compat_sys_mmap2(unsigned long addr, size_t len,
+ unsigned long prot, unsigned long flags,
+ unsigned long fd, unsigned long pgoff);
+long compat_sys_ppc_pread64(unsigned int fd,
+   char __user *ubuf, compat_size_t count,
+   u32 reg6, u32 pos1, u32 pos2);
+long compat_sys_ppc_pwrite64(unsigned int fd,
+const char __user *ubuf, compat_size_t count,
+

[PATCH 23/23] powerpc/64e: Clear gprs on interrupt routine entry on Book3E

2022-09-15 Thread Rohan McLure
Zero GPRs r14-r31 on entry into the kernel for interrupt sources to
limit the influence of user-space values in potential speculation
gadgets. Prior to this commit, all other GPRs are reassigned during the
common prologue to interrupt handlers and so need not be zeroised
explicitly.

This may be done safely, without loss of register state prior to the
interrupt, as the common prologue saves the initial values of
non-volatiles, which are unconditionally restored in interrupt_64.S. The
mitigation defaults to enabled under INTERRUPT_SANITIZE_REGISTERS.

Signed-off-by: Rohan McLure 
---
V3 -> V4: New patch.
V4 -> V5: Depend on Kconfig option. Remove ZEROIZE_NVGPRS on bad kernel
stack handler.
---
 arch/powerpc/kernel/exceptions-64e.S | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 48c640ca425d..61748769ea29 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -365,6 +365,11 @@ ret_from_mc_except:
std r14,PACA_EXMC+EX_R14(r13);  \
std r15,PACA_EXMC+EX_R15(r13)
 
+#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS
+#define SANITIZE_NVGPRSZEROIZE_NVGPRS()
+#else
+#define SANITIZE_NVGPRS
+#endif
 
 /* Core exception code for all exceptions except TLB misses. */
 #define EXCEPTION_COMMON_LVL(n, scratch, excf) \
@@ -401,7 +406,8 @@ exc_##n##_common:   
\
std r12,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */   \
std r3,_TRAP(r1);   /* set trap number  */  \
std r0,RESULT(r1);  /* clear regs->result */\
-   SAVE_NVGPRS(r1);
+   SAVE_NVGPRS(r1);\
+   SANITIZE_NVGPRS;/* minimise speculation influence */
 
 #define EXCEPTION_COMMON(n) \
EXCEPTION_COMMON_LVL(n, SPRN_SPRG_GEN_SCRATCH, PACA_EXGEN)
-- 
2.34.1



[PATCH] powerpc: Save AMR/IAMR when switching tasks

2022-09-15 Thread Samuel Holland
With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible
to switch away from a task inside copy_{from,to}_user. This left the CPU
with userspace access enabled until after the next IRQ or privilege
level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when
switching back to the original task, the userspace access would fault:

  Kernel attempted to write user page (3fff7ab68190) - exploit attempt? (uid: 
65536)
  [ cut here ]
  Bug: Write fault blocked by KUAP!
  WARNING: CPU: 56 PID: 4939 at arch/powerpc/mm/fault.c:228 
___do_page_fault+0x7b4/0xaa0
  CPU: 56 PID: 4939 Comm: git Tainted: GW 
5.19.8-5-gba424747260d #1
  NIP:  c00555e4 LR: c00555e0 CTR: c079d9d0
  REGS: c0008f507370 TRAP: 0700   Tainted: GW  
(5.19.8-5-gba424747260d)
  MSR:  90021033   CR: 2804  XER: 2004
  CFAR: c0123780 IRQMASK: 3
  NIP [c00555e4] ___do_page_fault+0x7b4/0xaa0
  LR [c00555e0] ___do_page_fault+0x7b0/0xaa0
  Call Trace:
  [c0008f507610] [c00555e0] ___do_page_fault+0x7b0/0xaa0 
(unreliable)
  [c0008f5076c0] [c0055938] do_page_fault+0x68/0x130
  [c0008f5076f0] [c0008914] data_access_common_virt+0x194/0x1f0
  --- interrupt: 300 at __copy_tofrom_user_base+0x9c/0x5a4
  NIP:  c007b1a8 LR: c073f4d4 CTR: 0080
  REGS: c0008f507760 TRAP: 0300   Tainted: GW  
(5.19.8-5-gba424747260d)
  MSR:  9280b033   CR: 24002220  
XER: 2004
  CFAR: c007b174 DAR: 3fff7ab68190 DSISR: 0a00 IRQMASK: 0
  NIP [c007b1a8] __copy_tofrom_user_base+0x9c/0x5a4
  LR [c073f4d4] copyout+0x74/0x150
  --- interrupt: 300
  [c0008f507a30] [c07430cc] copy_page_to_iter+0x12c/0x4b0
  [c0008f507ab0] [c02c7c20] filemap_read+0x200/0x460
  [c0008f507bf0] [c05f96f4] xfs_file_buffered_read+0x104/0x170
  [c0008f507c30] [c05f9800] xfs_file_read_iter+0xa0/0x150
  [c0008f507c70] [c03bddc8] new_sync_read+0x108/0x180
  [c0008f507d10] [c03c06b0] vfs_read+0x1d0/0x240
  [c0008f507d60] [c03c0ba4] ksys_read+0x84/0x140
  [c0008f507db0] [c002a3fc] system_call_exception+0x15c/0x300
  [c0008f507e10] [c000c63c] system_call_common+0xec/0x250
  --- interrupt: c00 at 0x3fff83aa7238
  NIP:  3fff83aa7238 LR: 3fff83a923b8 CTR: 
  REGS: c0008f507e80 TRAP: 0c00   Tainted: GW  
(5.19.8-5-gba424747260d)
  MSR:  9280f033   CR: 80002482  
XER: 
  IRQMASK: 0
  NIP [3fff83aa7238] 0x3fff83aa7238
  LR [3fff83a923b8] 0x3fff83a923b8
  --- interrupt: c00
  Instruction dump:
  e87f0100 48101021 6000 2c23 4182fee8 408e0128 3c82ff80 3884e978
  3c62ff80 3863ea78 480ce13d 6000 <0fe0> fb010070 fb810090 e80100c0
  ---[ end trace  ]---

Fix this by saving and restoring the kernel-side AMR/IAMR values when
switching tasks.

Fixes: 890274c2dc4c ("powerpc/64s: Implement KUAP for Radix MMU")
Signed-off-by: Samuel Holland 
---
I have no idea if this is the right change to make, and it could be
optimized, but my system has been stable with this patch for 5 days now.

Without the patch, I hit the bug every few minutes when my load average
is <1, and I hit it immediately if I try to do a parallel kernel build.

Because of the instability (file I/O randomly raises SIGBUS), I don't
think anyone would run a system in this configuration, so I don't think
this bug is exploitable.

 arch/powerpc/kernel/process.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 0fbda89cd1bb..69b189d63124 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1150,6 +1150,12 @@ static inline void save_sprs(struct thread_struct *t)
 */
t->tar = mfspr(SPRN_TAR);
}
+   if (t->regs) {
+   if (mmu_has_feature(MMU_FTR_BOOK3S_KUAP))
+   t->regs->amr = mfspr(SPRN_AMR);
+   if (mmu_has_feature(MMU_FTR_BOOK3S_KUEP))
+   t->regs->iamr = mfspr(SPRN_IAMR);
+   }
 #endif
 }
 
@@ -1228,6 +1234,13 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
if (cpu_has_feature(CPU_FTR_P9_TIDR) &&
old_thread->tidr != new_thread->tidr)
mtspr(SPRN_TIDR, new_thread->tidr);
+   if (new_thread->regs) {
+   if (mmu_has_feature(MMU_FTR_BOOK3S_KUAP))
+   mtspr(SPRN_AMR, new_thread->regs->amr);
+   if (mmu_has_feature(MMU_FTR_BOOK3S_KUEP))
+   mtspr(SPRN_IAMR, new_thread->regs->iamr);
+   isync();
+   }
 #endif
 
 }
-- 
2.35.1



Re: [PATCH kernel] KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent

2022-09-15 Thread Anup Patel
On Wed, May 4, 2022 at 1:18 PM Alexey Kardashevskiy  wrote:
>
> When introduced, IRQFD resampling worked on POWER8 with XICS. However
> KVM on POWER9 has never implemented it - the compatibility mode code
> ("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native
> XIVE mode does not handle INTx in KVM at all.
>
> This moved the capability support advertising to platforms and stops
> advertising it on XIVE, i.e. POWER9 and later.
>
> Signed-off-by: Alexey Kardashevskiy 
> ---
>
>
> Or I could move this one together with KVM_CAP_IRQFD. Thoughts?

For KVM RISC-V:
Acked-by: Anup Patel 

Thanks,
Anup

>
> ---
>  arch/arm64/kvm/arm.c   | 3 +++
>  arch/mips/kvm/mips.c   | 3 +++
>  arch/powerpc/kvm/powerpc.c | 6 ++
>  arch/riscv/kvm/vm.c| 3 +++
>  arch/s390/kvm/kvm-s390.c   | 3 +++
>  arch/x86/kvm/x86.c | 3 +++
>  virt/kvm/kvm_main.c| 1 -
>  7 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 523bc934fe2f..092f0614bae3 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -210,6 +210,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
> case KVM_CAP_SET_GUEST_DEBUG:
> case KVM_CAP_VCPU_ATTRIBUTES:
> case KVM_CAP_PTP_KVM:
> +#ifdef CONFIG_HAVE_KVM_IRQFD
> +   case KVM_CAP_IRQFD_RESAMPLE:
> +#endif
> r = 1;
> break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index a25e0b73ee70..0f3de470a73e 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -1071,6 +1071,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
> case KVM_CAP_READONLY_MEM:
> case KVM_CAP_SYNC_MMU:
> case KVM_CAP_IMMEDIATE_EXIT:
> +#ifdef CONFIG_HAVE_KVM_IRQFD
> +   case KVM_CAP_IRQFD_RESAMPLE:
> +#endif
> r = 1;
> break;
> case KVM_CAP_NR_VCPUS:
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 875c30c12db0..87698ffef3be 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -591,6 +591,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
> break;
>  #endif
>
> +#ifdef CONFIG_HAVE_KVM_IRQFD
> +   case KVM_CAP_IRQFD_RESAMPLE:
> +   r = !xive_enabled();
> +   break;
> +#endif
> +
> case KVM_CAP_PPC_ALLOC_HTAB:
> r = hv_enabled;
> break;
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index c768f75279ef..b58579b386bb 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -63,6 +63,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_READONLY_MEM:
> case KVM_CAP_MP_STATE:
> case KVM_CAP_IMMEDIATE_EXIT:
> +#ifdef CONFIG_HAVE_KVM_IRQFD
> +   case KVM_CAP_IRQFD_RESAMPLE:
> +#endif
> r = 1;
> break;
> case KVM_CAP_NR_VCPUS:
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 156d1c25a3c1..85e093fc8d13 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -564,6 +564,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
> case KVM_CAP_SET_GUEST_DEBUG:
> case KVM_CAP_S390_DIAG318:
> case KVM_CAP_S390_MEM_OP_EXTENSION:
> +#ifdef CONFIG_HAVE_KVM_IRQFD
> +   case KVM_CAP_IRQFD_RESAMPLE:
> +#endif
> r = 1;
> break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0c0ca599a353..a0a7b769483d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4273,6 +4273,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
> case KVM_CAP_SYS_ATTRIBUTES:
> case KVM_CAP_VAPIC:
> case KVM_CAP_ENABLE_CAP:
> +#ifdef CONFIG_HAVE_KVM_IRQFD
> +   case KVM_CAP_IRQFD_RESAMPLE:
> +#endif
> r = 1;
> break;
> case KVM_CAP_EXIT_HYPERCALL:
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 70e05af5ebea..885e72e668a5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4293,7 +4293,6 @@ static long kvm_vm_ioctl_check_extension_generic(struct 
> kvm *kvm, long arg)
>  #endif
>  #ifdef CONFIG_HAVE_KVM_IRQFD
> case KVM_CAP_IRQFD:
> -   case KVM_CAP_IRQFD_RESAMPLE:
>  #endif
> case KVM_CAP_IOEVENTFD_ANY_LENGTH:
> case KVM_CAP_CHECK_EXTENSION_VM:
> --
> 2.30.2
>


[PATCH v2 7/7] powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE

2022-09-15 Thread Nicholas Piggin
Const function pointers by convention live in .data.rel.ro if they need
to be relocated. Now that .data.rel.ro is linked into the read-only
region, put them in the right section. This doesn't make much practical
difference, but it will make the C conversion of sys_call_table a
smaller change as far as linking goes.
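
A minimal C illustration (symbol names invented): a const table of
function pointers still needs load-time relocations under a relocatable
link, so the toolchain emits it to .data.rel.ro rather than .rodata:

	static long f(void) { return 0; }
	static long g(void) { return 1; }

	/* Needs relocations for the addresses of f and g, so it lands in
	 * .data.rel.ro; a const table of plain integers would go to .rodata. */
	static long (* const example_table[])(void) = { f, g };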

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/systbl.S | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S
index cb3358886203..0bec33e86f50 100644
--- a/arch/powerpc/kernel/systbl.S
+++ b/arch/powerpc/kernel/systbl.S
@@ -12,7 +12,11 @@
 
 #include 
 
+#ifdef CONFIG_RELOCATABLE
+.section .data.rel.ro,"aw"
+#else
 .section .rodata,"a"
+#endif
 
 #ifdef CONFIG_PPC64
.p2align3
-- 
2.37.2



[PATCH v2 6/7] powerpc/64/build: merge .got and .toc input sections

2022-09-15 Thread Nicholas Piggin
Follow the binutils ld internal linker script and merge .got and .toc
input sections in the .got output section.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/vmlinux.lds.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 737825ae2ae0..3d96d51c8a5f 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -169,13 +169,12 @@ SECTIONS
}
 
.got : AT(ADDR(.got) - LOAD_OFFSET) ALIGN(256) {
-   *(.got)
+   *(.got .toc)
 #ifndef CONFIG_RELOCATABLE
__prom_init_toc_start = .;
arch/powerpc/kernel/prom_init.o*(.toc)
__prom_init_toc_end = .;
 #endif
-   *(.toc)
}
 
SOFT_MASK_TABLE(8)
-- 
2.37.2



[PATCH v2 5/7] powerpc/64/build: only include .opd with ELFv1

2022-09-15 Thread Nicholas Piggin
ELFv2 does not use function descriptors, so .opd is not required.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/vmlinux.lds.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index ae0814063900..737825ae2ae0 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -181,11 +181,13 @@ SECTIONS
SOFT_MASK_TABLE(8)
RESTART_TABLE(8)
 
+#ifdef CONFIG_PPC64_ELF_ABI_V1
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
__start_opd = .;
KEEP(*(.opd))
__end_opd = .;
}
+#endif
 
. = ALIGN(8);
__stf_entry_barrier_fixup : AT(ADDR(__stf_entry_barrier_fixup) - 
LOAD_OFFSET) {
-- 
2.37.2



[PATCH v2 4/7] powerpc/build: move .data.rel.ro, .sdata2 to read-only

2022-09-15 Thread Nicholas Piggin
.sdata2 is a read-only small data section for ppc32, and .data.rel.ro
holds data that needs relocating but is read-only after that, so both
can be moved to the read-only memory region.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/vmlinux.lds.S | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 16c4389d498d..ae0814063900 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -131,6 +131,16 @@ SECTIONS
/* Read-only data */
RO_DATA(PAGE_SIZE)
 
+#ifdef CONFIG_PPC32
+   .sdata2 : AT(ADDR(.sdata2) - LOAD_OFFSET) {
+   *(.sdata2)
+   }
+#endif
+
+   .data.rel.ro : AT(ADDR(.data.rel.ro) - LOAD_OFFSET) {
+   *(.data.rel.ro*)
+   }
+
.branch_lt : AT(ADDR(.branch_lt) - LOAD_OFFSET) {
*(.branch_lt)
}
@@ -348,19 +358,13 @@ SECTIONS
. = ALIGN(PAGE_SIZE);
_sdata = .;
 
-#ifdef CONFIG_PPC32
.data : AT(ADDR(.data) - LOAD_OFFSET) {
DATA_DATA
*(.data.rel*)
+#ifdef CONFIG_PPC32
*(SDATA_MAIN)
-   *(.sdata2)
-   }
-#else
-   .data : AT(ADDR(.data) - LOAD_OFFSET) {
-   DATA_DATA
-   *(.data.rel*)
-   }
 #endif
+   }
 
/* The initial task and kernel stack */
INIT_TASK_DATA_SECTION(THREAD_ALIGN)
-- 
2.37.2



[PATCH v2 3/7] powerpc/build: move got, toc, plt, branch_lt sections to read-only

2022-09-15 Thread Nicholas Piggin
This moves linker-related tables from .data to the read-only area.
Relocations are performed at early boot time before memory is protected,
after which there should be no modifications required.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/vmlinux.lds.S | 42 ---
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 607b17b1e785..16c4389d498d 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -131,6 +131,10 @@ SECTIONS
/* Read-only data */
RO_DATA(PAGE_SIZE)
 
+   .branch_lt : AT(ADDR(.branch_lt) - LOAD_OFFSET) {
+   *(.branch_lt)
+   }
+
 #ifdef CONFIG_PPC32
.got1 : AT(ADDR(.got1) - LOAD_OFFSET) {
*(.got1)
@@ -140,7 +144,30 @@ SECTIONS
*(.got2)
__got2_end = .;
}
+   .got : AT(ADDR(.got) - LOAD_OFFSET) SPECIAL {
+   *(.got)
+   *(.got.plt)
+   }
+   .plt : AT(ADDR(.plt) - LOAD_OFFSET) SPECIAL {
+   /* XXX: is .plt (and .got.plt) required? */
+   *(.plt)
+   }
+
 #else /* CONFIG_PPC32 */
+   .toc1 : AT(ADDR(.toc1) - LOAD_OFFSET) {
+   *(.toc1)
+   }
+
+   .got : AT(ADDR(.got) - LOAD_OFFSET) ALIGN(256) {
+   *(.got)
+#ifndef CONFIG_RELOCATABLE
+   __prom_init_toc_start = .;
+   arch/powerpc/kernel/prom_init.o*(.toc)
+   __prom_init_toc_end = .;
+#endif
+   *(.toc)
+   }
+
SOFT_MASK_TABLE(8)
RESTART_TABLE(8)
 
@@ -327,26 +354,11 @@ SECTIONS
*(.data.rel*)
*(SDATA_MAIN)
*(.sdata2)
-   *(.got.plt) *(.got)
-   *(.plt)
-   *(.branch_lt)
}
 #else
.data : AT(ADDR(.data) - LOAD_OFFSET) {
DATA_DATA
*(.data.rel*)
-   *(.toc1)
-   *(.branch_lt)
-   }
-
-   .got : AT(ADDR(.got) - LOAD_OFFSET) ALIGN(256) {
-   *(.got)
-#ifndef CONFIG_RELOCATABLE
-   __prom_init_toc_start = .;
-   arch/powerpc/kernel/prom_init.o*(.toc)
-   __prom_init_toc_end = .;
-#endif
-   *(.toc)
}
 #endif
 
-- 
2.37.2



[PATCH v2 2/7] powerpc/32/build: move got1/got2 sections out of text

2022-09-15 Thread Nicholas Piggin
Following the example from the binutils default linker script, move
.got1 and .got2 out of .text, to just after RO_DATA.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/vmlinux.lds.S | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index d81e4392da26..607b17b1e785 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -122,14 +122,6 @@ SECTIONS
*(.sfpr);
MEM_KEEP(init.text)
MEM_KEEP(exit.text)
-
-#ifdef CONFIG_PPC32
-   *(.got1)
-   __got2_start = .;
-   *(.got2)
-   __got2_end = .;
-#endif /* CONFIG_PPC32 */
-
} :text
 
. = ALIGN(PAGE_SIZE);
@@ -139,7 +131,16 @@ SECTIONS
/* Read-only data */
RO_DATA(PAGE_SIZE)
 
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC32
+   .got1 : AT(ADDR(.got1) - LOAD_OFFSET) {
+   *(.got1)
+   }
+   .got2 : AT(ADDR(.got2) - LOAD_OFFSET) {
+   __got2_start = .;
+   *(.got2)
+   __got2_end = .;
+   }
+#else /* CONFIG_PPC32 */
SOFT_MASK_TABLE(8)
RESTART_TABLE(8)
 
@@ -190,7 +191,7 @@ SECTIONS
*(__rfi_flush_fixup)
__stop___rfi_flush_fixup = .;
}
-#endif /* CONFIG_PPC64 */
+#endif /* CONFIG_PPC32 */
 
 #ifdef CONFIG_PPC_BARRIER_NOSPEC
. = ALIGN(8);
-- 
2.37.2



[PATCH v2 1/7] powerpc: move __end_rodata to cover arch read-only sections

2022-09-15 Thread Nicholas Piggin
powerpc has a number of read-only sections and tables that are put after
RO_DATA(). Move the __end_rodata symbol to cover these as well.

Setting memory to read-only at boot is done using __init_begin; change
that to use __end_rodata. This removes the requirement for the init
section to follow the read-only data.

This makes is_kernel_rodata() cover exactly the read-only region, which
also benefits other users of __end_rodata (e.g., kernel/dma/debug.c).
Boot dmesg also prints the rodata size more accurately.
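
For reference, is_kernel_rodata() is just a bounds check against the
linker-provided symbols (per include/asm-generic/sections.h):

	static inline bool is_kernel_rodata(unsigned long addr)
	{
		return addr >= (unsigned long)__start_rodata &&
		       addr < (unsigned long)__end_rodata;
	}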

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/vmlinux.lds.S| 3 +++
 arch/powerpc/mm/book3s32/mmu.c   | 2 +-
 arch/powerpc/mm/book3s64/hash_pgtable.c  | 2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c | 6 +++---
 arch/powerpc/mm/pgtable_32.c | 7 ---
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index fe22d940412f..d81e4392da26 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -210,6 +210,9 @@ SECTIONS
}
 #endif
 
+   . = ALIGN(STRICT_ALIGN_SIZE);
+   __end_rodata = .;
+
 /*
  * Init sections discarded at runtime
  */
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index a96b73006dfb..e13b883e4e5b 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -240,7 +240,7 @@ void mmu_mark_rodata_ro(void)
for (i = 0; i < nb; i++) {
struct ppc_bat *bat = BATS[i];
 
-   if (bat_addrs[i].start < (unsigned long)__init_begin)
+   if (bat_addrs[i].start < (unsigned long)__end_rodata)
bat[1].batl = (bat[1].batl & ~BPP_RW) | BPP_RX;
}
 
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c 
b/arch/powerpc/mm/book3s64/hash_pgtable.c
index ae008b9df0e6..28332001bd87 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -541,7 +541,7 @@ void hash__mark_rodata_ro(void)
unsigned long start, end, pp;
 
start = (unsigned long)_stext;
-   end = (unsigned long)__init_begin;
+   end = (unsigned long)__end_rodata;
 
pp = htab_convert_pte_flags(pgprot_val(PAGE_KERNEL_ROX), 
HPTE_USE_KERNEL_KEY);
 
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 698274109c91..eb3c56975c37 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -228,7 +228,7 @@ void radix__mark_rodata_ro(void)
unsigned long start, end;
 
start = (unsigned long)_stext;
-   end = (unsigned long)__init_begin;
+   end = (unsigned long)__end_rodata;
 
radix__change_memory_range(start, end, _PAGE_WRITE);
 }
@@ -259,8 +259,8 @@ print_mapping(unsigned long start, unsigned long end, 
unsigned long size, bool e
 static unsigned long next_boundary(unsigned long addr, unsigned long end)
 {
 #ifdef CONFIG_STRICT_KERNEL_RWX
-   if (addr < __pa_symbol(__init_begin))
-   return __pa_symbol(__init_begin);
+   if (addr < __pa_symbol(__end_rodata))
+   return __pa_symbol(__end_rodata);
 #endif
return end;
 }
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 3ac73f9fb5d5..5c02fd08d61e 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -158,10 +158,11 @@ void mark_rodata_ro(void)
}
 
/*
-* mark .text and .rodata as read only. Use __init_begin rather than
-* __end_rodata to cover NOTES and EXCEPTION_TABLE.
+* mark text and rodata as read only. __end_rodata is set by
+* powerpc's linker script and includes tables and data
+* requiring relocation which are not put in RO_DATA.
 */
-   numpages = PFN_UP((unsigned long)__init_begin) -
+   numpages = PFN_UP((unsigned long)__end_rodata) -
   PFN_DOWN((unsigned long)_stext);
 
set_memory_ro((unsigned long)_stext, numpages);
-- 
2.37.2



[PATCH v2 0/7] powerpc: build / linker improvements

2022-09-15 Thread Nicholas Piggin
This series is mainly about moving more things out of writable and
executable memory, while nudging the linker script in the direction of
the binutils ld internal linker script as we go.

Thanks,
Nick

Since v1:
- Move sys_call_table data.rel.ro patch to the end.
- And fix the changelog in that patch, the relocations were a red herring.
- Update the changelog in the __end_rodata patch.
- Align __end_rodata to strict align size.

Nicholas Piggin (7):
  powerpc: move __end_rodata to cover arch read-only sections
  powerpc/32/build: move got1/got2 sections out of text
  powerpc/build: move got, toc, plt, branch_lt sections to read-only
  powerpc/build: move .data.rel.ro, .sdata2 to read-only
  powerpc/64/build: only include .opd with ELFv1
  powerpc/64/build: merge .got and .toc input sections
  powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE

 arch/powerpc/kernel/systbl.S |  4 ++
 arch/powerpc/kernel/vmlinux.lds.S| 85 +++-
 arch/powerpc/mm/book3s32/mmu.c   |  2 +-
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c |  6 +-
 arch/powerpc/mm/pgtable_32.c |  7 +-
 6 files changed, 66 insertions(+), 40 deletions(-)

-- 
2.37.2



Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2022-09-15 Thread Nicholas Piggin
On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> > Leonardo Brás  writes:
> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> > > > Leonardo Brás  writes:
> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> > > > > > At the time this was submitted by Leonardo, I confirmed -- or 
> > > > > > thought
> > > > > > I had confirmed -- with PowerVM partition firmware development that
> > > > > > the following RTAS functions:
> > > > > > 
> > > > > > - ibm,get-xive
> > > > > > - ibm,int-off
> > > > > > - ibm,int-on
> > > > > > - ibm,set-xive
> > > > > > 
> > > > > > were safe to call on multiple CPUs simultaneously, not only with
> > > > > > respect to themselves as indicated by PAPR, but with arbitrary other
> > > > > > RTAS calls:
> > > > > > 
> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o@linux.ibm.com/
> > > > > > 
> > > > > > Recent discussion with firmware development makes it clear that this
> > > > > > is not true, and that the code in commit b664db8e3f97 
> > > > > > ("powerpc/rtas:
> > > > > > Implement reentrant rtas call") is unsafe, likely explaining several
> > > > > > strange bugs we've seen in internal testing involving DLPAR and
> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal 
> > > > > > state
> > > > > > can be corrupted by the concurrent use of the "reentrant" functions,
> > > > > > leading to symptoms like endless busy statuses from RTAS.
> > > > > 
> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> > > > 
> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> > > > property described in the spec applies only to the individual RTAS
> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> > > > simultaneously, but it must adhere to the more general requirement to
> > > > serialize with other RTAS functions.
> > > > 
> > > 
> > > I see. Thanks for explaining that part!
> > > I agree: reentrant calls that way don't look as useful on Linux than I
> > > previously thought.
> > > 
> > > OTOH, I think that instead of reverting the change, we could make use of 
> > > the
> > > correct information and fix the current implementation. (This could help 
> > > when we
> > > do the same rtas call in multiple cpus)
> > 
> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> > that. I'm not seeing the need.
> > 
> > > I have an idea of a patch to fix this. 
> > > Do you think it would be ok if I sent that, to prospect being an 
> > > alternative to
> > > this reversion?
> > 
> > It is my preference, and I believe it is more common, to revert to the
> > well-understood prior state, imperfect as it may be. The revert can be
> > backported to -stable and distros while development and review of
> > another approach proceeds.
>
> Ok then, as long as you are aware of the kdump bug, I'm good.
>
> FWIW:
> Reviewed-by: Leonardo Bras 

A shame. I guess a reader/writer lock would not be much help because
the crash is probably more likely to hit longer running rtas calls?

Alternative is just cheat and do this...?

Thanks,
Nick

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..89728714a06e 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
 {
unsigned long flags;
 
+   if (atomic_read(_cpu) == raw_smp_processor_id()) {
+   /*
+* Crash in progress on this CPU. Other CPUs should be
+* stopped by now, so skip the lock in case it was being
+* held, and is now needed for crashing e.g., kexec
+* (machine_kexec_mask_interrupts) requires rtas calls.
+*
+* It's possible this could have caused rtas state breakage
+* but the alternative is deadlock.
+*/
+   return 0;
+   }
+
local_irq_save(flags);
preempt_disable();
arch_spin_lock();
@@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
 
 static void unlock_rtas(unsigned long flags)
 {
+   if (atomic_read(_cpu) == raw_smp_processor_id())
+   return;
+
arch_spin_unlock();
local_irq_restore(flags);
preempt_enable();



Re: [PATCH] powerpc/64: Remove unused SYS_CALL_TABLE symbol

2022-09-15 Thread Nicholas Piggin
On Tue Sep 13, 2022 at 10:45 PM AEST, Michael Ellerman wrote:
> In interrupt_64.S, formerly entry_64.S, there are two toc entries
> created for sys_call_table and compat_sys_call_table.
>
> These are no longer used, since the system call entry was converted from
> asm to C, so remove them.
>

Acked-by: Nicholas Piggin 

> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic 
> in C")
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/kernel/interrupt_64.S | 10 --
>  1 file changed, 10 deletions(-)
>
> diff --git a/arch/powerpc/kernel/interrupt_64.S 
> b/arch/powerpc/kernel/interrupt_64.S
> index f9ee93e3a0d3..0093a6b6b1e1 100644
> --- a/arch/powerpc/kernel/interrupt_64.S
> +++ b/arch/powerpc/kernel/interrupt_64.S
> @@ -13,16 +13,6 @@
>  #include 
>  #include 
>  
> - .section".toc","aw"
> -SYS_CALL_TABLE:
> - .tc sys_call_table[TC],sys_call_table
> -
> -#ifdef CONFIG_COMPAT
> -COMPAT_SYS_CALL_TABLE:
> - .tc compat_sys_call_table[TC],compat_sys_call_table
> -#endif
> - .previous
> -
>   .align 7
>  
>  .macro DEBUG_SRR_VALID srr
> -- 
> 2.37.2



Re: [PATCH kernel] KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent

2022-09-15 Thread Nicholas Piggin
Seems okay to me, though it should probably go through the KVM tree.

Acked-by: Nicholas Piggin 

Thanks,
Nick

On Tue Sep 13, 2022 at 10:50 PM AEST, Alexey Kardashevskiy wrote:
> Ping? It's been a while and probably got lost :-/
>
> On 18/05/2022 16:27, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 5/4/22 17:48, Alexey Kardashevskiy wrote:
> >> When introduced, IRQFD resampling worked on POWER8 with XICS. However
> >> KVM on POWER9 has never implemented it - the compatibility mode code
> >> ("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native
> >> XIVE mode does not handle INTx in KVM at all.
> >>
> >> This moved the capability support advertising to platforms and stops
> >> advertising it on XIVE, i.e. POWER9 and later.
> >>
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >>
> >>
> >> Or I could move this one together with KVM_CAP_IRQFD. Thoughts?
> > 
> > 
> > Ping?
> > 
> >>
> >> ---
> >>   arch/arm64/kvm/arm.c   | 3 +++
> >>   arch/mips/kvm/mips.c   | 3 +++
> >>   arch/powerpc/kvm/powerpc.c | 6 ++
> >>   arch/riscv/kvm/vm.c    | 3 +++
> >>   arch/s390/kvm/kvm-s390.c   | 3 +++
> >>   arch/x86/kvm/x86.c | 3 +++
> >>   virt/kvm/kvm_main.c    | 1 -
> >>   7 files changed, 21 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> >> index 523bc934fe2f..092f0614bae3 100644
> >> --- a/arch/arm64/kvm/arm.c
> >> +++ b/arch/arm64/kvm/arm.c
> >> @@ -210,6 +210,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
> >> long ext)
> >>   case KVM_CAP_SET_GUEST_DEBUG:
> >>   case KVM_CAP_VCPU_ATTRIBUTES:
> >>   case KVM_CAP_PTP_KVM:
> >> +#ifdef CONFIG_HAVE_KVM_IRQFD
> >> +    case KVM_CAP_IRQFD_RESAMPLE:
> >> +#endif
> >>   r = 1;
> >>   break;
> >>   case KVM_CAP_SET_GUEST_DEBUG2:
> >> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> >> index a25e0b73ee70..0f3de470a73e 100644
> >> --- a/arch/mips/kvm/mips.c
> >> +++ b/arch/mips/kvm/mips.c
> >> @@ -1071,6 +1071,9 @@ int kvm_vm_ioctl_check_extension(struct kvm 
> >> *kvm, long ext)
> >>   case KVM_CAP_READONLY_MEM:
> >>   case KVM_CAP_SYNC_MMU:
> >>   case KVM_CAP_IMMEDIATE_EXIT:
> >> +#ifdef CONFIG_HAVE_KVM_IRQFD
> >> +    case KVM_CAP_IRQFD_RESAMPLE:
> >> +#endif
> >>   r = 1;
> >>   break;
> >>   case KVM_CAP_NR_VCPUS:
> >> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> >> index 875c30c12db0..87698ffef3be 100644
> >> --- a/arch/powerpc/kvm/powerpc.c
> >> +++ b/arch/powerpc/kvm/powerpc.c
> >> @@ -591,6 +591,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
> >> long ext)
> >>   break;
> >>   #endif
> >> +#ifdef CONFIG_HAVE_KVM_IRQFD
> >> +    case KVM_CAP_IRQFD_RESAMPLE:
> >> +    r = !xive_enabled();
> >> +    break;
> >> +#endif
> >> +
> >>   case KVM_CAP_PPC_ALLOC_HTAB:
> >>   r = hv_enabled;
> >>   break;
> >> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> >> index c768f75279ef..b58579b386bb 100644
> >> --- a/arch/riscv/kvm/vm.c
> >> +++ b/arch/riscv/kvm/vm.c
> >> @@ -63,6 +63,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
> >> long ext)
> >>   case KVM_CAP_READONLY_MEM:
> >>   case KVM_CAP_MP_STATE:
> >>   case KVM_CAP_IMMEDIATE_EXIT:
> >> +#ifdef CONFIG_HAVE_KVM_IRQFD
> >> +    case KVM_CAP_IRQFD_RESAMPLE:
> >> +#endif
> >>   r = 1;
> >>   break;
> >>   case KVM_CAP_NR_VCPUS:
> >> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> >> index 156d1c25a3c1..85e093fc8d13 100644
> >> --- a/arch/s390/kvm/kvm-s390.c
> >> +++ b/arch/s390/kvm/kvm-s390.c
> >> @@ -564,6 +564,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
> >> long ext)
> >>   case KVM_CAP_SET_GUEST_DEBUG:
> >>   case KVM_CAP_S390_DIAG318:
> >>   case KVM_CAP_S390_MEM_OP_EXTENSION:
> >> +#ifdef CONFIG_HAVE_KVM_IRQFD
> >> +    case KVM_CAP_IRQFD_RESAMPLE:
> >> +#endif
> >>   r = 1;
> >>   break;
> >>   case KVM_CAP_SET_GUEST_DEBUG2:
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index 0c0ca599a353..a0a7b769483d 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -4273,6 +4273,9 @@ int kvm_vm_ioctl_check_extension(struct kvm 
> >> *kvm, long ext)
> >>   case KVM_CAP_SYS_ATTRIBUTES:
> >>   case KVM_CAP_VAPIC:
> >>   case KVM_CAP_ENABLE_CAP:
> >> +#ifdef CONFIG_HAVE_KVM_IRQFD
> >> +    case KVM_CAP_IRQFD_RESAMPLE:
> >> +#endif
> >>   r = 1;
> >>   break;
> >>   case KVM_CAP_EXIT_HYPERCALL:
> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >> index 70e05af5ebea..885e72e668a5 100644
> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -4293,7 +4293,6 @@ static long 
> >> kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >>   #endif
> >>   #ifdef CONFIG_HAVE_KVM_IRQFD
> >>   case KVM_CAP_IRQFD:
> >> -    case KVM_CAP_IRQFD_RESAMPLE:
> >>   #endif
> >> 

Re: [PATCH v4 10/20] powerpc: Use common syscall handler type

2022-09-15 Thread Nicholas Piggin
On Thu Sep 15, 2022 at 3:45 PM AEST, Rohan McLure wrote:
>
>
> > On 12 Sep 2022, at 8:56 pm, Nicholas Piggin  wrote:
> > 
> > On Wed Aug 24, 2022 at 12:05 PM AEST, Rohan McLure wrote:
> >> Cause syscall handlers to be typed as follows when called indirectly
> >> throughout the kernel.
> >> 
> >> typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
> >>   unsigned long, unsigned long, unsigned long);
> > 
> > The point is... better type checking?
> > 
> >> 
> >> Since both 32 and 64-bit abis allow for at least the first six
> >> machine-word length parameters to a function to be passed by registers,
> >> even handlers which admit fewer than six parameters may be viewed as
> >> having the above type.
> >> 
> >> Fixup comparisons in VDSO to avoid pointer-integer comparison. Introduce
> >> explicit cast on systems with SPUs.
> >> 
> >> Signed-off-by: Rohan McLure 
> >> ---
> >> V1 -> V2: New patch.
> >> V2 -> V3: Remove unnecessary cast from const syscall_fn to syscall_fn
> >> ---
> >> arch/powerpc/include/asm/syscall.h  | 7 +--
> >> arch/powerpc/include/asm/syscalls.h | 1 +
> >> arch/powerpc/kernel/systbl.c| 6 +++---
> >> arch/powerpc/kernel/vdso.c  | 4 ++--
> >> arch/powerpc/platforms/cell/spu_callbacks.c | 6 +++---
> >> 5 files changed, 14 insertions(+), 10 deletions(-)
> >> 
> >> diff --git a/arch/powerpc/include/asm/syscall.h 
> >> b/arch/powerpc/include/asm/syscall.h
> >> index 25fc8ad9a27a..d2a8dfd5de33 100644
> >> --- a/arch/powerpc/include/asm/syscall.h
> >> +++ b/arch/powerpc/include/asm/syscall.h
> >> @@ -14,9 +14,12 @@
> >> #include 
> >> #include 
> >> 
> >> +typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
> >> + unsigned long, unsigned long, unsigned long);
> >> +
> >> /* ftrace syscalls requires exporting the sys_call_table */
> >> -extern const unsigned long sys_call_table[];
> >> -extern const unsigned long compat_sys_call_table[];
> >> +extern const syscall_fn sys_call_table[];
> >> +extern const syscall_fn compat_sys_call_table[];
> > 
> > Ah you constify it in this patch. I think the previous patch should have
> > kept the const, and it should keep the unsigned long type rather than
> > use void *. Either that or do this patch first.
> > 
> >> static inline int syscall_get_nr(struct task_struct *task, struct pt_regs 
> >> *regs)
> >> {
> >> diff --git a/arch/powerpc/include/asm/syscalls.h 
> >> b/arch/powerpc/include/asm/syscalls.h
> >> index 91417dee534e..e979b7593d2b 100644
> >> --- a/arch/powerpc/include/asm/syscalls.h
> >> +++ b/arch/powerpc/include/asm/syscalls.h
> >> @@ -8,6 +8,7 @@
> >> #include 
> >> #include 
> >> 
> >> +#include 
> >> #ifdef CONFIG_PPC64
> >> #include 
> >> #endif
> > 
> > Is this necessary or should be in another patch?
>
> Good spot. This belongs in the patch that produces systbl.c.
>
> > 
> >> diff --git a/arch/powerpc/kernel/systbl.c b/arch/powerpc/kernel/systbl.c
> >> index 99ffdfef6b9c..b88a9c2a1f50 100644
> >> --- a/arch/powerpc/kernel/systbl.c
> >> +++ b/arch/powerpc/kernel/systbl.c
> >> @@ -21,10 +21,10 @@
> >> #define __SYSCALL(nr, entry) [nr] = __powerpc_##entry,
> >> #define __powerpc_sys_ni_syscall   sys_ni_syscall
> >> #else
> >> -#define __SYSCALL(nr, entry) [nr] = entry,
> >> +#define __SYSCALL(nr, entry) [nr] = (void *) entry,
> >> #endif
> > 
> > Also perhaps this should have been in the prior patch and this patch
> > should change the cast from void to syscall_fn?
>
> This cast to (void *) kicks in when casting functions with six or fewer

Right, I was just wondering if it needs to be in the previous patch
because that's where you changed the type from unsigned long to void *.
Maybe there's some reason it's not required, I didn't entirely follow
all the macro expansion.

> parameters to the six-parameter type accepting and returning u64. Sadly I can’t
> find a way to avoid -Wcast-function-type even with (__force syscall_fn), short
> of an ugly cast to void * here. Any suggestions?

Ah okay. I think __force is a sparse specific attribute. Not sure if
gcc/clang can do it. There is a diag thing which maybe can turn off
warnings selectively, but if (void *) is turning off the warning
selectively then there would be no benefit to using it :) That's fine to
keep using void *.
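
For anyone following along, a minimal sketch of the warning behaviour
(sys_getpid here is just a stand-in prototype, not the real declaration):

	typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
				   unsigned long, unsigned long, unsigned long);

	long sys_getpid(void);	/* handler with fewer than six parameters */

	void check(void)
	{
		syscall_fn a = (syscall_fn)sys_getpid;	/* -Wcast-function-type fires */
		syscall_fn b = (void *)sys_getpid;	/* cast via void *: no warning */
		(void)a; (void)b;
	}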

Thanks,
Nick


Re: [PATCH v4 19/20] powerpc/64s: Clear gprs on interrupt routine entry in Book3S

2022-09-15 Thread Nicholas Piggin
On Thu Sep 15, 2022 at 4:55 PM AEST, Rohan McLure wrote:
>
>
> > On 12 Sep 2022, at 10:15 pm, Nicholas Piggin  wrote:
> > 
> > On Wed Aug 24, 2022 at 12:05 PM AEST, Rohan McLure wrote:
> >> Zero GPRS r0, r2-r11, r14-r31, on entry into the kernel for all
> >> other interrupt sources to limit influence of user-space values
> >> in potential speculation gadgets. The remaining gprs are overwritten by
> >> entry macros to interrupt handlers, irrespective of whether or not a
> >> given handler consumes these register values.
> >> 
> >> Prior to this commit, r14-r31 are restored on a per-interrupt basis at
> >> exit, but now they are always restored. Remove explicit REST_NVGPRS
> >> invocations as non-volatiles must now always be restored. 32-bit systems
> >> do not clear user registers on interrupt, and continue to depend on the
> >> return value of interrupt_exit_user_prepare to determine whether or not
> >> to restore non-volatiles.
> >> 
> >> The mmap_bench benchmark in selftests should rapidly invoke pagefaults.
> >> See ~0.8% performance regression with this mitigation, but this
> >> indicates the worst-case performance due to heavier-weight interrupt
> >> handlers.
> > 
> > Ow, my heart :(
> > 
> > Are we not keeping a CONFIG option to rid ourselves of this vile
> > performance robbing thing? Are we getting rid of the whole
> > _TIF_RESTOREALL thing too, or does PPC32 want to keep it?
>
> I see no reason not to include a CONFIG option for this 
> mitigation here other than simplicity. Any suggestions for a name?
> I’m thinking PPC64_SANITIZE_INTERRUPTS. Defaults on Book3E_64, optional
> on Book3S_64.

INTERRUPT_SANITIZE_REGISTERS perhaps?
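
(For reference, such an option might be wired up along these lines in
arch/powerpc/Kconfig -- a sketch only, with the final name and
dependencies left to the series:)

config INTERRUPT_SANITIZE_REGISTERS
	bool "Clear gprs on interrupt arrival"
	depends on PPC64
	default PPC_BOOK3E_64
	help
	  Reduce the influence of user register state on interrupt handlers
	  by zeroing gprs on interrupt entry that are not required by the
	  handler.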

>
> >> 
> >> Signed-off-by: Rohan McLure 
> >> ---
> >> V1 -> V2: Add benchmark data
> >> V2 -> V3: Use ZEROIZE_GPR{,S} macro renames, clarify
> >> interrupt_exit_user_prepare changes in summary.
> >> ---
> >> arch/powerpc/kernel/exceptions-64s.S | 21 -
> >> arch/powerpc/kernel/interrupt_64.S   |  9 ++---
> >> 2 files changed, 10 insertions(+), 20 deletions(-)
> >> 
> >> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> >> b/arch/powerpc/kernel/exceptions-64s.S
> >> index a3b51441b039..038e42fb2182 100644
> >> --- a/arch/powerpc/kernel/exceptions-64s.S
> >> +++ b/arch/powerpc/kernel/exceptions-64s.S
> >> @@ -502,6 +502,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real, text)
> >>std r10,0(r1)   /* make stack chain pointer */
> >>std r0,GPR0(r1) /* save r0 in stackframe*/
> >>std r10,GPR1(r1)/* save r1 in stackframe*/
> >> +  ZEROIZE_GPR(0)
> >> 
> >>/* Mark our [H]SRRs valid for return */
> >>li  r10,1
> >> @@ -538,14 +539,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
> >>ld  r10,IAREA+EX_R10(r13)
> >>std r9,GPR9(r1)
> >>std r10,GPR10(r1)
> >> +  ZEROIZE_GPRS(9, 10)
> > 
> > You use 9/10 right afterwards, this'd have to move down to where
> > you zero r11 at least.
> > 
> >>ld  r9,IAREA+EX_R11(r13)/* move r11 - r13 to stackframe */
> >>ld  r10,IAREA+EX_R12(r13)
> >>ld  r11,IAREA+EX_R13(r13)
> >>std r9,GPR11(r1)
> >>std r10,GPR12(r1)
> >>std r11,GPR13(r1)
> >> +  /* keep r12 ([H]SRR1/MSR), r13 (PACA) for interrupt routine */
> >> +  ZEROIZE_GPR(11)
> > 
> > Kernel always has to keep r13 so no need to comment that. Keeping r11,
> > is that for those annoying fp_unavailable etc handlers?
> > 
> > There's probably not much a user can do with this, given they're set
> > from the MSR. User can influence some bits of its MSR though. So long
> > as we're being paranoid, you could add an IOPTION to retain r11 only for
> > the handlers that need it, or have them load it from MSR and zero it
> > here.
>
> Good suggestion. Presume you’re referring to r12 here. I might go the
> IOPTION route.

Yeah r12, I think you need it because some of those assembly handlers
expect it there to check SRR1 bits. Having an IOPTION is probably good
and it documents that quirk of those handlers. That's bitten me before.
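
As an aside for readers, the ZEROIZE helpers can be plain assembler loop
macros; a sketch of plausible definitions (the real ones are added to
ppc_asm.h earlier in the series and may differ):

	.macro ZEROIZE_GPR reg
	li	\reg, 0
	.endm

	.macro ZEROIZE_GPRS start, end
	.Lreg = \start
	.rept	(\end - \start + 1)
	li	.Lreg, 0	/* register operands are bare numbers on ppc */
	.Lreg = .Lreg + 1
	.endr
	.endm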

Thanks,
Nick


Re: [PATCH 1/7] powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE

2022-09-15 Thread Nicholas Piggin
On Thu Sep 15, 2022 at 10:51 PM AEST, Michael Ellerman wrote:
> Christophe Leroy  writes:
> On 14/09/2022 at 17:47, Nicholas Piggin wrote:
> >> Const function pointers live in .data.rel.ro rather than .rodata because
> >> they must be relocated. This change prevents powerpc/32 from generating
> >> R_PPC_UADDR32 relocations (which are not handled). The sys_call_table is
> >> moved to writeable memory, but a later change will move it back.
> >
> > Aren't you missing commit c7acee3d2f12 ("powerpc: align syscall table 
> > for ppc32") ?
>
> That's in fixes. I'll sort it out when I apply this, or when I merge
> fixes into next.

Yeah that explains the relocations I was seeing, I should have dug
further into that, so they're really unrelated to this patch.

> > I can't see any R_PPC_UADDR32 relocations generated by ppc4xx_defconfig 
> > + CONFIG_RELOCATABLE unless I revert that commit.
>
> Presumably this change accidentally aligns the syscall table.
>
> >> After this patch, 44x_defconfig + CONFIG_RELOCATABLE boots to busybox.
>  
> So that's probably just because of the alignment too.
>
> I think this patch should go after .data.rel.ro is made read only.

Yeah that should be fine.

Thanks,
Nick


Re: [PATCH 2/7] powerpc: move __end_rodata to cover arch read-only sections

2022-09-15 Thread Nicholas Piggin
On Thu Sep 15, 2022 at 10:47 PM AEST, Michael Ellerman wrote:
> Nicholas Piggin  writes:
> > powerpc has a number of read-only sections and tables that are put
> > after RO_DATA(). Move the __end_rodata symbol to cover these as well.
> >
> > Setting memory to read-only at boot is done using __init_begin,
> > change that to use __end_rodata.
>
> Did you just do that because it seems logical?

I was actually looking at moving init so that runtime code and data are
closer together.

> Because it does seem logical, but it leaves a RWX region in the gap
> between __end_rodata and __init_begin, which is bad.
>
> This is the current behaviour, on radix:
>
> ---[ Start of kernel VM ]---
> 0xc000000000000000-0xc000000001ffffff  0x0000000000000000    32M
> r  X   pte  valid  present  dirty  accessed
> 0xc000000002000000-0xc00000007fffffff  0x0000000002000000  2016M
> r  w   pte  valid  present  dirty  accessed
>
> And with your change:
>
> ---[ Start of kernel VM ]---
> 0xc000000000000000-0xc0000000013fffff  0x0000000000000000    20M
> r  X   pte  valid  present  dirty  accessed
> 0xc000000001400000-0xc000000001ffffff  0x0000000001400000    12M
> r  w   X   pte  valid  present  dirty  accessed
> 0xc000000002000000-0xc00000007fffffff  0x0000000002000000  2016M
> r  w   pte  valid  present  dirty  accessed
>
>
> On radix the 16M alignment is larger than we need, but we need to choose
> a value at build time that works for radix and hash.
>
> We could make the code smarter on radix, to mark those pages in between
> __end_rodata and __init_begin as RW_ and use them for data. But that
> would be a more involved change.

Ah, yes Christophe pointed out it's broken too. We could just align
__end_rodata to STRICT_ALIGN_SIZE for this patch?
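
Concretely, that would be something like the following in
arch/powerpc/kernel/vmlinux.lds.S (a sketch; STRICT_ALIGN_SIZE is the
existing linker-script constant derived from CONFIG_DATA_SHIFT):

	. = ALIGN(STRICT_ALIGN_SIZE);	/* push the RWX boundary to a mapping-size multiple */
	__end_rodata = .;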

Thanks,
Nick


Re: [RFC 0/3] Asynchronous EEH recovery

2022-09-15 Thread Ganesh

On 9/2/22 05:49, Jason Gunthorpe wrote:


On Tue, Aug 16, 2022 at 08:57:13AM +0530, Ganesh Goudar wrote:

Hi,

EEH recovery is currently serialized, and these patches shorten
the time taken for EEH recovery by making recovery run
in parallel. The original author of these patches is Sam Bobroff;
I have rebased and tested these patches.

How did you test this?


This is tested on SRIOV VFs.



I understand that VFIO on 6.0 does not work at all on power?

I am waiting for power maintainers to pick up this series to fix it:

https://lore.kernel.org/kvm/20220714081822.3717693-1-...@ozlabs.ru/

Jason

[linux-next:master] BUILD REGRESSION 6ce5d01e7011b32600656bf90a626b1e51fb192a

2022-09-15 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 6ce5d01e7011b32600656bf90a626b1e51fb192a  Add linux-next specific 
files for 20220915

Error/Warning reports:

https://lore.kernel.org/linux-mm/202209150141.wgbakqmx-...@intel.com
https://lore.kernel.org/linux-mm/202209150959.hewcnjxh-...@intel.com
https://lore.kernel.org/llvm/202209141913.nxzv3hwm-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

ERROR: modpost: "devm_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_memremap" [drivers/misc/open-dice.ko] undefined!
ERROR: modpost: "devm_memunmap" [drivers/misc/open-dice.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" 
[drivers/char/xillybus/xillybus_of.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" 
[drivers/clk/xilinx/clk-xlnx-clock-wizard.ko] undefined!
ERROR: modpost: "ioremap" [drivers/net/ethernet/8390/pcnet_cs.ko] undefined!
ERROR: modpost: "ioremap" [drivers/tty/ipwireless/ipwireless.ko] undefined!
ERROR: modpost: "iounmap" [drivers/net/ethernet/8390/pcnet_cs.ko] undefined!
ERROR: modpost: "iounmap" [drivers/tty/ipwireless/ipwireless.ko] undefined!
arch/parisc/lib/iomap.c:363:5: warning: no previous prototype for 
'ioread64_lo_hi' [-Wmissing-prototypes]
arch/parisc/lib/iomap.c:373:5: warning: no previous prototype for 
'ioread64_hi_lo' [-Wmissing-prototypes]
arch/parisc/lib/iomap.c:448:6: warning: no previous prototype for 
'iowrite64_lo_hi' [-Wmissing-prototypes]
arch/parisc/lib/iomap.c:454:6: warning: no previous prototype for 
'iowrite64_hi_lo' [-Wmissing-prototypes]
drivers/gpu/drm/drm_atomic_helper.c:802: warning: expecting prototype for 
drm_atomic_helper_check_wb_connector_state(). Prototype was for 
drm_atomic_helper_check_wb_encoder_state() instead
drivers/hwmon/emc2305.c:194 emc2305_set_cur_state() warn: impossible condition 
'(val > 255) => (0-255 > 255)'
drivers/scsi/qla2xxx/qla_os.c:2854:23: warning: assignment to 'struct 
trace_array *' from 'int' makes pointer from integer without a cast 
[-Wint-conversion]
drivers/scsi/qla2xxx/qla_os.c:2854:25: error: implicit declaration of function 
'trace_array_get_by_name'; did you mean 'trace_array_set_clr_event'? 
[-Werror=implicit-function-declaration]
drivers/scsi/qla2xxx/qla_os.c:2869:9: error: implicit declaration of function 
'trace_array_put' [-Werror=implicit-function-declaration]
make[4]: *** No rule to make target 'drivers/crypto/aspeed/aspeed_crypto.o', 
needed by 'drivers/crypto/aspeed/built-in.a'.

Unverified Error/Warning (likely false positive, please contact us if 
interested):

ERROR: modpost: "__tsan_memcpy" [arch/arm64/crypto/sha512-arm64.ko] undefined!
ERROR: modpost: "__tsan_memcpy" [arch/arm64/crypto/sha512-ce.ko] undefined!
ERROR: modpost: "__tsan_memcpy" [fs/binfmt_misc.ko] undefined!
ERROR: modpost: "__tsan_memcpy" [kernel/kcsan/kcsan_test.ko] undefined!
ERROR: modpost: "__tsan_memcpy" [mm/zsmalloc.ko] undefined!
ERROR: modpost: "__tsan_memset" [arch/arm64/crypto/sha512-arm64.ko] undefined!
ERROR: modpost: "__tsan_memset" [arch/arm64/crypto/sha512-ce.ko] undefined!
ERROR: modpost: "__tsan_memset" [fs/binfmt_misc.ko] undefined!
ERROR: modpost: "__tsan_memset" [kernel/scftorture.ko] undefined!
ERROR: modpost: "__tsan_memset" [mm/zsmalloc.ko] undefined!

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-drm_atomic_helper.c:warning:expecting-prototype-for-drm_atomic_helper_check_wb_connector_state().-Prototype-was-for-drm_atomic_helper_check_wb_encoder_state()-instead
|   |-- 
drivers-scsi-qla2xxx-qla_os.c:error:implicit-declaration-of-function-trace_array_get_by_name
|   |-- 
drivers-scsi-qla2xxx-qla_os.c:error:implicit-declaration-of-function-trace_array_put
|   `-- 
drivers-scsi-qla2xxx-qla_os.c:warning:assignment-to-struct-trace_array-from-int-makes-pointer-from-integer-without-a-cast
|-- alpha-randconfig-r013-20220914
|   `-- 
drivers-gpu-drm-drm_atomic_helper.c:warning:expecting-prototype-for-drm_atomic_helper_check_wb_connector_state().-Prototype-was-for-drm_atomic_helper_check_wb_encoder_state()-instead
|-- arc-allyesconfig
|   `-- 
drivers-gpu-drm-drm_atomic_helper.c:warning:expecting-prototype-for-drm_atomic_helper_check_wb_connector_state().-Prototype-was-for-drm_atomic_helper_check_wb_encoder_state()-instead
|-- arc-randconfig-r043-20220914
|   `-- 
drivers-gpu-drm-drm_atomic_helper.c:warning:expecting-prototype-for-drm_atomic_helper_check_wb_connector_state().-Prototype-was-for-drm_atomic_helper_check_wb_encoder_state()-instead
|-- arm-allyesconfig
|   `-- 
drivers-gpu-drm-drm_atomic_helper.c:warning:expecting-prototype-for-drm_atomic_helper_check_wb_connector_state().-Prototype-was-for-dr

Re: [PATCH v3 4/4] arm64: support batched/deferred tlb shootdown during page reclamation

2022-09-15 Thread Nadav Amit



> On Sep 14, 2022, at 11:42 PM, Barry Song <21cn...@gmail.com> wrote:
> 
>> 
>> The very idea behind TLB deferral is the opportunity it (might) provide
>> to accumulate address ranges and cpu masks so that individual TLB flush
>> can be replaced with a more cost effective range based TLB flush. Hence
>> I guess unless address range or cpumask based cost effective TLB flush
>> is available, deferral does not improve the unmap performance as much.
> 
> 
> After sending tlbi, if we wait for the completion of tlbi, we have to get an Ack
> from all cpus in the system, so tlbi is not scalable. The point here is that we
> avoid waiting for each individual tlbi; instead, they are batched. If
> you read the benchmark in the commit log, you can see the large decline
> in the cost to swap out a page.

Just a minor correction: arch_tlbbatch_flush() does not collect ranges.
On x86 it only accumulates a CPU mask.
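
For reference, the x86 batch structure carries exactly that and nothing
more (arch/x86/include/asm/tlbbatch.h, comment lightly paraphrased):

	struct arch_tlbflush_unmap_batch {
		/*
		 * Each bit set is a CPU that potentially has a TLB entry
		 * for one of the PFNs being flushed; no address ranges
		 * are recorded.
		 */
		struct cpumask cpumask;
	};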



Re: [PATCH] powerpc/pseries: add lparctl driver for platform-specific functions

2022-09-15 Thread Nathan Lynch
Michal Suchánek  writes:
> On Tue, Sep 13, 2022 at 12:02:42PM -0500, Nathan Lynch wrote:
>> Anyway, of course I intend to support the more complex calls, but
>> supporting the simple calls actually unbreaks a lot of stuff.
>
> The thing is that supporting calls that return more than one page of
> data is absolutely required, and this interface built around fixed size
> data transfer can't do it.

Again, it is appropriate for the system parameter commands and handlers
to deal in small fixed size buffers. Code for VPD retrieval will have to
work differently.

> So it sounds like a ticket for redoing the driver right after it's
> implemented, or ending up with two subtly different interfaces - one for
> the calls that can return multiple pages of data, and one for the simple
> calls.
>
> That does not sound like a good idea at all to me.

That's not my plan, and I won't be trying to get anything merged without
supporting some of the more complex cases. OK?
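
To make the distinction concrete, a purely hypothetical illustration of
the two interface shapes being debated (these structs are invented for
this example and are not the proposed ABI):

	/* Fixed-size command: fine for system parameters, whose RTAS
	 * work area is bounded. */
	struct lparctl_sysparm {
		__u32 token;
		__u32 length;
		__u8  data[4000];
	};

	/* Multi-page results such as VPD want a user-supplied buffer
	 * plus a continuation token instead, so no single fixed size
	 * limits the transfer. */
	struct lparctl_vpd {
		__u64 buf;		/* user pointer */
		__u32 size;		/* bytes available at buf */
		__u32 sequence;		/* resume token for next chunk */
	};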


Re: [PATCH 1/7] powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE

2022-09-15 Thread Michael Ellerman
Christophe Leroy  writes:
> On 14/09/2022 at 17:47, Nicholas Piggin wrote:
>> Const function pointers live in .data.rel.ro rather than .rodata because
>> they must be relocated. This change prevents powerpc/32 from generating
>> R_PPC_UADDR32 relocations (which are not handled). The sys_call_table is
>> moved to writeable memory, but a later change will move it back.
>
> Aren't you missing commit c7acee3d2f12 ("powerpc: align syscall table 
> for ppc32") ?

That's in fixes. I'll sort it out when I apply this, or when I merge
fixes into next.

> I can't see any R_PPC_UADDR32 relocations generated by ppc4xx_defconfig 
> + CONFIG_RELOCATABLE unless I revert that commit.

Presumably this change accidentally aligns the syscall table.

>> After this patch, 44x_defconfig + CONFIG_RELOCATABLE boots to busybox.
 
So that's probably just because of the alignment too.

I think this patch should go after .data.rel.ro is made read only.

cheers


Re: [PATCH 2/7] powerpc: move __end_rodata to cover arch read-only sections

2022-09-15 Thread Michael Ellerman
Nicholas Piggin  writes:
> powerpc has a number of read-only sections and tables that are put
> after RO_DATA(). Move the __end_rodata symbol to cover these as well.
>
> Setting memory to read-only at boot is done using __init_begin,
> change that to use __end_rodata.

Did you just do that because it seems logical?

Because it does seem logical, but it leaves a RWX region in the gap
between __end_rodata and __init_begin, which is bad.

This is the current behaviour, on radix:

---[ Start of kernel VM ]---
0xc000000000000000-0xc000000001ffffff  0x0000000000000000    32M r
X   pte  valid  present  dirty  accessed
0xc000000002000000-0xc00000007fffffff  0x0000000002000000  2016M r
w   pte  valid  present  dirty  accessed

And with your change:

---[ Start of kernel VM ]---
0xc000000000000000-0xc0000000013fffff  0x0000000000000000    20M r
X   pte  valid  present  dirty  accessed
0xc000000001400000-0xc000000001ffffff  0x0000000001400000    12M r
w   X   pte  valid  present  dirty  accessed
0xc000000002000000-0xc00000007fffffff  0x0000000002000000  2016M r
w   pte  valid  present  dirty  accessed


On radix the 16M alignment is larger than we need, but we need to choose
a value at build time that works for radix and hash.

We could make the code smarter on radix, to mark those pages in between
__end_rodata and __init_begin as RW_ and use them for data. But that
would be a more involved change.

I think if we just drop the changes to the C files this patch is OK to
go in.

cheers


[PATCH 6/6] powerpc/64s/interrupt: halt early boot interrupts if paca is not set up

2022-09-15 Thread Nicholas Piggin
Ensure r13 is zero from very early in boot until it gets set to the
boot paca pointer. This allows early program and mce handlers to halt
if there is no valid paca, rather than potentially run off into the
weeds. This preserves register and memory contents for low level
debugging tools.

Nothing could be printed to console at this point in any case because
even udbg is only set up after the boot paca is set, so this shouldn't
be missed.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 11 +--
 arch/powerpc/kernel/head_64.S|  3 +++
 arch/powerpc/kernel/setup_64.c   |  1 +
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 29d701a20c41..5078b2578dbe 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -724,8 +724,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
  * userspace starts.
  */
 .macro EARLY_BOOT_FIXUP
-#ifdef CONFIG_CPU_LITTLE_ENDIAN
 BEGIN_FTR_SECTION
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
tdi   0,0,0x48	// Trap never, or in reverse endian: b . + 8
b 2f  // Skip trampoline if endian is correct
.long 0xa643707d  // mtsprg  0, r11  Backup r11
@@ -753,8 +753,15 @@ BEGIN_FTR_SECTION
mtsrr0 r11	// Restore SRR0
mfsprg r11, 0 // Restore r11
 2:
-END_FTR_SECTION(0, 1) // nop out after boot
 #endif
+   /*
+* program check could hit at any time, and pseries can not block
+* MSR[ME] in early boot. So check if there is anything useful in r13
+* yet, and spin forever if not.
+*/
+   cmpdi   r13,0
+   beq .
+END_FTR_SECTION(0, 1) // nop out after boot
 .endm
 
 /*
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index cf2c08902c05..6aeba8a9814e 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -494,6 +494,9 @@ __start_initialization_multiplatform:
/* Make sure we are running in 64 bits mode */
bl  enable_64b_mode
 
+   /* Zero r13 (paca) so early program check / mce don't use it */
+   li  r13,0
+
/* Get TOC pointer (current runtime address) */
bl  relative_toc
 
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 214d10caf458..d290ea9f0865 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -362,6 +362,7 @@ void __init early_setup(unsigned long dt_ptr)
 */
initialise_paca(&boot_paca, 0);
fixup_boot_paca(&boot_paca);
+   WARN_ON(local_paca != 0);
setup_paca(&boot_paca); /* install the paca into registers */
 
/*  printk is now safe to use --- */
-- 
2.37.2



[PATCH 5/6] powerpc/64: don't set boot CPU's r13 to paca until the structure is set up

2022-09-15 Thread Nicholas Piggin
The idea is to get to the point where if r13 is non-zero, then it should
contain a reasonable paca. This can be used in early boot program check
and machine check handlers to avoid running off into the weeds if they
hit before r13 has a paca.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/setup_64.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 08173eea8977..214d10caf458 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -177,23 +177,23 @@ early_param("smt-enabled", early_smt_enabled);
 #endif /* CONFIG_SMP */
 
 /** Fix up paca fields required for the boot cpu */
-static void __init fixup_boot_paca(void)
+static void __init fixup_boot_paca(struct paca_struct *boot_paca)
 {
/* The boot cpu is started */
-   get_paca()->cpu_start = 1;
+   boot_paca->cpu_start = 1;
/*
 * Give the early boot machine check stack somewhere to use, use
 * half of the init stack. This is a bit hacky but there should not be
 * deep stack usage in early init so shouldn't overflow it or overwrite
 * things.
 */
-   get_paca()->mc_emergency_sp = (void *)&init_thread_union +
+   boot_paca->mc_emergency_sp = (void *)&init_thread_union +
(THREAD_SIZE/2);
/* Allow percpu accesses to work until we setup percpu data */
-   get_paca()->data_offset = 0;
+   boot_paca->data_offset = 0;
/* Mark interrupts soft and hard disabled in PACA */
-   irq_soft_mask_set(IRQS_DISABLED);
-   get_paca()->irq_happened = PACA_IRQ_HARD_DIS;
+   boot_paca->irq_soft_mask = IRQS_DISABLED;
+   boot_paca->irq_happened = PACA_IRQ_HARD_DIS;
WARN_ON(mfmsr() & MSR_EE);
 }
 
@@ -361,8 +361,8 @@ void __init early_setup(unsigned long dt_ptr)
 * what CPU we are on.
 */
initialise_paca(&boot_paca, 0);
-   setup_paca(&boot_paca);
-   fixup_boot_paca();
+   fixup_boot_paca(&boot_paca);
+   setup_paca(&boot_paca); /* install the paca into registers */
 
/*  printk is now safe to use --- */
 
@@ -391,8 +391,8 @@ void __init early_setup(unsigned long dt_ptr)
/* Poison paca_ptrs[0] again if it's not the boot cpu */
memset(&paca_ptrs[0], 0x88, sizeof(paca_ptrs[0]));
}
-   setup_paca(paca_ptrs[boot_cpuid]);
-   fixup_boot_paca();
+   fixup_boot_paca(paca_ptrs[boot_cpuid]);
+   setup_paca(paca_ptrs[boot_cpuid]); /* install the paca into registers */
 
/*
 * Configure exception handlers. This include setting up trampolines
-- 
2.37.2



[PATCH 4/6] powerpc/64: avoid using r13 in relocate

2022-09-15 Thread Nicholas Piggin
relocate() uses r13 in early boot before it is used for the paca. Use
a different register for this so r13 is kept unchanged until it is
set to the paca pointer.

Avoid r14 as well while we're here; there's no reason not to use the
volatile registers, which is a bit less surprising, and r14 could be used
as another fixed reg one day.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/reloc_64.S | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/reloc_64.S b/arch/powerpc/kernel/reloc_64.S
index 232e4549defe..efd52f2e7033 100644
--- a/arch/powerpc/kernel/reloc_64.S
+++ b/arch/powerpc/kernel/reloc_64.S
@@ -27,8 +27,8 @@ _GLOBAL(relocate)
add r9,r9,r12   /* r9 has runtime addr of .rela.dyn section */
ld  r10,(p_st - 0b)(r12)
add r10,r10,r12 /* r10 has runtime addr of _stext */
-   ld  r13,(p_sym - 0b)(r12)
-   add r13,r13,r12 /* r13 has runtime addr of .dynsym */
+   ld  r4,(p_sym - 0b)(r12)
+   add r4,r4,r12   /* r4 has runtime addr of .dynsym */
 
/*
 * Scan the dynamic section for the RELA, RELASZ and RELAENT entries.
@@ -84,16 +84,16 @@ _GLOBAL(relocate)
ld  r0,16(r9)   /* reloc->r_addend */
b   .Lstore
 .Luaddr64:
-   srdir14,r0,32   /* ELF64_R_SYM(reloc->r_info) */
+   srdir5,r0,32/* ELF64_R_SYM(reloc->r_info) */
clrldi  r0,r0,32
cmpdi   r0,R_PPC64_UADDR64
bne .Lnext
ld  r6,0(r9)
ld  r0,16(r9)
-   mulli   r14,r14,24  /* 24 == sizeof(elf64_sym) */
-   add r14,r14,r13 /* elf64_sym[ELF64_R_SYM] */
-   ld  r14,8(r14)
-   add r0,r0,r14
+   mulli   r5,r5,24/* 24 == sizeof(elf64_sym) */
+   add r5,r5,r4/* elf64_sym[ELF64_R_SYM] */
+   ld  r5,8(r5)
+   add r0,r0,r5
 .Lstore:
add r0,r0,r3
stdxr0,r7,r6
-- 
2.37.2



[PATCH 3/6] powerpc/64s: early boot machine check handler

2022-09-15 Thread Nicholas Piggin
Use the early boot interrupt fixup in the machine check handler to allow
the machine check handler to run before interrupt endian is set up.
Branch to an early boot handler that just does a basic crash, which
allows it to run before ppc_md is set up. MSR[ME] is enabled on the boot
CPU earlier, and the machine check stack is temporarily set to the
middle of the init task stack.

This allows machine checks (e.g., due to invalid data access in real
mode) to print something useful earlier in boot (as soon as udbg is set
up, if CONFIG_PPC_EARLY_DEBUG=y).

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/asm-prototypes.h |  1 +
 arch/powerpc/kernel/exceptions-64s.S  |  6 +-
 arch/powerpc/kernel/setup_64.c| 12 
 arch/powerpc/kernel/traps.c   | 14 ++
 4 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 81631e64dbeb..a1039b9da42e 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -36,6 +36,7 @@ int64_t __opal_call(int64_t a0, int64_t a1, int64_t a2, 
int64_t a3,
int64_t opcode, uint64_t msr);
 
 /* misc runtime */
+void enable_machine_check(void);
 extern u64 __bswapdi2(u64);
 extern s64 __lshrdi3(s64, int);
 extern s64 __ashldi3(s64, int);
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index d629bcd7213b..29d701a20c41 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1134,6 +1134,7 @@ INT_DEFINE_BEGIN(machine_check)
 INT_DEFINE_END(machine_check)
 
 EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
+   EARLY_BOOT_FIXUP
GEN_INT_ENTRY machine_check_early, virt=0
 EXC_REAL_END(machine_check, 0x200, 0x100)
 EXC_VIRT_NONE(0x4200, 0x100)
@@ -1198,6 +1199,9 @@ BEGIN_FTR_SECTION
bl  enable_machine_check
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
addir3,r1,STACK_FRAME_OVERHEAD
+BEGIN_FTR_SECTION
+   bl  machine_check_early_boot
+END_FTR_SECTION(0, 1) // nop out after boot
bl  machine_check_early
std r3,RESULT(r1)   /* Save result */
ld  r12,_MSR(r1)
@@ -3098,7 +3102,7 @@ CLOSE_FIXED_SECTION(virt_trampolines);
 USE_TEXT_SECTION()
 
 /* MSR[RI] should be clear because this uses SRR[01] */
-enable_machine_check:
+_GLOBAL(enable_machine_check)
mflr	r0
bcl 20,31,$+4
0:	mflr	r3
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index ce8fc6575eaa..08173eea8977 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -180,6 +181,14 @@ static void __init fixup_boot_paca(void)
 {
/* The boot cpu is started */
get_paca()->cpu_start = 1;
+   /*
+* Give the early boot machine check stack somewhere to use, use
+* half of the init stack. This is a bit hacky but there should not be
+* deep stack usage in early init so shouldn't overflow it or overwrite
+* things.
+*/
+   get_paca()->mc_emergency_sp = (void *)&init_thread_union +
+   (THREAD_SIZE/2);
/* Allow percpu accesses to work until we setup percpu data */
get_paca()->data_offset = 0;
/* Mark interrupts soft and hard disabled in PACA */
@@ -357,6 +366,9 @@ void __init early_setup(unsigned long dt_ptr)
 
/*  printk is now safe to use --- */
 
+   if (mfmsr() & MSR_HV)
+   enable_machine_check();
+
/* Try new device tree based feature discovery ... */
if (!dt_cpu_ftrs_init(__va(dt_ptr)))
/* Otherwise use the old style CPU table */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index dadfcef5d6db..37f8375452ad 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -68,6 +68,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -850,6 +851,19 @@ static void __machine_check_exception(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_PPC_BOOK3S_64
+DEFINE_INTERRUPT_HANDLER_RAW(machine_check_early_boot)
+{
+   udbg_printf("Machine check (early boot)\n");
+   udbg_printf("SRR0=0x%016lx   SRR1=0x%016lx\n", regs->nip, regs->msr);
+   udbg_printf(" DAR=0x%016lx  DSISR=0x%08lx\n", regs->dar, regs->dsisr);
+   udbg_printf("  LR=0x%016lx R1=0x%08lx\n", regs->link, regs->gpr[1]);
+   udbg_printf("--\n");
+   die("Machine check (early boot)", regs, SIGBUS);
+   for (;;)
+   ;
+   return 0;
+}
+
 DEFINE_INTERRUPT_HANDLER_ASYNC(machine_check_exception_async)
 {
__machine_check_exception(regs);
-- 
2.37.2



[PATCH 2/6] powerpc/64s/interrupt: move early boot ILE fixup into a macro

2022-09-15 Thread Nicholas Piggin
In preparation for using this sequence in machine check interrupt, move
it into a macro, with a small change to make it position independent.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 101 +++
 1 file changed, 56 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index d98732a33afe..d629bcd7213b 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -702,6 +702,61 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
ld  r1,GPR1(r1)
 .endm
 
+/*
+ * EARLY_BOOT_FIXUP - Fix real-mode interrupt with wrong endian in early boot.
+ *
+ * There's a short window during boot where although the kernel is running
+ * little endian, any exceptions will cause the CPU to switch back to big
+ * endian. For example a WARN() boils down to a trap instruction, which will
+ * cause a program check, and we end up here but with the CPU in big endian
+ * mode. The first instruction of the program check handler (in GEN_INT_ENTRY
+ * below) is an mtsprg, which when executed in the wrong endian is an lhzu with
+ * a ~3GB displacement from r3. The content of r3 is random, so that is a load
+ * from some random location, and depending on the system can easily lead to a
+ * checkstop, or an infinitely recursive page fault.
+ *
+ * So to handle that case we have a trampoline here that can detect we are in
+ * the wrong endian and flip us back to the correct endian. We can't flip
+ * MSR[LE] using mtmsr, so we have to use rfid. That requires backing up SRR0/1
+ * as well as a GPR. To do that we use SPRG0/2/3, as SPRG1 is already used for
+ * the paca. SPRG3 is user readable, but this trampoline is only active very
+ * early in boot, and SPRG3 will be reinitialised in vdso_getcpu_init() before
+ * userspace starts.
+ */
+.macro EARLY_BOOT_FIXUP
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+BEGIN_FTR_SECTION
+   tdi   0,0,0x48	// Trap never, or in reverse endian: b . + 8
+   b 2f  // Skip trampoline if endian is correct
+   .long 0xa643707d  // mtsprg  0, r11  Backup r11
+   .long 0xa6027a7d  // mfsrr0  r11
+   .long 0xa643727d  // mtsprg  2, r11  Backup SRR0 in SPRG2
+   .long 0xa6027b7d  // mfsrr1  r11
+   .long 0xa643737d  // mtsprg  3, r11  Backup SRR1 in SPRG3
+   .long 0xa600607d  // mfmsr   r11
+   .long 0x01006b69  // xori    r11, r11, 1 Invert MSR[LE]
+   .long 0xa6037b7d  // mtsrr1  r11
+   .long 0x34026039  // li  r11, 0x234
+   /*
+* This is 'li  r11,1f' where 1f is the absolute address of that
+* label, byteswapped into the SI field of the instruction.
+*/
+   .long 0x00006039 | \
+   ((ABS_ADDR(1f, real_vectors) & 0x00ff) << 24) | \
+   ((ABS_ADDR(1f, real_vectors) & 0xff00) << 8)
+   .long 0xa6037a7d  // mtsrr0  r11
+   .long 0x2400004c  // rfid
+1:
+   mfsprg r11, 3
+   mtsrr1 r11// Restore SRR1
+   mfsprg r11, 2
+   mtsrr0 r11// Restore SRR0
+   mfsprg r11, 0 // Restore r11
+2:
+END_FTR_SECTION(0, 1) // nop out after boot
+#endif
+.endm
+
 /*
  * There are a few constraints to be concerned with.
  * - Real mode exceptions code/data must be located at their physical location.
@@ -1619,51 +1674,7 @@ INT_DEFINE_BEGIN(program_check)
 INT_DEFINE_END(program_check)
 
 EXC_REAL_BEGIN(program_check, 0x700, 0x100)
-
-#ifdef CONFIG_CPU_LITTLE_ENDIAN
-   /*
-* There's a short window during boot where although the kernel is
-* running little endian, any exceptions will cause the CPU to switch
-* back to big endian. For example a WARN() boils down to a trap
-* instruction, which will cause a program check, and we end up here but
-* with the CPU in big endian mode. The first instruction of the program
-* check handler (in GEN_INT_ENTRY below) is an mtsprg, which when
-* executed in the wrong endian is an lhzu with a ~3GB displacement from
-* r3. The content of r3 is random, so that is a load from some random
-* location, and depending on the system can easily lead to a checkstop,
-* or an infinitely recursive page fault.
-*
-* So to handle that case we have a trampoline here that can detect we
-* are in the wrong endian and flip us back to the correct endian. We
-* can't flip MSR[LE] using mtmsr, so we have to use rfid. That requires
-* backing up SRR0/1 as well as a GPR. To do that we use SPRG0/2/3, as
-* SPRG1 is already used for the paca. SPRG3 is user readable, but this
-* trampoline is only active very early in boot, and SPRG3 will be
-* reinitialised in vdso_getcpu_init() before userspace starts.
-*/
-BEGIN_FTR_SECTION
-   tdi   0,0,0x48// Trap never, or in reverse endian: b . + 8
-   b 1f  // Skip trampoline if endian 

[PATCH 1/6] powerpc/64: mark irqs hard disabled in boot paca

2022-09-15 Thread Nicholas Piggin
This prevents interrupts in early boot (e.g., program check) from
enabling MSR[EE], potentially causing endian mismatch or other
crashes when reporting early boot traps.

Fixes: 4423eb5ae32ec ("powerpc/64/interrupt: make normal synchronous interrupts 
enable MSR[EE] if possible")
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/setup_64.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 2b2d0b0fbb30..ce8fc6575eaa 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -182,8 +182,10 @@ static void __init fixup_boot_paca(void)
get_paca()->cpu_start = 1;
/* Allow percpu accesses to work until we setup percpu data */
get_paca()->data_offset = 0;
-   /* Mark interrupts disabled in PACA */
+   /* Mark interrupts soft and hard disabled in PACA */
irq_soft_mask_set(IRQS_DISABLED);
+   get_paca()->irq_happened = PACA_IRQ_HARD_DIS;
+   WARN_ON(mfmsr() & MSR_EE);
 }
 
 static void __init configure_exceptions(void)
-- 
2.37.2



[PATCH 0/6] powerpc/64: improve boot debugging

2022-09-15 Thread Nicholas Piggin
This series is mostly 64s but has a few fixes that cover 64e too.
The main thing is the early boot machine check handler.

Thanks,
Nick

Nicholas Piggin (6):
  powerpc/64: mark irqs hard disabled in boot paca
  powerpc/64s/interrupt: move early boot ILE fixup into a macro
  powerpc/64s: early boot machine check handler
  powerpc/64: avoid using r13 in relocate
  powerpc/64: don't set boot CPU's r13 to paca until the structure is
set up
  powerpc/64s/interrupt: halt early boot interrupts if paca is not set
up

 arch/powerpc/include/asm/asm-prototypes.h |   1 +
 arch/powerpc/kernel/exceptions-64s.S  | 114 +-
 arch/powerpc/kernel/head_64.S |   3 +
 arch/powerpc/kernel/reloc_64.S|  14 +--
 arch/powerpc/kernel/setup_64.c|  33 +--
 arch/powerpc/kernel/traps.c   |  14 +++
 6 files changed, 117 insertions(+), 62 deletions(-)

-- 
2.37.2



Re: [RFC PATCH v1] spi: fsl_spi: Convert to transfer_one

2022-09-15 Thread Mark Brown
On Thu, 18 Aug 2022 15:38:37 +0200, Christophe Leroy wrote:
> Let the core handle all the chipselect bakery and replace
> transfer_one_message() by transfer_one() and prepare_message().
> 
> At the time being, there is fsl_spi_cs_control() to handle
> chipselects. That function handles both GPIO and non-GPIO
> chipselects. The GPIO chipselects will now be handled by
> the core directly, so only handle non-GPIO chipselects and
> hook it to ->set_cs
> 
> [...]

Applied to

   broonie/spi.git for-next

Thanks!

[1/1] spi: fsl_spi: Convert to transfer_one
  commit: 64ca1a034f00bf6366701df0af9194a6425d5406

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-15 Thread Peter Zijlstra
On Thu, Sep 15, 2022 at 10:56:58AM +0800, Chen Zhongjin wrote:

> We have found some anonymous jump-table information on x86 in .rodata.

Well yes, but that's still a bunch of heuristics on our side.

> I'm not sure if those are *all* of what Josh wanted on x86; however, for arm64 we
> did not find that in the same section, so it is a problem on arm64 now.

Nick found Bolt managed the ARM64 jumptables:

  
https://github.com/llvm/llvm-project/blob/main/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp#L484

But that does look like a less than ideal solution too.

> Will the compiler emit these for all arches? At least, I tried and
> didn't find anything meaningful (maybe I missed it).

That's the question; can we get the compiler to help us here in a well
defined manner.
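
To illustrate what the tooling is chasing: with optimisation enabled, a
dense switch like the sketch below is typically lowered to an indirect
branch through an anonymous table (in .rodata on x86), which objtool
currently has to reconstruct heuristically:

	int classify(int op)
	{
		switch (op) {		/* dense cases -> jump table */
		case 0: return 10;
		case 1: return 11;
		case 2: return 12;
		case 3: return 13;
		case 4: return 14;
		case 5: return 15;
		default: return -1;
		}
	}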


Re: [PATCH v3 11/16] objtool: Add --mnop as an option to --mcount

2022-09-15 Thread Naveen N. Rao

kernel test robot wrote:

Hi Sathvika,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/topic/ppc-kvm]
[also build test ERROR on linus/master v6.0-rc5]
[cannot apply to powerpc/next masahiroy-kbuild/for-next next-20220912]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Sathvika-Vasireddy/objtool-Enable-and-implement-mcount-option-on-powerpc/20220912-163023
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
topic/ppc-kvm
config: x86_64-rhel-8.3 
(https://download.01.org/0day-ci/archive/20220913/202209130240.gpgmxw7t-...@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-5) 11.3.0
reproduce (this is a W=1 build):
# 
https://github.com/intel-lab-lkp/linux/commit/ca5e2b42c0d4438ba93623579b6860b98f3598f3
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Sathvika-Vasireddy/objtool-Enable-and-implement-mcount-option-on-powerpc/20220912-163023
git checkout ca5e2b42c0d4438ba93623579b6860b98f3598f3
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot 


Thanks.



All errors (new ones prefixed by >>):


cc1: error: '-mnop-mcount' is not implemented for '-fPIC'


CONFIG_NOP_MCOUNT is used for FTRACE_MCOUNT_USE_CC, so instead of:

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f9920f1341c8..a8dd138df637 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -189,6 +189,7 @@ config X86
   select HAVE_CONTEXT_TRACKING_USER_OFFSTACK  if 
HAVE_CONTEXT_TRACKING_USER
   select HAVE_C_RECORDMCOUNT
   select HAVE_OBJTOOL_MCOUNT  if HAVE_OBJTOOL
+   select HAVE_NOP_MCOUNT  if HAVE_OBJTOOL_MCOUNT
   select HAVE_BUILDTIME_MCOUNT_SORT
   select HAVE_DEBUG_KMEMLEAK
   select HAVE_DMA_CONTIGUOUS

I think you should do:

+   select HAVE_NOP_MCOUNT  if FTRACE_MCOUNT_USE_OBJTOOL


I was hoping we could reuse CONFIG_NOP_MCOUNT seeing as it is only used 
by s390, but I now wonder if it is better to just keep that separate. We 
could introduce HAVE_OBJTOOL_NOP_MCOUNT for objtool instead.
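
Something along these lines, say (a sketch; the symbol name and help
text are open to bikeshedding):

	config HAVE_OBJTOOL_NOP_MCOUNT
		bool
		help
		  Arch supports the objtool --mnop option for nop'ing out
		  mcount call sites at build time, as an alternative to the
		  compiler's -mnop-mcount (HAVE_NOP_MCOUNT).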



- Naveen


Re: [PATCH v4 19/20] powerpc/64s: Clear gprs on interrupt routine entry in Book3S

2022-09-15 Thread Rohan McLure



> On 12 Sep 2022, at 10:15 pm, Nicholas Piggin  wrote:
> 
> On Wed Aug 24, 2022 at 12:05 PM AEST, Rohan McLure wrote:
>> Zero GPRS r0, r2-r11, r14-r31, on entry into the kernel for all
>> other interrupt sources to limit influence of user-space values
>> in potential speculation gadgets. The remaining gprs are overwritten by
>> entry macros to interrupt handlers, irrespective of whether or not a
>> given handler consumes these register values.
>> 
>> Prior to this commit, r14-r31 are restored on a per-interrupt basis at
>> exit, but now they are always restored. Remove explicit REST_NVGPRS
>> invocations as non-volatiles must now always be restored. 32-bit systems
>> do not clear user registers on interrupt, and continue to depend on the
>> return value of interrupt_exit_user_prepare to determine whether or not
>> to restore non-volatiles.
>> 
>> The mmap_bench benchmark in selftests should rapidly invoke pagefaults.
>> See ~0.8% performance regression with this mitigation, but this
>> indicates the worst-case performance due to heavier-weight interrupt
>> handlers.
> 
> Ow, my heart :(
> 
> Are we not keeping a CONFIG option to rid ourselves of this vile
> performance robbing thing? Are we getting rid of the whole
> _TIF_RESTOREALL thing too, or does PPC32 want to keep it?

I see no reason not to include a CONFIG option for this 
mitigation here other than simplicity. Any suggestions for a name?
I’m thinking PPC64_SANITIZE_INTERRUPTS. Defaults on Book3E_64, optional
on Book3S_64.

>> 
>> Signed-off-by: Rohan McLure 
>> ---
>> V1 -> V2: Add benchmark data
>> V2 -> V3: Use ZEROIZE_GPR{,S} macro renames, clarify
>> interrupt_exit_user_prepare changes in summary.
>> ---
>> arch/powerpc/kernel/exceptions-64s.S | 21 -
>> arch/powerpc/kernel/interrupt_64.S   |  9 ++---
>> 2 files changed, 10 insertions(+), 20 deletions(-)
>> 
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
>> b/arch/powerpc/kernel/exceptions-64s.S
>> index a3b51441b039..038e42fb2182 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -502,6 +502,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real, text)
>>  std r10,0(r1)   /* make stack chain pointer */
>>  std r0,GPR0(r1) /* save r0 in stackframe*/
>>  std r10,GPR1(r1)/* save r1 in stackframe*/
>> +ZEROIZE_GPR(0)
>> 
>>  /* Mark our [H]SRRs valid for return */
>>  li  r10,1
>> @@ -538,14 +539,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>>  ld  r10,IAREA+EX_R10(r13)
>>  std r9,GPR9(r1)
>>  std r10,GPR10(r1)
>> +ZEROIZE_GPRS(9, 10)
> 
> You use 9/10 right afterwards, this'd have to move down to where
> you zero r11 at least.
> 
>>  ld  r9,IAREA+EX_R11(r13)/* move r11 - r13 to stackframe */
>>  ld  r10,IAREA+EX_R12(r13)
>>  ld  r11,IAREA+EX_R13(r13)
>>  std r9,GPR11(r1)
>>  std r10,GPR12(r1)
>>  std r11,GPR13(r1)
>> +/* keep r12 ([H]SRR1/MSR), r13 (PACA) for interrupt routine */
>> +ZEROIZE_GPR(11)
> 
> Kernel always has to keep r13 so no need to comment that. Keeping r11,
> is that for those annoying fp_unavailable etc handlers?
> 
> There's probably not much a user can do with this, given they're set
> from the MSR. User can influence some bits of its MSR though. So long
> as we're being paranoid, you could add an IOPTION to retain r11 only for
> the handlers that need it, or have them load it from MSR and zero it
> here.

Good suggestion. Presume you’re referring to r12 here. I might go the
IOPTION route.

> 
> Thanks,
> Nick
> 
>> 
>>  SAVE_NVGPRS(r1)
>> +ZEROIZE_NVGPRS()
>> 
>>  .if IDAR
>>  .if IISIDE
>> @@ -577,8 +582,8 @@ BEGIN_FTR_SECTION
>> END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>>  ld  r10,IAREA+EX_CTR(r13)
>>  std r10,_CTR(r1)
>> -std r2,GPR2(r1) /* save r2 in stackframe*/
>> -SAVE_GPRS(3, 8, r1) /* save r3 - r8 in stackframe   */
>> +SAVE_GPRS(2, 8, r1) /* save r2 - r8 in stackframe   */
>> +ZEROIZE_GPRS(2, 8)
>>  mflr	r9  /* Get LR, later save to stack  */
>>  ld  r2,PACATOC(r13) /* get kernel TOC into r2   */
>>  std r9,_LINK(r1)
>> @@ -696,6 +701,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>>  mtlr	r9
>>  ld  r9,_CCR(r1)
>>  mtcr	r9
>> +REST_NVGPRS(r1)
>>  REST_GPRS(2, 13, r1)
>>  REST_GPR(0, r1)
>>  /* restore original r1. */
>> @@ -1368,11 +1374,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>>  b   interrupt_return_srr
>> 
>> 1:   bl  do_break
>> -/*
>> - * do_break() may have changed the NV GPRS while handling a breakpoint.
>> - * If so, we need to restore them with their updated values.
>> - */
>> -REST_NVGPRS(r1)
>>  b   interrupt_return_srr
>> 
>> 
>> @@ -1598,7 +1599,6 @@ 

Re: [PATCH v3 4/4] arm64: support batched/deferred tlb shootdown during page reclamation

2022-09-15 Thread Barry Song
On Thu, Sep 15, 2022 at 6:07 PM Anshuman Khandual
 wrote:
>
>
>
> On 9/9/22 11:05, Barry Song wrote:
> > On Fri, Sep 9, 2022 at 5:24 PM Anshuman Khandual
> >  wrote:
> >>
> >>
> >>
> >> On 8/22/22 13:51, Yicong Yang wrote:
> >>> From: Barry Song 
> >>>
> >>> on x86, batched and deferred tlb shootdown has led to a 90%
> >>> performance increase on tlb shootdown. on arm64, HW can do
> >>> tlb shootdown without software IPI. But sync tlbi is still
> >>> quite expensive.
> >>>
> >>> Even running the simplest program which requires swapout can
> >>> prove this is true,
> >>>  #include 
> >>>  #include 
> >>>  #include 
> >>>  #include 
> >>>
> >>>  int main()
> >>>  {
> >>>  #define SIZE (1 * 1024 * 1024)
> >>>  volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | 
> >>> PROT_WRITE,
> >>>   MAP_SHARED | MAP_ANONYMOUS, -1, 
> >>> 0);
> >>>
> >>>  memset(p, 0x88, SIZE);
> >>>
> >>>  for (int k = 0; k < 1; k++) {
> >>>  /* swap in */
> >>>  for (int i = 0; i < SIZE; i += 4096) {
> >>>  (void)p[i];
> >>>  }
> >>>
> >>>  /* swap out */
> >>>  madvise(p, SIZE, MADV_PAGEOUT);
> >>>  }
> >>>  }
> >>>
> >>> Perf result on snapdragon 888 with 8 cores by using zRAM
> >>> as the swap block device.
> >>>
> >>>  ~ # perf record taskset -c 4 ./a.out
> >>>  [ perf record: Woken up 10 times to write data ]
> >>>  [ perf record: Captured and wrote 2.297 MB perf.data (60084 samples) ]
> >>>  ~ # perf report
> >>>  # To display the perf.data header info, please use 
> >>> --header/--header-only options.
> >>>  # To display the perf.data header info, please use 
> >>> --header/--header-only options.
> >>>  #
> >>>  #
> >>>  # Total Lost Samples: 0
> >>>  #
> >>>  # Samples: 60K of event 'cycles'
> >>>  # Event count (approx.): 35706225414
> >>>  #
> >>>  # Overhead  Command  Shared Object  Symbol
> >>>  #   ...  .  
> >>> .
> >>>  #
> >>> 21.07%  a.out[kernel.kallsyms]  [k] _raw_spin_unlock_irq
> >>>  8.23%  a.out[kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
> >>>  6.67%  a.out[kernel.kallsyms]  [k] filemap_map_pages
> >>>  6.16%  a.out[kernel.kallsyms]  [k] __zram_bvec_write
> >>>  5.36%  a.out[kernel.kallsyms]  [k] ptep_clear_flush
> >>>  3.71%  a.out[kernel.kallsyms]  [k] _raw_spin_lock
> >>>  3.49%  a.out[kernel.kallsyms]  [k] memset64
> >>>  1.63%  a.out[kernel.kallsyms]  [k] clear_page
> >>>  1.42%  a.out[kernel.kallsyms]  [k] _raw_spin_unlock
> >>>  1.26%  a.out[kernel.kallsyms]  [k] 
> >>> mod_zone_state.llvm.8525150236079521930
> >>>  1.23%  a.out[kernel.kallsyms]  [k] xas_load
> >>>  1.15%  a.out[kernel.kallsyms]  [k] zram_slot_lock
> >>>
> >>> ptep_clear_flush() takes 5.36% CPU in the micro-benchmark
> >>> swapping in/out a page mapped by only one process. If the
> >>> page is mapped by multiple processes, typically, like more
> >>> than 100 on a phone, the overhead would be much higher as
> >>> we have to run tlb flush 100 times for one single page.
> >>> Plus, tlb flush overhead will increase with the number
> >>> of CPU cores due to the bad scalability of tlb shootdown
> >>> in HW, so those ARM64 servers should expect much higher
> >>> overhead.
> >>>
> >>> Further perf annotate shows 95% of the cpu time of ptep_clear_flush
> >>> is actually used by the final dsb() to wait for the completion
> >>> of tlb flush. This provides us a very good chance to leverage
> >>> the existing batched tlb in kernel. The minimum modification
> >>> is that we only send async tlbi in the first stage and we send
> >>> dsb while we have to sync in the second stage.
> >>>
> >>> With the above simplest micro benchmark, collapsed time to
> >>> finish the program decreases around 5%.
> >>>
> >>> Typical collapsed time w/o patch:
> >>>  ~ # time taskset -c 4 ./a.out
> >>>  0.21user 14.34system 0:14.69elapsed
> >>> w/ patch:
> >>>  ~ # time taskset -c 4 ./a.out
> >>>  0.22user 13.45system 0:13.80elapsed
> >>>
> >>> Also, Yicong Yang added the following observation.
> >>>   Tested with benchmark in the commit on Kunpeng920 arm64 server,
> >>>   observed an improvement around 12.5% with command
> >>>   `time ./swap_bench`.
> >>>   w/o w/
> >>>   real0m13.460s   0m11.771s
> >>>   user0m0.248s0m0.279s
> >>>   sys 0m12.039s   0m11.458s
> >>>
> >>>   Originally a 16.99% overhead of ptep_clear_flush() was noticed,
> >>>   which has been eliminated by this patch:
> >>>
> >>>   [root@localhost yang]# perf record -- ./swap_bench && perf report
> >>>   [...]
> >>>   16.99%  swap_bench  [kernel.kallsyms]  [k] ptep_clear_flush
> >>>
> >>> Cc: Jonathan Corbet 
> >>> Cc: Nadav Amit 
> >>> 

Re: [PATCH -next] net: fs_enet: Fix wrong check in do_pd_setup

2022-09-15 Thread Christophe Leroy


On 08/09/2022 at 15:55, Zheng Yongjun wrote:
> 
> We should check the of_iomap() return value 'fep->fec.fecp' instead of 'fep->fcc.fccp'.
> 
> Fixes: 976de6a8c304 ("fs_enet: Be an of_platform device when 
> CONFIG_PPC_CPM_NEW_BINDING is set.")
> Signed-off-by: Zheng Yongjun 

Reviewed-by: Christophe Leroy 

> ---
>   drivers/net/ethernet/freescale/fs_enet/mac-fec.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c 
> b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> index 99fe2c210d0f..61f4b6e50d29 100644
> --- a/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> +++ b/drivers/net/ethernet/freescale/fs_enet/mac-fec.c
> @@ -98,7 +98,7 @@ static int do_pd_setup(struct fs_enet_private *fep)
>  return -EINVAL;
> 
>  fep->fec.fecp = of_iomap(ofdev->dev.of_node, 0);
> -   if (!fep->fcc.fccp)
> +   if (!fep->fec.fecp)
>  return -EINVAL;
> 
>  return 0;
> --
> 2.17.1
> 

Re: [PATCH v3 4/4] arm64: support batched/deferred tlb shootdown during page reclamation

2022-09-15 Thread Anshuman Khandual



On 9/9/22 11:05, Barry Song wrote:
> On Fri, Sep 9, 2022 at 5:24 PM Anshuman Khandual
>  wrote:
>>
>>
>>
>> On 8/22/22 13:51, Yicong Yang wrote:
>>> From: Barry Song 
>>>
>>> on x86, batched and deferred tlb shootdown has led to a 90%
>>> performance increase on tlb shootdown. on arm64, HW can do
>>> tlb shootdown without software IPI. But sync tlbi is still
>>> quite expensive.
>>>
>>> Even running the simplest program which requires swapout can
>>> prove this is true,
> >>>  #include <sys/types.h>
> >>>  #include <unistd.h>
> >>>  #include <sys/mman.h>
> >>>  #include <string.h>
> >>>
> >>>  int main()
> >>>  {
> >>>  #define SIZE (1 * 1024 * 1024)
> >>>          volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> >>>                                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> >>>
> >>>          memset(p, 0x88, SIZE);
> >>>
> >>>          for (int k = 0; k < 10000; k++) {
> >>>                  /* swap in */
> >>>                  for (int i = 0; i < SIZE; i += 4096) {
> >>>                          (void)p[i];
> >>>                  }
> >>>
> >>>                  /* swap out */
> >>>                  madvise(p, SIZE, MADV_PAGEOUT);
> >>>          }
> >>>  }
>>>
> >>> Perf result on a snapdragon 888 with 8 cores, using zRAM
> >>> as the swap block device:
>>>
>>>  ~ # perf record taskset -c 4 ./a.out
>>>  [ perf record: Woken up 10 times to write data ]
>>>  [ perf record: Captured and wrote 2.297 MB perf.data (60084 samples) ]
>>>  ~ # perf report
>>>  # To display the perf.data header info, please use --header/--header-only 
>>> options.
>>>  #
>>>  #
>>>  # Total Lost Samples: 0
>>>  #
>>>  # Samples: 60K of event 'cycles'
>>>  # Event count (approx.): 35706225414
>>>  #
> >>>  # Overhead  Command  Shared Object      Symbol
> >>>  # ........  .......  .................  ........................
> >>>  #
>>> 21.07%  a.out[kernel.kallsyms]  [k] _raw_spin_unlock_irq
>>>  8.23%  a.out[kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>>>  6.67%  a.out[kernel.kallsyms]  [k] filemap_map_pages
>>>  6.16%  a.out[kernel.kallsyms]  [k] __zram_bvec_write
>>>  5.36%  a.out[kernel.kallsyms]  [k] ptep_clear_flush
>>>  3.71%  a.out[kernel.kallsyms]  [k] _raw_spin_lock
>>>  3.49%  a.out[kernel.kallsyms]  [k] memset64
>>>  1.63%  a.out[kernel.kallsyms]  [k] clear_page
>>>  1.42%  a.out[kernel.kallsyms]  [k] _raw_spin_unlock
>>>  1.26%  a.out[kernel.kallsyms]  [k] 
>>> mod_zone_state.llvm.8525150236079521930
>>>  1.23%  a.out[kernel.kallsyms]  [k] xas_load
>>>  1.15%  a.out[kernel.kallsyms]  [k] zram_slot_lock
>>>
> >>> ptep_clear_flush() takes 5.36% of cpu time in this
> >>> micro-benchmark, swapping in/out a page mapped by only one
> >>> process. If the page is mapped by multiple processes, typically
> >>> more than 100 on a phone, the overhead would be much higher, as
> >>> the tlb flush would have to run 100 times for one single page.
> >>> Moreover, tlb flush overhead increases with the number of CPU
> >>> cores due to the poor scalability of tlb shootdown in HW, so
> >>> arm64 servers should expect much higher overhead.
>>>
> >>> Further perf annotate shows that 95% of the cpu time of
> >>> ptep_clear_flush is actually spent in the final dsb(), waiting
> >>> for the completion of the tlb flush. This gives us a very good
> >>> opportunity to leverage the existing batched tlb support in the
> >>> kernel. The minimal modification is to send only an async tlbi
> >>> in the first stage, and send the dsb only when we have to sync
> >>> in the second stage.
>>>
> >>> With the simple micro benchmark above, the elapsed time to
> >>> finish the program decreases by around 5%.
>>>
> >>> Typical elapsed time w/o patch:
>>>  ~ # time taskset -c 4 ./a.out
>>>  0.21user 14.34system 0:14.69elapsed
>>> w/ patch:
>>>  ~ # time taskset -c 4 ./a.out
>>>  0.22user 13.45system 0:13.80elapsed
>>>
>>> Also, Yicong Yang added the following observation.
> >>>   Tested with the benchmark in the commit on a Kunpeng920 arm64
> >>>   server; observed an improvement of around 12.5% with the
> >>>   command `time ./swap_bench`.
> >>>           w/o         w/
> >>>   real    0m13.460s   0m11.771s
> >>>   user    0m0.248s    0m0.279s
> >>>   sys     0m12.039s   0m11.458s
>>>
> >>>   Originally, a 16.99% overhead of ptep_clear_flush() was
> >>>   observed, which has been eliminated by this patch:
>>>
>>>   [root@localhost yang]# perf record -- ./swap_bench && perf report
>>>   [...]
>>>   16.99%  swap_bench  [kernel.kallsyms]  [k] ptep_clear_flush
>>>
>>> Cc: Jonathan Corbet 
>>> Cc: Nadav Amit 
>>> Cc: Mel Gorman 
>>> Tested-by: Yicong Yang 
>>> Tested-by: Xin Hao 
>>> Signed-off-by: Barry Song 
>>> Signed-off-by: Yicong Yang 
>>> ---
>>>  .../features/vm/TLB/arch-support.txt  |  2 +-
>>>  arch/arm64/Kconfig|  1 +
>>>  arch/arm64/include/asm/tlbbatch.h |