Re: [next-20220215] WARNING at fs/iomap/buffered-io.c:75 with xfstests

2022-02-15 Thread Stephen Rothwell
Hi Sachin,

On Wed, 16 Feb 2022 12:55:02 +0530 Sachin Sant  wrote:
>
> While running xfstests on an IBM Power10 logical partition (LPAR) booted
> with 5.17.0-rc4-next-20220215, the following warning was seen:
> 
> The warning is seen when the test tries to unmount the file system. The
> problem shows up while running the generic/475 sub-test. I have attached
> the messages captured during the test run of generic/475.
> 
> xfstests is a recent addition to the upstream regression bucket. I don’t
> have any previous data to attempt a git bisect.

If you have time, could you test v5.17-rc4-2-gd567f5db412e (the commit
in Linus' tree that next-20220215 is based on)? If that is OK, then a
bisect from that to 5.17.0-rc4-next-20220215 may be helpful.

Thanks for the report.
-- 
Cheers,
Stephen Rothwell




[next-20220215] WARNING at fs/iomap/buffered-io.c:75 with xfstests

2022-02-15 Thread Sachin Sant
While running xfstests on an IBM Power10 logical partition (LPAR) booted
with 5.17.0-rc4-next-20220215, the following warning was seen:

[ 2547.384210] xfs filesystem being mounted at /mnt/scratch supports timestamps 
until 2038 (0x7fff)
[ 2547.389021] XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x64/0x98 
[xfs]" at daddr 0x2bc2c0 len 32 error 5
[ 2547.389020] XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x64/0x98 
[xfs]" at daddr 0x15cede0 len 32 error 5
[ 2547.389026] XFS (dm-0): metadata I/O error in 
"xfs_btree_read_buf_block.constprop.29+0xd0/0x110 [xfs]" at daddr 0xc60 len 8 
error 5
[ 2547.389032] XFS (dm-0): metadata I/O error in 
"xfs_btree_read_buf_block.constprop.29+0xd0/0x110 [xfs]" at daddr 0x3bffa30 len 
8 error 5
[ 2547.389120] XFS (dm-0): log I/O error -5
[ 2547.389135] XFS (dm-0): Metadata I/O Error (0x1) detected at 
xfs_trans_read_buf_map+0x31c/0x368 [xfs] (fs/xfs/xfs_trans_buf.c:296).  
Shutting down filesystem.
[ 2547.389195] XFS (dm-0): Please unmount the filesystem and rectify the 
problem(s)
[ 2547.662818] [ cut here ]
[ 2547.662832] WARNING: CPU: 24 PID: 2463718 at fs/iomap/buffered-io.c:75 
iomap_page_release+0x1b0/0x1e0
[ 2547.662840] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison 
dm_snapshot dm_bufio loop dm_flakey xfs libcrc32c dm_mod rfkill bonding sunrpc 
pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ext4 mbcache jbd2 
sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse [last unloaded: 
scsi_debug]
[ 2547.662866] CPU: 24 PID: 2463718 Comm: umount Not tainted 
5.17.0-rc4-next-20220215 #1
[ 2547.662871] NIP:  c0565b80 LR: c0565aa0 CTR: c0565e40
[ 2547.662874] REGS: cf0035b0 TRAP: 0700   Not tainted  
(5.17.0-rc4-next-20220215)
[ 2547.662877] MSR:  8282b033   CR: 
44000228  XER: 2004
[ 2547.662885] CFAR: c0565ac8 IRQMASK: 0 
[ 2547.662885] GPR00: c037dd3c cf003850 c2a03300 
0001 
[ 2547.662885] GPR04: 0010 0001 c000df6e7bc0 
0001 
[ 2547.662885] GPR08: fffe  0010 
0228 
[ 2547.662885] GPR12: c0565e40 c00abfcfe680  
 
[ 2547.662885] GPR16:    
 
[ 2547.662885] GPR20:    
fffe 
[ 2547.662885] GPR24:   cf0038e0 
c000243db278 
[ 2547.662885] GPR28: 0012 c000df6e7ae0 0010 
c00c0240c100 
[ 2547.662920] NIP [c0565b80] iomap_page_release+0x1b0/0x1e0
[ 2547.662924] LR [c0565aa0] iomap_page_release+0xd0/0x1e0
[ 2547.662927] Call Trace:
[ 2547.662928] [cf003850] [cf003890] 0xcf003890 
(unreliable)
[ 2547.662932] [cf003890] [c037dd3c] 
truncate_cleanup_folio+0x7c/0x140
[ 2547.662937] [cf0038c0] [c037f068] 
truncate_inode_pages_range+0x148/0x5e0
[ 2547.662942] [cf003a40] [c04c2058] evict+0x248/0x270
[ 2547.662946] [cf003a80] [c04c20fc] dispose_list+0x7c/0xb0
[ 2547.662951] [cf003ac0] [c04c2314] evict_inodes+0x1e4/0x300
[ 2547.662955] [cf003b60] [c04922d0] 
generic_shutdown_super+0x70/0x1e0
[ 2547.662959] [cf003bd0] [c0492518] kill_block_super+0x38/0x90
[ 2547.662964] [cf003c00] [c04926e8] 
deactivate_locked_super+0x78/0xf0
[ 2547.662968] [cf003c30] [c04cde9c] cleanup_mnt+0xfc/0x1c0
[ 2547.662972] [cf003c80] [c0189448] task_work_run+0xf8/0x170
[ 2547.662976] [cf003cd0] [c0021b94] 
do_notify_resume+0x434/0x480
[ 2547.662981] [cf003d80] [c00338b0] 
interrupt_exit_user_prepare_main+0x1a0/0x260
[ 2547.662985] [cf003de0] [c0033d74] 
syscall_exit_prepare+0x74/0x150
[ 2547.662989] [cf003e10] [c000c658] 
system_call_common+0xf8/0x270
[ 2547.662994] --- interrupt: c00 at 0x7fffa6bb7dfc
[ 2547.662996] NIP:  7fffa6bb7dfc LR: 7fffa6bb7dcc CTR: 
[ 2547.662999] REGS: cf003e80 TRAP: 0c00   Not tainted  
(5.17.0-rc4-next-20220215)
[ 2547.663001] MSR:  8280f033   CR: 
28000202  XER: 
[ 2547.663010] IRQMASK: 0 
[ 2547.663010] GPR00: 0034 7fffc7ca6a70 7fffa6c87300 
 
[ 2547.663010] GPR04:  7fffc7ca6a88  
000166cf4e90 
[ 2547.663010] GPR08:    
 
[ 2547.663010] GPR12:  7fffa6f1d730  
 
[ 2547.663010] GPR16:    
 
[ 2547.663010] GPR20:    
 
[ 2547.663010] 

Re: [PATCH v2 1/2] crypto: vmx: Turn CRYPTO_DEV_VMX_ENCRYPT into tristate

2022-02-15 Thread Petr Vorel
Hi,

I kept CRYPTO_DEV_VMX_ENCRYPT in drivers/crypto/vmx/Kconfig,
but maybe I should have moved it into drivers/crypto/Kconfig.

It's not clear what is preferred.

Kind regards,
Petr


[PATCH v4 0/6] Improve KVM's interaction with CPU hotplug

2022-02-15 Thread Chao Gao
Changes from v3->v4:
 - rebased to the latest kvm/next branch.
 - add Sean's reviewed-by tags
 - add Marc's patch to simplify ARM's cpu hotplug logic in KVM

Changes from v2->v3:
 - rebased to the latest kvm/next branch. 
 - patch 1: rename {svm,vmx}_check_processor_compat to follow the name
convention
 - patch 3: newly added to provide more information when hardware enabling
fails
 - patch 4: reset hardware_enable_failed if hardware enabling fails. And
remove redundant kernel log.
 - patch 5: add a pr_err() for setup_vmcs_config() path.

Changes from v1->v2: (all comments/suggestions on v1 are from Sean, thanks)
 - Merged v1's patch 2 into patch 1, and v1's patch 5 into patch 6.
 - Use static_call for check_processor_compatibility().
 - Generate patch 2 with "git revert" and do manual changes based on that.
 - Loosen the WARN_ON() in kvm_arch_check_processor_compat() instead of
   removing it.
 - KVM always prevents incompatible CPUs from being brought up, regardless of
   running VMs.
 - Use pr_warn instead of pr_info to emit logs when KVM finds offending
   CPUs.

KVM registers its CPU hotplug callback in the CPU STARTING section. In the
callback, KVM enables hardware virtualization on hotplugged CPUs if any VM
is running on existing CPUs.

There are two problems in the process:
1. KVM doesn't do compatibility checks before enabling hardware
virtualization on hotplugged CPUs. This may cause a #GP if VMX isn't
supported, or VM-entry failures if some in-use VMX features are missing on
the hotplugged CPUs. Both break running VMs.
2. Callbacks in the CPU STARTING section cannot fail. So, even if KVM finds
some incompatible CPUs, its callback cannot block CPU hotplug.
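
For illustration, a minimal sketch (not the actual KVM change) of the
difference: a callback registered in the ONLINE section may return an error
and thereby abort onlining of that CPU, which a STARTING-section callback
cannot do. The helper this_cpu_is_compatible() is made up:

#include <linux/cpuhotplug.h>
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/types.h>

/* Hypothetical stand-in for the real per-CPU compatibility check. */
static bool this_cpu_is_compatible(void)
{
	return true;
}

/* ONLINE-section callbacks run in process context and may fail;
 * a non-zero return value aborts onlining of that CPU.
 */
static int compat_check_cpu_online(unsigned int cpu)
{
	return this_cpu_is_compatible() ? 0 : -EIO;
}

static int __init compat_check_init(void)
{
	int ret;

	/* Dynamic ONLINE state; a STARTING-section callback cannot veto the CPU. */
	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
					"example/compat:online",
					compat_check_cpu_online, NULL);
	return ret < 0 ? ret : 0;
}
device_initcall(compat_check_init);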

This series improves KVM's interaction with CPU hotplug to avoid
incompatible CPUs breaking running VMs. The following changes are made:

1. move KVM's CPU hotplug callback to ONLINE section (suggested by Thomas)
2. do compatibility checks on hotplugged CPUs.
3. abort onlining incompatible CPUs

This series is a follow-up to the discussion about KVM and CPU hotplug
https://lore.kernel.org/lkml/3d3296f0-9245-40f9-1b5a-efffdb082...@redhat.com/T/

Note: this series is tested only on Intel systems.

Chao Gao (4):
  KVM: x86: Move check_processor_compatibility from init ops to runtime
ops
  Partially revert "KVM: Pass kvm_init()'s opaque param to additional
arch funcs"
  KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section
  KVM: Do compatibility checks on hotplugged CPUs

Marc Zyngier (1):
  KVM: arm64: Simplify the CPUHP logic

Sean Christopherson (1):
  KVM: Provide more information in kernel log if hardware enabling fails

 arch/arm64/kvm/arch_timer.c| 27 ---
 arch/arm64/kvm/arm.c   |  6 ++-
 arch/arm64/kvm/vgic/vgic-init.c| 19 +---
 arch/mips/kvm/mips.c   |  2 +-
 arch/powerpc/kvm/powerpc.c |  2 +-
 arch/riscv/kvm/main.c  |  2 +-
 arch/s390/kvm/kvm-s390.c   |  2 +-
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h|  2 +-
 arch/x86/kvm/svm/svm.c |  4 +-
 arch/x86/kvm/vmx/evmcs.c   |  2 +-
 arch/x86/kvm/vmx/evmcs.h   |  2 +-
 arch/x86/kvm/vmx/vmx.c | 22 +
 arch/x86/kvm/x86.c | 16 +--
 include/kvm/arm_arch_timer.h   |  4 ++
 include/kvm/arm_vgic.h |  4 ++
 include/linux/cpuhotplug.h |  5 +-
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/kvm_main.c| 73 +++---
 19 files changed, 107 insertions(+), 90 deletions(-)

-- 
2.25.1



Re: [PATCH v4 2/6] Partially revert "KVM: Pass kvm_init()'s opaque param to additional arch funcs"

2022-02-15 Thread Anup Patel
On Wed, Feb 16, 2022 at 8:46 AM Chao Gao  wrote:
>
> This partially reverts commit b99040853738 ("KVM: Pass kvm_init()'s opaque
> param to additional arch funcs") to remove the opaque parameter from
> kvm_arch_check_processor_compat() because no one uses it now.
> Address conflicts for ARM (due to file movement) and manually handle RISC-V,
> which was added after that commit.
>
> The changes to kvm_arch_hardware_setup() in the original commit are still
> needed, so they are not reverted.
>
> Signed-off-by: Chao Gao 
> Reviewed-by: Sean Christopherson 

For KVM RISC-V:
Acked-by: Anup Patel 

Regards,
Anup


> ---
>  arch/arm64/kvm/arm.c   |  2 +-
>  arch/mips/kvm/mips.c   |  2 +-
>  arch/powerpc/kvm/powerpc.c |  2 +-
>  arch/riscv/kvm/main.c  |  2 +-
>  arch/s390/kvm/kvm-s390.c   |  2 +-
>  arch/x86/kvm/x86.c |  2 +-
>  include/linux/kvm_host.h   |  2 +-
>  virt/kvm/kvm_main.c| 16 +++-
>  8 files changed, 10 insertions(+), 20 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index ecc5958e27fe..0165cf3aac3a 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -73,7 +73,7 @@ int kvm_arch_hardware_setup(void *opaque)
> return 0;
>  }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
>  {
> return 0;
>  }
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index a25e0b73ee70..092d09fb6a7e 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -140,7 +140,7 @@ int kvm_arch_hardware_setup(void *opaque)
> return 0;
>  }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
>  {
> return 0;
>  }
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 2ad0ccd202d5..30c817f3fa0c 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -423,7 +423,7 @@ int kvm_arch_hardware_setup(void *opaque)
> return 0;
>  }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
>  {
> return kvmppc_core_check_processor_compat();
>  }
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index 2e5ca43c8c49..992877e78393 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -20,7 +20,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
> return -EINVAL;
>  }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
>  {
> return 0;
>  }
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 577f1ead6a51..0053b81c6b02 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -252,7 +252,7 @@ int kvm_arch_hardware_enable(void)
> return 0;
>  }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
>  {
> return 0;
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b484ed61f37..ffb88f0b7265 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11509,7 +11509,7 @@ void kvm_arch_hardware_unsetup(void)
> static_call(kvm_x86_hardware_unsetup)();
>  }
>
> -int kvm_arch_check_processor_compat(void *opaque)
> +int kvm_arch_check_processor_compat(void)
>  {
> struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f11039944c08..2ad78e729bf7 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1413,7 +1413,7 @@ int kvm_arch_hardware_enable(void);
>  void kvm_arch_hardware_disable(void);
>  int kvm_arch_hardware_setup(void *opaque);
>  void kvm_arch_hardware_unsetup(void);
> -int kvm_arch_check_processor_compat(void *opaque);
> +int kvm_arch_check_processor_compat(void);
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
>  bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 83c57bcc6eb6..ee47d33d69e1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5643,22 +5643,14 @@ void kvm_unregister_perf_callbacks(void)
>  }
>  #endif
>
> -struct kvm_cpu_compat_check {
> -   void *opaque;
> -   int *ret;
> -};
> -
> -static void check_processor_compat(void *data)
> +static void check_processor_compat(void *rtn)
>  {
> -   struct kvm_cpu_compat_check *c = data;
> -
> -   *c->ret = kvm_arch_check_processor_compat(c->opaque);
> +   *(int *)rtn = kvm_arch_check_processor_compat();
>  }
>
>  int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
>   struct module *module)
>  {
> -   struct kvm_cpu_compat_check c;
> int r;
> int cpu;
>
> @@ -5686,10 +5678,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, 
> unsigned vcpu_align,
> if (r < 0)
> goto out_free_1;
>
> -   c.ret = 
> -   

[PATCH v4 2/6] Partially revert "KVM: Pass kvm_init()'s opaque param to additional arch funcs"

2022-02-15 Thread Chao Gao
This partially reverts commit b99040853738 ("KVM: Pass kvm_init()'s opaque
param to additional arch funcs") to remove the opaque parameter from
kvm_arch_check_processor_compat() because no one uses it now.
Address conflicts for ARM (due to file movement) and manually handle RISC-V,
which was added after that commit.

The changes to kvm_arch_hardware_setup() in the original commit are still
needed, so they are not reverted.

Signed-off-by: Chao Gao 
Reviewed-by: Sean Christopherson 
---
 arch/arm64/kvm/arm.c   |  2 +-
 arch/mips/kvm/mips.c   |  2 +-
 arch/powerpc/kvm/powerpc.c |  2 +-
 arch/riscv/kvm/main.c  |  2 +-
 arch/s390/kvm/kvm-s390.c   |  2 +-
 arch/x86/kvm/x86.c |  2 +-
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/kvm_main.c| 16 +++-
 8 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ecc5958e27fe..0165cf3aac3a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -73,7 +73,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
 }
 
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
return 0;
 }
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a25e0b73ee70..092d09fb6a7e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -140,7 +140,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
 }
 
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
return 0;
 }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..30c817f3fa0c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -423,7 +423,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
 }
 
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
return kvmppc_core_check_processor_compat();
 }
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 2e5ca43c8c49..992877e78393 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -20,7 +20,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
return -EINVAL;
 }
 
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
return 0;
 }
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 577f1ead6a51..0053b81c6b02 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -252,7 +252,7 @@ int kvm_arch_hardware_enable(void)
return 0;
 }
 
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
return 0;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b484ed61f37..ffb88f0b7265 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11509,7 +11509,7 @@ void kvm_arch_hardware_unsetup(void)
static_call(kvm_x86_hardware_unsetup)();
 }
 
-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f11039944c08..2ad78e729bf7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1413,7 +1413,7 @@ int kvm_arch_hardware_enable(void);
 void kvm_arch_hardware_disable(void);
 int kvm_arch_hardware_setup(void *opaque);
 void kvm_arch_hardware_unsetup(void);
-int kvm_arch_check_processor_compat(void *opaque);
+int kvm_arch_check_processor_compat(void);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 83c57bcc6eb6..ee47d33d69e1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5643,22 +5643,14 @@ void kvm_unregister_perf_callbacks(void)
 }
 #endif
 
-struct kvm_cpu_compat_check {
-   void *opaque;
-   int *ret;
-};
-
-static void check_processor_compat(void *data)
+static void check_processor_compat(void *rtn)
 {
-   struct kvm_cpu_compat_check *c = data;
-
-   *c->ret = kvm_arch_check_processor_compat(c->opaque);
+   *(int *)rtn = kvm_arch_check_processor_compat();
 }
 
 int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
  struct module *module)
 {
-   struct kvm_cpu_compat_check c;
int r;
int cpu;
 
@@ -5686,10 +5678,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned 
vcpu_align,
if (r < 0)
goto out_free_1;
 
-   c.ret = &r;
-   c.opaque = opaque;
for_each_online_cpu(cpu) {
-   smp_call_function_single(cpu, check_processor_compat, &c, 1);
+   smp_call_function_single(cpu, check_processor_compat, &r, 1);
if (r < 0)
goto out_free_2;
}
-- 
2.25.1



Re: [PATCH v3 07/10] powerpc/vas: Add paste address mmap fault handler

2022-02-15 Thread Haren Myneni
On Mon, 2022-02-14 at 13:37 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of January 22, 2022 5:59 am:
> > The user space opens VAS windows and issues NX requests by pasting
> > CRB on the corresponding paste address mmap. When the system looses
> 
> s/loose/lose/g throughout the series.
> 
> > credits due to core removal, the kernel has to close the window in
> > the hypervisor
> 
> By the way what if the kernel does not close the window and we try
> to access the memory? The hypervisor will inject faults?

The requests on the already opened windows will still be successful even if
the LPAR lost credits (due to core removal). But the hypervisor expects the
LPAR to behave like a good citizen and give up resources on core
removal. So we do not see any issue with the current upstream code for
DLPAR removal.

But we will have an issue with migration. The hypervisor knows the
actual number of credits assigned to the source LPAR before migration, so
it assigns the same number on the destination.

> 
> > and make the window inactive by unmapping this paste
> > address. Also the OS has to handle NX request page faults if the
> > user
> > space issue NX requests.
> > 
> > This handler remap the new paste address with the same VMA when the
> > window is active again (due to core add with DLPAR). Otherwise
> > returns paste failure.
> 
> This patch should come before (or combined with) the patch that zaps 
> PTEs. Putting it afterwards is logically backward. Even if you don't
> really expect the series to half work in a half bisected state, it
> just makes the changes easier to follow.
> 
> Thanks,
> Nick
> 
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/platforms/book3s/vas-api.c | 60
> > +
> >  1 file changed, 60 insertions(+)
> > 
> > diff --git a/arch/powerpc/platforms/book3s/vas-api.c
> > b/arch/powerpc/platforms/book3s/vas-api.c
> > index 2d06bd1b1935..5ceba75c13eb 100644
> > --- a/arch/powerpc/platforms/book3s/vas-api.c
> > +++ b/arch/powerpc/platforms/book3s/vas-api.c
> > @@ -351,6 +351,65 @@ static int coproc_release(struct inode *inode,
> > struct file *fp)
> > return 0;
> >  }
> >  
> > +/*
> > + * This fault handler is invoked when the VAS/NX generates page
> > fault on
> > + * the paste address.
> 
> The core generates the page fault here, right? paste destination is 
> translated by the core MMU (the instruction is executed in the core,
> afterall).

correct. Will update. 
> 
> > Happens if the kernel closes window in hypervisor
> > + * (on PowerVM) due to lost credit or the paste address is not
> > mapped.
> 
> Call it pseries everywhere if you're talking about the API and Linux
> code, rather than some specific quirk or issue of of the PowerVM
> implementation.
> 
> > + */
> > +static vm_fault_t vas_mmap_fault(struct vm_fault *vmf)
> > +{
> > +   struct vm_area_struct *vma = vmf->vma;
> > +   struct file *fp = vma->vm_file;
> > +   struct coproc_instance *cp_inst = fp->private_data;
> > +   struct vas_window *txwin;
> > +   u64 paste_addr;
> > +   int ret;
> > +
> > +   /*
> > +* window is not opened. Shouldn't expect this error.
> > +*/
> > +   if (!cp_inst || !cp_inst->txwin) {
> > +   pr_err("%s(): No send window open?\n", __func__);
> 
> Probably don't put PR_ERROR logs with question marks in them. The
> administrator knows less than you to answer the question.
> 
> "Unexpected fault on paste address with TX window closed" etc.
> 
> Then you don't need the comment either because the message explains
> it.
> 
> > +   return VM_FAULT_SIGBUS;
> > +   }
> > +
> > +   txwin = cp_inst->txwin;
> > +   /*
> > +* Fault is coming due to missing from the original mmap.
> 
> Rather than a vague comment like this (which we already know a fault 
> comes from a missing or insufficient PTE), you could point to exactly
> the code which zaps the PTEs.
> 
> > +* Can happen only when the window is closed due to lost
> > +* credit before mmap() or the user space issued NX request
> > +* without mapping.
> > +*/
> > +   if (txwin->task_ref.vma != vmf->vma) {
> > +   pr_err("%s(): No previous mapping with paste
> > address\n",
> > +   __func__);
> > +   return VM_FAULT_SIGBUS;
> > +   }
> > +
> > +   mutex_lock(&txwin->task_ref.mmap_mutex);
> > +   /*
> > +* The window may be inactive due to lost credit (Ex: core
> > +* removal with DLPAR). When the window is active again when
> > +* the credit is available, remap with the new paste address.
> 
> Remap also typically means mapping the same physical memory at a 
> different virtual address. So when you say remap with the new paste
> address, in Linux mm terms that means you're mapping the same window
> at a different virtual address.
> 
> But you're faulting a different physical address into the same
> virtual.

Yes, we get a different physical address and this handler maps this new
paste address at the existing virtual address. will 
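
For reference, a minimal sketch of the idea being discussed (not the actual
VAS patch): on the fault, the handler inserts the new physical paste address
into the existing VMA at the old virtual address. get_new_paste_pfn() is a
hypothetical helper standing in for looking up the reopened window's paste
address.

#include <linux/mm.h>

/* Hypothetical helper: returns the pfn of the reopened window's paste
 * address, or 0 while the window is still closed.
 */
static unsigned long get_new_paste_pfn(void)
{
	return 0;
}

static vm_fault_t example_paste_fault(struct vm_fault *vmf)
{
	unsigned long pfn = get_new_paste_pfn();

	if (!pfn)
		return VM_FAULT_SIGBUS;

	/* Map the new physical paste address at the faulting (old) virtual address. */
	return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
}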

Re: [PATCH v3 04/10] powerpc/pseries/vas: Reopen windows with DLPAR core add

2022-02-15 Thread Haren Myneni
On Mon, 2022-02-14 at 13:08 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of January 22, 2022 5:56 am:
> > VAS windows can be closed in the hypervisor due to lost credits
> > when the core is removed. If these credits are available later
> > for core add, reopen these windows and set them active. When the
> > kernel sees page fault on the paste address, it creates new mapping
> > on the new paste address. Then the user space can continue to use
> > these windows and send HW compression requests to NX successfully.
> 
> Any reason to put this before the close windows patch? It would be
> more logical to put it afterwards AFAIKS.

reconfig_open_windows() just reopens the windows and sets the status flag
when windows are closed. I thought adding the handler first, before the
close / unmap patch, helps during git bisect.

I can change it.

> 
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/include/asm/vas.h  |  16 +++
> >  arch/powerpc/platforms/book3s/vas-api.c |   1 +
> >  arch/powerpc/platforms/pseries/vas.c| 144
> > 
> >  arch/powerpc/platforms/pseries/vas.h|   8 +-
> >  4 files changed, 163 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/vas.h
> > b/arch/powerpc/include/asm/vas.h
> > index 57573d9c1e09..f1efe86563cc 100644
> > --- a/arch/powerpc/include/asm/vas.h
> > +++ b/arch/powerpc/include/asm/vas.h
> > @@ -29,6 +29,19 @@
> >  #define VAS_THRESH_FIFO_GT_QTR_FULL2
> >  #define VAS_THRESH_FIFO_GT_EIGHTH_FULL 3
> >  
> > +/*
> > + * VAS window status
> > + */
> > +#define VAS_WIN_ACTIVE 0x0 /* Used in platform
> > independent */
> > +   /* vas mmap() */
> > +/* The hypervisor returns these values */
> > +#define VAS_WIN_CLOSED 0x0001
> > +#define VAS_WIN_INACTIVE   0x0002 /* Inactive due to HW
> > failure */
> > +#define VAS_WIN_MOD_IN_PROCESS 0x0003 /* Process of being
> > modified, */
> 
> While you're moving these and adding a comment, it would be good to 
> list what hcalls they are relevant to. H_QUERY_VAS_WINDOW (which is
> not
> used anywhere yet?) These are also a 1-byte field, so '0x00', '0x01'
> etc
> would be more appropriate.

Yes, these status bits are assigned by the hypervisor and we are not
using / supporting them right now. I just list them as defined in the PAPR
NX HCALLs.

For example: the OS can modify an existing window with the modify HCALL while
the window is in use. During this time, the hypervisor returns this status
with the window query HCALL.
 
> 
> > +  /* deallocated, or quiesced
> > */
> > +/* Linux status bits */
> > +#define VAS_WIN_NO_CRED_CLOSE  0x0004 /* Window is closed
> > due to */
> > +  /* lost credit */
> 
> This is mixing a user defined bit field with hcall API value field.
> You
> also AFAIKS as yet don't fill in the hypervisor status anywhere.
> 
> I would make this it's own field entirely. A boolean would be nice,
> if
> possible.

Yes, the HV status bits are not used here.

In case the window status is reported through sysfs in the future, I
thought it would be simpler to have one status flag.

I can add 'hv_status' for the hypervisor status flag in the
pseries_vas_window struct and 'status' for Linux in the vas_window struct.

We also need one more status for migration, so a boolean may not be enough.
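
A rough sketch of the split being proposed here; the field names and types
are only illustrative, not the actual structures:

#include <linux/types.h>

struct vas_window_sketch {
	u32 winid;
	u32 status;	/* Linux-defined state, e.g. closed on lost credit */
};

struct pseries_vas_window_sketch {
	struct vas_window_sketch vas_win;
	u8 hv_status;	/* raw window status byte returned by the hypervisor */
};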

> 
> >  /*
> >   * Get/Set bit fields
> >   */
> > @@ -59,6 +72,8 @@ struct vas_user_win_ref {
> > struct pid *pid;/* PID of owner */
> > struct pid *tgid;   /* Thread group ID of owner */
> > struct mm_struct *mm;   /* Linux process mm_struct */
> > +   struct mutex mmap_mutex;/* protects paste address mmap() */
> > +   /* with DLPAR close/open
> > windows */
> >  };
> >  
> >  /*
> > @@ -67,6 +82,7 @@ struct vas_user_win_ref {
> >  struct vas_window {
> > u32 winid;
> > u32 wcreds_max; /* Window credits */
> > +   u32 status;
> > enum vas_cop_type cop;
> > struct vas_user_win_ref task_ref;
> > char *dbgname;
> > diff --git a/arch/powerpc/platforms/book3s/vas-api.c
> > b/arch/powerpc/platforms/book3s/vas-api.c
> > index 4d82c92ddd52..2b0ced611f32 100644
> > --- a/arch/powerpc/platforms/book3s/vas-api.c
> > +++ b/arch/powerpc/platforms/book3s/vas-api.c
> > @@ -316,6 +316,7 @@ static int coproc_ioc_tx_win_open(struct file
> > *fp, unsigned long arg)
> > return PTR_ERR(txwin);
> > }
> >  
> > +   mutex_init(&txwin->task_ref.mmap_mutex);
> > cp_inst->txwin = txwin;
> >  
> > return 0;
> > diff --git a/arch/powerpc/platforms/pseries/vas.c
> > b/arch/powerpc/platforms/pseries/vas.c
> > index 2ef56157634f..d9ff73d7704d 100644
> > --- a/arch/powerpc/platforms/pseries/vas.c
> > +++ b/arch/powerpc/platforms/pseries/vas.c
> > @@ -501,6 +501,7 @@ static int __init get_vas_capabilities(u8 feat,
> > enum vas_cop_feat_type type,
> > memset(vcaps, 0, 

Re: [PATCH v3 06/10] powerpc/vas: Map paste address only if window is active

2022-02-15 Thread Haren Myneni
On Mon, 2022-02-14 at 13:20 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of January 22, 2022 5:58 am:
> > The paste address mapping is done with mmap() after the window is
> > opened with ioctl. But the window can be closed due to lost credit
> > due to core removal before mmap(). So if the window is not active,
> > return mmap() failure with -EACCES and expects the user space
> > reissue
> > mmap() when the window is active or open new window when the credit
> > is available.
> > 
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/platforms/book3s/vas-api.c | 21 -
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/platforms/book3s/vas-api.c
> > b/arch/powerpc/platforms/book3s/vas-api.c
> > index a63fd48e34a7..2d06bd1b1935 100644
> > --- a/arch/powerpc/platforms/book3s/vas-api.c
> > +++ b/arch/powerpc/platforms/book3s/vas-api.c
> > @@ -379,10 +379,27 @@ static int coproc_mmap(struct file *fp,
> > struct vm_area_struct *vma)
> > return -EACCES;
> > }
> >  
> > +   /*
> > +* The initial mapping is done after the window is opened
> > +* with ioctl. But this window might have been closed
> > +* due to lost credit (core removal on PowerVM) before mmap().
> 
> What does "initial mapping" mean?
> 
> mapping ~= mmap, in kernel speak.

yes, the initial mapping is done with the actual mmap() call. 
> 
> You will have to differentiate the concepts.
> 
> > +* So if the window is not active, return mmap() failure
> > +* with -EACCES and expects the user space reconfigure (mmap)
> > +* window when it is active again or open new window when
> > +* the credit is available.
> > +*/
> > +   mutex_lock(&txwin->task_ref.mmap_mutex);
> > +   if (txwin->status != VAS_WIN_ACTIVE) {
> > +   pr_err("%s(): Window is not active\n", __func__);
> > +   rc = -EACCES;
> > +   goto out;
> > +   }
> > +
> > paste_addr = cp_inst->coproc->vops->paste_addr(txwin);
> > if (!paste_addr) {
> > pr_err("%s(): Window paste address failed\n",
> > __func__);
> > -   return -EINVAL;
> > +   rc = -EINVAL;
> > +   goto out;
> > }
> >  
> > pfn = paste_addr >> PAGE_SHIFT;
> > @@ -401,6 +418,8 @@ static int coproc_mmap(struct file *fp, struct
> > vm_area_struct *vma)
> >  
> > txwin->task_ref.vma = vma;
> >  
> > +out:
> > +   mutex_unlock(&txwin->task_ref.mmap_mutex);
> 
> If the hypervisor can revoke a window at any point with DLPAR, it's
> not 
> clear *why* this is needed. The hypervisor could cause your window
> to 
> close right after this mmap() returns, right? So an explanation for 
> exactly what this patch is needed for beyond that would help.

Yes, the window can be closed by the OS due to DLPAR after the mmap()
returns successfully, which is a normal case - the paste instruction fails
until the window is reopened again.

But this patch is mainly for the case where the window is opened by user
space and DLPAR happens before user space issues mmap().

I will add more description in the commit log.

Thanks
Haren

> 
> Thanks,
> Nick



Re: [PATCH v3 05/10] powerpc/pseries/vas: Close windows with DLPAR core removal

2022-02-15 Thread Haren Myneni
On Mon, 2022-02-14 at 13:17 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of January 22, 2022 5:57 am:
> > The hypervisor reduces the available credits if the core is removed
> > from the LPAR. So there is possibility of using excessive credits
> > (windows) in the LPAR and the hypervisor expects the system to
> > close
> > the excessive windows. Even though the user space can continue to
> > use
> > these windows to send compression requests to NX, the hypervisor
> > expects
> > the LPAR to reduce these windows usage so that NX load can be
> > equally
> > distributed across all LPARs in the system.
> > 
> > When the DLPAR notifier is received, get the new VAS capabilities
> > from
> > the hypervisor and close the excessive windows in the hypervisor.
> > Also
> > the kernel unmaps the paste address so that the user space receives
> > paste
> > failure until these windows are active with the later DLPAR (core
> > add).
> 
> The changelog needs work. Unmapping the window and the ramifications
> of
> that needs more description here or in comments.

Thanks will change. 
> 
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/include/asm/vas.h  |   1 +
> >  arch/powerpc/platforms/book3s/vas-api.c |   2 +
> >  arch/powerpc/platforms/pseries/vas.c| 117
> > ++--
> >  arch/powerpc/platforms/pseries/vas.h|   1 +
> >  4 files changed, 112 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/vas.h
> > b/arch/powerpc/include/asm/vas.h
> > index f1efe86563cc..ddc05a8fc2e3 100644
> > --- a/arch/powerpc/include/asm/vas.h
> > +++ b/arch/powerpc/include/asm/vas.h
> > @@ -74,6 +74,7 @@ struct vas_user_win_ref {
> > struct mm_struct *mm;   /* Linux process mm_struct */
> > struct mutex mmap_mutex;/* protects paste address mmap() */
> > /* with DLPAR close/open
> > windows */
> > +   struct vm_area_struct *vma; /* Save VMA and used in DLPAR ops
> > */
> >  };
> >  
> >  /*
> > diff --git a/arch/powerpc/platforms/book3s/vas-api.c
> > b/arch/powerpc/platforms/book3s/vas-api.c
> > index 2b0ced611f32..a63fd48e34a7 100644
> > --- a/arch/powerpc/platforms/book3s/vas-api.c
> > +++ b/arch/powerpc/platforms/book3s/vas-api.c
> > @@ -399,6 +399,8 @@ static int coproc_mmap(struct file *fp, struct
> > vm_area_struct *vma)
> > pr_devel("%s(): paste addr %llx at %lx, rc %d\n", __func__,
> > paste_addr, vma->vm_start, rc);
> >  
> > +   txwin->task_ref.vma = vma;
> > +
> > return rc;
> >  }
> >  
> > diff --git a/arch/powerpc/platforms/pseries/vas.c
> > b/arch/powerpc/platforms/pseries/vas.c
> > index d9ff73d7704d..75ccd0a599ec 100644
> > --- a/arch/powerpc/platforms/pseries/vas.c
> > +++ b/arch/powerpc/platforms/pseries/vas.c
> > @@ -370,13 +370,28 @@ static struct vas_window
> > *vas_allocate_window(int vas_id, u64 flags,
> > if (rc)
> > goto out_free;
> >  
> > -   vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
> > txwin->win_type = cop_feat_caps->win_type;
> > mutex_lock(&vas_pseries_mutex);
> > -   list_add(&txwin->win_list, &caps->list);
> > +   /*
> > +* Possible to loose the acquired credit with DLPAR core
> 
> s/loose/lose/g
> 
> > +* removal after the window is opened. So if there are any
> > +* closed windows (means with lost credits), do not give new
> > +* window to user space. New windows will be opened only
> > +* after the existing windows are reopened when credits are
> > +* available.
> > +*/
> > +   if (!caps->close_wins) {
> > +   list_add(&txwin->win_list, &caps->list);
> > +   caps->num_wins++;
> > +   mutex_unlock(&vas_pseries_mutex);
> > +   vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
> > +   return &txwin->vas_win;
> > +   }
> > mutex_unlock(&vas_pseries_mutex);
> >  
> > -   return &txwin->vas_win;
> > +   put_vas_user_win_ref(&txwin->vas_win.task_ref);
> > +   rc = -EBUSY;
> > +   pr_err("No credit is available to allocate window\n");
> >  
> >  out_free:
> > /*
> > @@ -439,14 +454,24 @@ static int vas_deallocate_window(struct
> > vas_window *vwin)
> >  
> > caps = &vascaps[win->win_type].caps;
> > mutex_lock(&vas_pseries_mutex);
> > -   rc = deallocate_free_window(win);
> > -   if (rc) {
> > -   mutex_unlock(&vas_pseries_mutex);
> > -   return rc;
> > -   }
> > +   /*
> > +* VAS window is already closed in the hypervisor when
> > +* lost the credit. So just remove the entry from
> > +* the list, remove task references and free vas_window
> > +* struct.
> > +*/
> > +   if (win->vas_win.status & VAS_WIN_NO_CRED_CLOSE) {
> > +   rc = deallocate_free_window(win);
> > +   if (rc) {
> > +   mutex_unlock(&vas_pseries_mutex);
> > +   return rc;
> > +   }
> > +   } else
> > +   vascaps[win->win_type].close_wins--;
> >  
> > list_del(&win->win_list);
> > atomic_dec(&caps->used_creds);
> > +   

Re: [PATCH kernel 2/3] powerpc/llvm: Sample config for LLVM LTO

2022-02-15 Thread Alexey Kardashevskiy




On 2/12/22 11:05, Nick Desaulniers wrote:

On Thu, Feb 10, 2022 at 6:31 PM Alexey Kardashevskiy  wrote:


The config is a copy of ppc64_defconfig with a few tweaks. This could be
a smaller config to merge into ppc64_defconfig but unfortunately
the merger does not allow disabling already-enabled options.


Cool series!



This is a command line to compile the kernel using the upstream llvm:

make -j64 O=/home/aik/pbuild/kernels-llvm/ \
  "KCFLAGS=-Wmissing-braces -Wno-array-bounds" \
  ARCH=powerpc LLVM_IAS=1 ppc64le_lto_defconfig CC=clang LLVM=1


That command line invocation is kind of a mess, and many things
shouldn't be necessary.

O= is just noise; if folks are doing in tree builds then that doesn't
add anything meaningful.
KCFLAGS= why? I know -Warray-bounds is being worked on actively, but
do we have instances of -Wmissing-braces at the moment? Let's get
those fixed up.
LLVM_IAS=1 is implied by LLVM=1.
CC=clang is implied by LLVM=1

why add a new config? I think it would be simpler to just show command
line invocations of `./scripts/config -e` and `make`. No new config
required.




I should have added "RFC" to this one, as the purpose of the patch is to
show what works right now; it is not intended for actual submission.





Forces CONFIG_BTRFS_FS=y to make CONFIG_ZSTD_COMPRESS=y to fix:
ld.lld: error: linking module flags 'Code Model': IDs have conflicting values 
in 'lib/built-in.a(entropy_common.o at 5332)' and 'ld-temp.o'

because modules are linked with -mcmodel=large but the kernel uses 
-mcmodel=medium


Please file a bug about this.
https://github.com/ClangBuiltLinux/linux/issues



Enables CONFIG_USERFAULTFD=y as otherwise vm_userfaultfd_ctx becomes
0 bytes long and clang sanitizer crashes as
https://bugs.llvm.org/show_bug.cgi?id=500375




The above hyperlink doesn't work for me. Upstream llvm just moved from
bugzilla to github issue tracker.



aah this is the correct one:
https://bugs.llvm.org/show_bug.cgi?id=50037



https://github.com/llvm/llvm-project/issues


oh ok.



Disables CONFIG_FTR_FIXUP_SELFTEST as it uses FTR_SECTION_ELSE with
conditional branches. There are other places like this and the following
patches address that.

Disables CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT as CONFIG_HAS_LTO_CLANG
depends on it being disabled. In order to avoid disabling way too many
options (like DYNAMIC_FTRACE/FUNCTION_TRACER), this converts
FTRACE_MCOUNT_USE_RECORDMCOUNT from def_bool to bool.

Note that even with this config there is a good chance that LTO
is going to fail linking vmlinux because of the "bc" problem.


I think rather than adding a new config with LTO enabled and a few
things turned off, it would be better to not allow LTO to be
selectable if those things are turned on, until the combination of the
two is fixed.


Well, if I want people to try this thing, I kinda need to provide an 
easy way to allow LTO. The new config seemed the easiest (== the 
shortest) :)


Re: [PATCH v3 03/10] powerpc/pseries/vas: Save LPID in pseries_vas_window struct

2022-02-15 Thread Haren Myneni
On Mon, 2022-02-14 at 12:41 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of January 22, 2022 5:55 am:
> > The kernel sets the VAS window with the partition PID when it is opened in
> > the hypervisor. During a DLPAR operation, windows can be closed and
> > reopened in the hypervisor when the credit is available. So save
> > this PID in the pseries_vas_window struct when the window is opened
> > initially and reuse it later during DLPAR operations.
> 
> This probably shouldn't be called lpid, while you're changing it.
> "partition PID" and "LPAR PID" is also confusing. I know the name
> somewhat comes from the specifiction, but pid/PID would be fine,
> it's clear we are talking about "this LPAR" when in pseries code.
> 
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/platforms/pseries/vas.c | 7 ---
> >  arch/powerpc/platforms/pseries/vas.h | 1 +
> >  2 files changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/vas.c
> > b/arch/powerpc/platforms/pseries/vas.c
> > index d2c8292bfb33..2ef56157634f 100644
> > --- a/arch/powerpc/platforms/pseries/vas.c
> > +++ b/arch/powerpc/platforms/pseries/vas.c
> > @@ -107,7 +107,6 @@ static int h_deallocate_vas_window(u64 winid)
> >  static int h_modify_vas_window(struct pseries_vas_window *win)
> >  {
> > long rc;
> > -   u32 lpid = mfspr(SPRN_PID);
> >  
> > /*
> >  * AMR value is not supported in Linux VAS implementation.
> > @@ -115,7 +114,7 @@ static int h_modify_vas_window(struct
> > pseries_vas_window *win)
> >  */
> > do {
> > rc = plpar_hcall_norets(H_MODIFY_VAS_WINDOW,
> > -   win->vas_win.winid, lpid, 0,
> > +   win->vas_win.winid, win->lpid,
> > 0,
> > VAS_MOD_WIN_FLAGS, 0);
> >  
> > rc = hcall_return_busy_check(rc);
> > @@ -125,7 +124,7 @@ static int h_modify_vas_window(struct
> > pseries_vas_window *win)
> > return 0;
> >  
> > pr_err("H_MODIFY_VAS_WINDOW error: %ld, winid %u lpid %u\n",
> > -   rc, win->vas_win.winid, lpid);
> > +   rc, win->vas_win.winid, win->lpid);
> > return -EIO;
> >  }
> >  
> > @@ -338,6 +337,8 @@ static struct vas_window
> > *vas_allocate_window(int
> > vas_id, u64 flags,
> > }
> > }
> >  
> > +   txwin->lpid = mfspr(SPRN_PID);
> > +
> > /*
> >  * Allocate / Deallocate window hcalls and setup / free IRQs
> >  * have to be protected with mutex.
> > diff --git a/arch/powerpc/platforms/pseries/vas.h
> > b/arch/powerpc/platforms/pseries/vas.h
> > index fa7ce74f1e49..0538760d13be 100644
> > --- a/arch/powerpc/platforms/pseries/vas.h
> > +++ b/arch/powerpc/platforms/pseries/vas.h
> > @@ -115,6 +115,7 @@ struct pseries_vas_window {
> > u64 domain[6];  /* Associativity domain Ids */
> > /* this window is allocated */
> > u64 util;
> > +   u32 lpid;
> 
> Comment could be "PID associated with this window".   

yes, will add this comment.

> 
> BTW, is the TID parameter deprecated? Doesn't seem that we use that.

Right, tpid is deprecated on p10 and we are not using it.

Thanks
Haren

> 
> Thanks,
> Nick



Re: [PATCH v3 02/10] powerpc/pseries/vas: Add notifier for DLPAR core removal/add

2022-02-15 Thread Haren Myneni
On Mon, 2022-02-14 at 12:27 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of January 22, 2022 5:54 am:
> > The hypervisor assigns credits for each LPAR based on number of
> > cores configured in that system. So expects to release credits
> > (means windows) when the core is removed. This patch adds notifier
> > for core removal/add so that the OS closes windows if the system
> > looses credits due to core removal and reopen windows when the
> > credits available later.
> 
> This could be improved. As far as I can tell,
> 
>  The hypervisor assigns vas credits (windows) for each LPAR based on
> the 
>  number of cores configured in that system. The OS is expected to 
>  release credits when cores are removed, and may allocate more when 
>  cores are added.
> 
> Or can you only re-use credits that you previously lost?

yes, reopen windows / re-use credits when the previously lost credits
are available. 
> 
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/platforms/pseries/vas.c | 37
> > 
> >  1 file changed, 37 insertions(+)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/vas.c
> > b/arch/powerpc/platforms/pseries/vas.c
> > index c0737379cc7b..d2c8292bfb33 100644
> > --- a/arch/powerpc/platforms/pseries/vas.c
> > +++ b/arch/powerpc/platforms/pseries/vas.c
> > @@ -538,6 +538,39 @@ static int __init get_vas_capabilities(u8
> > feat, enum vas_cop_feat_type type,
> > return 0;
> >  }
> >  
> > +/*
> > + * Total number of default credits available (target_credits)
> > + * in LPAR depends on number of cores configured. It varies based
> > on
> > + * whether processors are in shared mode or dedicated mode.
> > + * Get the notifier when CPU configuration is changed with DLPAR
> > + * operation so that get the new target_credits (vas default
> > capabilities)
> > + * and then update the existing windows usage if needed.
> > + */
> > +static int pseries_vas_notifier(struct notifier_block *nb,
> > +   unsigned long action, void *data)
> > +{
> > +   struct of_reconfig_data *rd = data;
> > +   struct device_node *dn = rd->dn;
> > +   const __be32 *intserv = NULL;
> > +   int len, rc = 0;
> > +
> > +   if ((action == OF_RECONFIG_ATTACH_NODE) ||
> > +   (action == OF_RECONFIG_DETACH_NODE))
> 
> I suppose the OF notifier is the way to do it (cc Nathan).

Using a notifier here, registering it with of_reconfig_notifier_register()
as in other places (hotplug-cpu.c pseries_smp_notifier()).

> 
> Could this patch be folded in with where it acually does something?
> It 
> makes it easier to review and understand how the notifier is used.

Added this notifier as a separate patch to make it smaller. Sure, I can
include this patch in the 'Add reconfig_close/open_windows()' patch.
> 
> 
> > +   intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s",
> > +     &len);
> > +   /*
> > +* Processor config is not changed
> > +*/
> > +   if (!intserv)
> > +   return NOTIFY_OK;
> > +
> > +   return rc;
> > +}
> > +
> > +static struct notifier_block pseries_vas_nb = {
> > +   .notifier_call = pseries_vas_notifier,
> > +};
> > +
> >  static int __init pseries_vas_init(void)
> >  {
> > struct hv_vas_cop_feat_caps *hv_cop_caps;
> > @@ -591,6 +624,10 @@ static int __init pseries_vas_init(void)
> > goto out_cop;
> > }
> >  
> > +   /* Processors can be added/removed only on LPAR */
> 
> What does this comment mean? DLPAR?

I will remove it; basically I was trying to say that this notifier is called
when a core is removed / added.

Thanks
haren

> 
> Thanks,
> Nick
> 
> > +   if (copypaste_feat && firmware_has_feature(FW_FEATURE_LPAR))
> > +   of_reconfig_notifier_register(&pseries_vas_nb);
> > +
> > pr_info("GZIP feature is available\n");
> >  
> >  out_cop:
> > -- 
> > 2.27.0
> > 
> > 
> > 



Re: [PATCH v1 1/2] cma: factor out minimum alignment requirement

2022-02-15 Thread Rob Herring
On Mon, Feb 14, 2022 at 11:42 AM David Hildenbrand  wrote:
>
> Let's factor out determining the minimum alignment requirement for CMA
> and add a helpful comment.
>
> No functional change intended.
>
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/include/asm/fadump-internal.h |  5 -
>  arch/powerpc/kernel/fadump.c   |  2 +-
>  drivers/of/of_reserved_mem.c   |  9 +++--

Acked-by: Rob Herring 

>  include/linux/cma.h|  9 +
>  kernel/dma/contiguous.c|  4 +---
>  mm/cma.c   | 20 +---
>  6 files changed, 19 insertions(+), 30 deletions(-)
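
For context, the minimum CMA alignment comes from the largest granularity
the page allocator operates on (MAX_ORDER blocks / pageblocks). A sketch of
the kind of helper being factored out; the exact names and expression in the
patch may differ, since the patch body is not quoted in this message:

#include <linux/mm.h>

/* Illustrative only; align CMA regions to the coarser of MAX_ORDER and pageblock granularity. */
#define CMA_MIN_ALIGNMENT_PAGES  (1UL << max_t(int, MAX_ORDER - 1, pageblock_order))
#define CMA_MIN_ALIGNMENT_BYTES  (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)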


[PATCH v2 1/2] crypto: vmx: Turn CRYPTO_DEV_VMX_ENCRYPT into tristate

2022-02-15 Thread Petr Vorel
and remove CRYPTO_DEV_VMX, which looked redundant when only
CRYPTO_DEV_VMX_ENCRYPT used it. It also forced CRYPTO_GHASH to be
built in even when CRYPTO_DEV_VMX_ENCRYPT was configured as a module.

Update powerpc defconfigs and description in MAINTAINERS.

Signed-off-by: Petr Vorel 
---
new in v2

This might be a bit aggressive, but IMHO CRYPTO_DEV_VMX only complicated
things for nothing. But if you do *not* agree with removing it, I will just
add a select to drivers/crypto/vmx/Kconfig (which forces the dependencies
to always be modules).

If it's OK for you to remove it, please also check whether the description
is OK. The get_maintainer.pl script has a size limitation:

$ ./scripts/get_maintainer.pl drivers/crypto/vmx/Kconfig
...
linux-cry...@vger.kernel.org (open list:IBM Power VMX Cryptographic 
Acceleration Instru...)

maybe the name should be shorter.

Kind regards,
Petr

 MAINTAINERS| 2 +-
 arch/powerpc/configs/powernv_defconfig | 2 +-
 arch/powerpc/configs/ppc64_defconfig   | 2 +-
 arch/powerpc/configs/pseries_defconfig | 2 +-
 drivers/crypto/Kconfig | 6 --
 drivers/crypto/vmx/Kconfig | 4 ++--
 6 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ea3e6c914384..80e562579180 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9207,7 +9207,7 @@ L:target-de...@vger.kernel.org
 S: Supported
 F: drivers/scsi/ibmvscsi_tgt/
 
-IBM Power VMX Cryptographic instructions
+IBM Power VMX Cryptographic Acceleration Instructions Driver
 M: Breno Leitão 
 M: Nayna Jain 
 M: Paulo Flabiano Smorigo 
diff --git a/arch/powerpc/configs/powernv_defconfig 
b/arch/powerpc/configs/powernv_defconfig
index 49f49c263935..4b250d05dcdf 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -337,7 +337,7 @@ CONFIG_CRYPTO_TEA=m
 CONFIG_CRYPTO_TWOFISH=m
 CONFIG_CRYPTO_LZO=m
 CONFIG_CRYPTO_DEV_NX=y
-CONFIG_CRYPTO_DEV_VMX=y
+CONFIG_CRYPTO_DEV_VMX_ENCRYPT=m
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM_BOOK3S_64=m
 CONFIG_KVM_BOOK3S_64_HV=m
diff --git a/arch/powerpc/configs/ppc64_defconfig 
b/arch/powerpc/configs/ppc64_defconfig
index c8b0e80d613b..ebd33b94debb 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -355,7 +355,7 @@ CONFIG_CRYPTO_TWOFISH=m
 CONFIG_CRYPTO_LZO=m
 CONFIG_CRYPTO_DEV_NX=y
 CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
-CONFIG_CRYPTO_DEV_VMX=y
+CONFIG_CRYPTO_DEV_VMX_ENCRYPT=m
 CONFIG_PRINTK_TIME=y
 CONFIG_PRINTK_CALLER=y
 CONFIG_MAGIC_SYSRQ=y
diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index b571d084c148..304673817ef1 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -315,7 +315,7 @@ CONFIG_CRYPTO_TWOFISH=m
 CONFIG_CRYPTO_LZO=m
 CONFIG_CRYPTO_DEV_NX=y
 CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
-CONFIG_CRYPTO_DEV_VMX=y
+CONFIG_CRYPTO_DEV_VMX_ENCRYPT=m
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM_BOOK3S_64=m
 CONFIG_KVM_BOOK3S_64_HV=m
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 4f705674f94f..956f956607a5 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -761,12 +761,6 @@ config CRYPTO_DEV_QCOM_RNG
  To compile this driver as a module, choose M here. The
  module will be called qcom-rng. If unsure, say N.
 
-config CRYPTO_DEV_VMX
-   bool "Support for VMX cryptographic acceleration instructions"
-   depends on PPC64 && VSX
-   help
- Support for VMX cryptographic acceleration instructions.
-
 source "drivers/crypto/vmx/Kconfig"
 
 config CRYPTO_DEV_IMGTEC_HASH
diff --git a/drivers/crypto/vmx/Kconfig b/drivers/crypto/vmx/Kconfig
index c85fab7ef0bd..1a3808b719f3 100644
--- a/drivers/crypto/vmx/Kconfig
+++ b/drivers/crypto/vmx/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config CRYPTO_DEV_VMX_ENCRYPT
-   tristate "Encryption acceleration support on P8 CPU"
-   depends on CRYPTO_DEV_VMX
+   tristate "Power VMX cryptographic acceleration instructions driver"
+   depends on PPC64 && VSX
select CRYPTO_GHASH
default m
help
-- 
2.35.1



[PATCH v2 2/2] crypto: vmx: Add missing dependencies

2022-02-15 Thread Petr Vorel
vmx-crypto module depends on CRYPTO_AES, CRYPTO_CBC, CRYPTO_CTR or
CRYPTO_XTS, thus add them.

These dependencies are likely to be enabled, but if
CRYPTO_DEV_VMX_ENCRYPT=y && !CRYPTO_MANAGER_DISABLE_TESTS
and either of CRYPTO_AES, CRYPTO_CBC, CRYPTO_CTR or CRYPTO_XTS is built
as module or disabled, alg_test() from crypto/testmgr.c complains during
boot about failing to allocate the generic fallback implementations
(2 == ENOENT):

[0.540953] Failed to allocate xts(aes) fallback: -2
[0.541014] alg: skcipher: failed to allocate transform for p8_aes_xts: -2
[0.541120] alg: self-tests for p8_aes_xts (xts(aes)) failed (rc=-2)
[0.50] Failed to allocate ctr(aes) fallback: -2
[0.544497] alg: skcipher: failed to allocate transform for p8_aes_ctr: -2
[0.544603] alg: self-tests for p8_aes_ctr (ctr(aes)) failed (rc=-2)
[0.547992] Failed to allocate cbc(aes) fallback: -2
[0.548052] alg: skcipher: failed to allocate transform for p8_aes_cbc: -2
[0.548156] alg: self-tests for p8_aes_cbc (cbc(aes)) failed (rc=-2)
[0.550745] Failed to allocate transformation for 'aes': -2
[0.550801] alg: cipher: Failed to load transform for p8_aes: -2
[0.550892] alg: self-tests for p8_aes (aes) failed (rc=-2)
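
For context, a minimal sketch of why the generic algorithms must be reachable
at boot (this is not the vmx driver's actual code): the accelerated
transforms allocate a software fallback, and that allocation fails with
-ENOENT if, for example, the generic cbc(aes) implementation is built as a
module or disabled.

#include <linux/crypto.h>
#include <linux/err.h>
#include <crypto/skcipher.h>

static int example_alloc_fallback(struct crypto_skcipher **fallback)
{
	/* Fails with -ENOENT when no "cbc(aes)" implementation is available. */
	*fallback = crypto_alloc_skcipher("cbc(aes)", 0,
					  CRYPTO_ALG_NEED_FALLBACK | CRYPTO_ALG_ASYNC);
	return PTR_ERR_OR_ZERO(*fallback);
}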

Fixes: c07f5d3da643 ("crypto: vmx - Adding support for XTS")
Fixes: d2e3ae6f3aba ("crypto: vmx - Enabling VMX module for PPC64")

Suggested-by: Nicolai Stange 
Signed-off-by: Petr Vorel 
---
changes v1->v2:
* use "select" instead of "depends on" (Nicolai)
* drop !CRYPTO_MANAGER_DISABLE_TESTS as the dependency is always (Nicolai)

 drivers/crypto/vmx/Kconfig | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/crypto/vmx/Kconfig b/drivers/crypto/vmx/Kconfig
index 1a3808b719f3..ce054e64b92c 100644
--- a/drivers/crypto/vmx/Kconfig
+++ b/drivers/crypto/vmx/Kconfig
@@ -2,7 +2,11 @@
 config CRYPTO_DEV_VMX_ENCRYPT
tristate "Power VMX cryptographic acceleration instructions driver"
depends on PPC64 && VSX
+   select CRYPTO_AES
+   select CRYPTO_CBC
+   select CRYPTO_CTR
select CRYPTO_GHASH
+   select CRYPTO_XTS
default m
help
  Support for VMX cryptographic acceleration instructions on Power8 CPU.
-- 
2.35.1



[PATCH v2 0/2] vmx-crypto: Add missing dependencies

2022-02-15 Thread Petr Vorel
Hi,

[ Cc powerpc list and VMX people this time ]

changes v1->v2:
* new commit: crypto: vmx: Turn CRYPTO_DEV_VMX_ENCRYPT into tristate
* use "select" instead of "depends on" (Nicolai)
* drop !CRYPTO_MANAGER_DISABLE_TESTS as the dependency is always (Nicolai)

Petr Vorel (2):
  crypto: vmx: Turn CRYPTO_DEV_VMX_ENCRYPT into tristate
  crypto: vmx: Add missing dependencies

 MAINTAINERS| 2 +-
 arch/powerpc/configs/powernv_defconfig | 2 +-
 arch/powerpc/configs/ppc64_defconfig   | 2 +-
 arch/powerpc/configs/pseries_defconfig | 2 +-
 drivers/crypto/Kconfig | 6 --
 drivers/crypto/vmx/Kconfig | 8 ++--
 6 files changed, 10 insertions(+), 12 deletions(-)

-- 
2.35.1



Re: [PATCH v3 01/10] powerpc/pseries/vas: Use common names in VAS capability structure

2022-02-15 Thread Haren Myneni
On Mon, 2022-02-14 at 12:14 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of January 22, 2022 5:54 am:
> > target/used/avail_creds provide credit usage to user space via
> > sysfs and the same interface can be used on PowerNV in the future.
> > Remove "lpar" from these names so that they are applicable on both PowerVM
> > and PowerNV.
> 
> But not in this series? This is just to save you having to do more
> renaming later?

Thanks for your review.
Yes, removing _lpar_ from these struct elements makes them generic so that
they can easily be added in the sysfs patch.

> 
> Reviewed-by: Nicholas Piggin 
> 
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/platforms/pseries/vas.c | 10 +-
> >  arch/powerpc/platforms/pseries/vas.h |  6 +++---
> >  2 files changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/vas.c
> > b/arch/powerpc/platforms/pseries/vas.c
> > index d243ddc58827..c0737379cc7b 100644
> > --- a/arch/powerpc/platforms/pseries/vas.c
> > +++ b/arch/powerpc/platforms/pseries/vas.c
> > @@ -310,8 +310,8 @@ static struct vas_window
> > *vas_allocate_window(int vas_id, u64 flags,
> >  
> > cop_feat_caps = &caps->caps;
> >  
> > -   if (atomic_inc_return(&cop_feat_caps->used_lpar_creds) >
> > -   atomic_read(&cop_feat_caps->target_lpar_creds)) {
> > +   if (atomic_inc_return(&cop_feat_caps->used_creds) >
> > +   atomic_read(&cop_feat_caps->target_creds)) {
> > pr_err("Credits are not available to allocate window\n");
> > rc = -EINVAL;
> > goto out;
> > @@ -385,7 +385,7 @@ static struct vas_window
> > *vas_allocate_window(int vas_id, u64 flags,
> > free_irq_setup(txwin);
> > h_deallocate_vas_window(txwin->vas_win.winid);
> >  out:
> > -   atomic_dec(&cop_feat_caps->used_lpar_creds);
> > +   atomic_dec(&cop_feat_caps->used_creds);
> > kfree(txwin);
> > return ERR_PTR(rc);
> >  }
> > @@ -445,7 +445,7 @@ static int vas_deallocate_window(struct
> > vas_window *vwin)
> > }
> >  
> > list_del(&win->win_list);
> > -   atomic_dec(&caps->used_lpar_creds);
> > +   atomic_dec(&caps->used_creds);
> > mutex_unlock(&vas_pseries_mutex);
> >  
> > put_vas_user_win_ref(&vwin->task_ref);
> > @@ -521,7 +521,7 @@ static int __init get_vas_capabilities(u8 feat,
> > enum vas_cop_feat_type type,
> > }
> > caps->max_lpar_creds = be16_to_cpu(hv_caps->max_lpar_creds);
> > caps->max_win_creds = be16_to_cpu(hv_caps->max_win_creds);
> > -   atomic_set(&caps->target_lpar_creds,
> > +   atomic_set(&caps->target_creds,
> >be16_to_cpu(hv_caps->target_lpar_creds));
> > if (feat == VAS_GZIP_DEF_FEAT) {
> > caps->def_lpar_creds = be16_to_cpu(hv_caps->def_lpar_creds);
> > diff --git a/arch/powerpc/platforms/pseries/vas.h
> > b/arch/powerpc/platforms/pseries/vas.h
> > index 4ecb3fcabd10..fa7ce74f1e49 100644
> > --- a/arch/powerpc/platforms/pseries/vas.h
> > +++ b/arch/powerpc/platforms/pseries/vas.h
> > @@ -72,9 +72,9 @@ struct vas_cop_feat_caps {
> > };
> > /* Total LPAR available credits. Can be different from max LPAR
> > */
> > /* credits due to DLPAR operation */
> > -   atomic_t	target_lpar_creds;
> > -   atomic_t	used_lpar_creds; /* Used credits so far */
> > -   u16 avail_lpar_creds; /* Remaining available credits */
> > +   atomic_t	target_creds;
> > +   atomic_t	used_creds; /* Used credits so far */
> > +   u16 avail_creds; /* Remaining available credits */
> >  };
> >  
> >  /*
> > -- 
> > 2.27.0
> > 
> > 
> > 



[PATCH v1 4/4] powerpc/ftrace: Style cleanup in ftrace_mprofile.S

2022-02-15 Thread Christophe Leroy
Add some line breaks to better match the file's style, add
spaces after commas and fix a couple of misplaced blanks.

Suggested-by: Naveen N. Rao 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/trace/ftrace_mprofile.S | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace_mprofile.S 
b/arch/powerpc/kernel/trace/ftrace_mprofile.S
index eb077270ec2f..89639e64acd1 100644
--- a/arch/powerpc/kernel/trace/ftrace_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_mprofile.S
@@ -87,8 +87,9 @@ _GLOBAL(ftrace_regs_caller)
 #endif
 
 #ifdef CONFIG_LIVEPATCH_64
-   mr  r14,r7  /* remember old NIP */
+   mr  r14, r7 /* remember old NIP */
 #endif
+
/* Calculate ip from nip-4 into r3 for call below */
subir3, r7, MCOUNT_INSN_SIZE
 
@@ -102,7 +103,7 @@ _GLOBAL(ftrace_regs_caller)
PPC_STL r11, _CCR(r1)
 
/* Load _regs in r6 for call below */
-   addir6, r1 ,STACK_FRAME_OVERHEAD
+   addir6, r1, STACK_FRAME_OVERHEAD
 
/* ftrace_call(r3, r4, r5, r6) */
 .globl ftrace_regs_call
@@ -113,6 +114,7 @@ ftrace_regs_call:
/* Load ctr with the possibly modified NIP */
PPC_LL  r3, _NIP(r1)
mtctr   r3
+
 #ifdef CONFIG_LIVEPATCH_64
cmpdr14, r3 /* has NIP been altered? */
 #endif
@@ -196,7 +198,7 @@ _GLOBAL(ftrace_caller)
 
 #ifdef CONFIG_LIVEPATCH_64
SAVE_GPR(14, r1)
-   mr  r14,r7  /* remember old NIP */
+   mr  r14, r7 /* remember old NIP */
 #endif
/* Calculate ip from nip-4 into r3 for call below */
subir3, r7, MCOUNT_INSN_SIZE
@@ -210,7 +212,7 @@ _GLOBAL(ftrace_caller)
PPC_STL r8, _MSR(r1)
 
/* Load _regs in r6 for call below */
-   addir6, r1 ,STACK_FRAME_OVERHEAD
+   addir6, r1, STACK_FRAME_OVERHEAD
 
/* ftrace_call(r3, r4, r5, r6) */
 .globl ftrace_call
@@ -220,6 +222,7 @@ ftrace_call:
 
PPC_LL  r3, _NIP(r1)
mtctr   r3
+
 #ifdef CONFIG_LIVEPATCH_64
cmpdr14, r3 /* has NIP been altered? */
REST_GPR(14, r1)
@@ -244,6 +247,7 @@ ftrace_call:
 /* Based on the cmpd above, if the NIP was altered handle livepatch */
bne-livepatch_handler
 #endif
+
bctr/* jump after _mcount site */
 
 #ifdef CONFIG_LIVEPATCH_64
-- 
2.34.1



[PATCH v1 3/4] powerpc/ftrace: Have arch_ftrace_get_regs() return NULL unless FL_SAVE_REGS is set

2022-02-15 Thread Christophe Leroy
When FL_SAVE_REGS is not set we get here via ftrace_caller()
which doesn't save all registers.

ftrace_caller() explicitly clears regs.msr, so we can rely
on it to know where we come from. We never expect the MSR register
to be 0 when ftrace is involved.

Reported-by: Naveen N. Rao 
Fixes: 40b035efe288 ("powerpc/ftrace: Implement 
CONFIG_DYNAMIC_FTRACE_WITH_ARGS")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/ftrace.h   | 3 ++-
 arch/powerpc/kernel/trace/ftrace_mprofile.S | 4 
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 70b457097098..ff034ae4e472 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -30,7 +30,8 @@ struct ftrace_regs {
 
 static __always_inline struct pt_regs *arch_ftrace_get_regs(struct ftrace_regs 
*fregs)
 {
-   return &fregs->regs;
+   /* We clear regs.msr in ftrace_call */
+   return fregs->regs.msr ? &fregs->regs : NULL;
 }
 
 static __always_inline void ftrace_instruction_pointer_set(struct ftrace_regs 
*fregs,
diff --git a/arch/powerpc/kernel/trace/ftrace_mprofile.S 
b/arch/powerpc/kernel/trace/ftrace_mprofile.S
index 8443902d5a05..eb077270ec2f 100644
--- a/arch/powerpc/kernel/trace/ftrace_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_mprofile.S
@@ -205,6 +205,10 @@ _GLOBAL(ftrace_caller)
PPC_STL r0, _LINK(r1)
mr  r4, r0
 
+   /* Clear MSR to flag as ftrace_caller versus ftrace_regs_caller */
+   li  r8, 0
+   PPC_STL r8, _MSR(r1)
+
/* Load _regs in r6 for call below */
addir6, r1 ,STACK_FRAME_OVERHEAD
 
-- 
2.34.1



[PATCH v1 2/4] powerpc/ftrace: Add recursion protection in prepare_ftrace_return()

2022-02-15 Thread Christophe Leroy
The function_graph_enter() does not provide any recursion protection.

Add a protection in prepare_ftrace_return() in case
function_graph_enter() calls something that gets
function graph traced.

Reported-by: Naveen N. Rao 
Fixes: 830213786c49 ("powerpc/ftrace: directly call of function graph tracer by 
ftrace caller")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/trace/ftrace.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 74a176e394ef..f21b8fbd418e 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -944,6 +944,7 @@ unsigned long prepare_ftrace_return(unsigned long parent, 
unsigned long ip,
unsigned long sp)
 {
unsigned long return_hooker;
+   int bit;
 
if (unlikely(ftrace_graph_is_dead()))
goto out;
@@ -951,10 +952,16 @@ unsigned long prepare_ftrace_return(unsigned long parent, 
unsigned long ip,
if (unlikely(atomic_read(&current->tracing_graph_pause)))
goto out;
 
+   bit = ftrace_test_recursion_trylock(ip, parent);
+   if (bit < 0)
+   goto out;
+
return_hooker = ppc_function_entry(return_to_handler);
 
if (!function_graph_enter(parent, ip, 0, (unsigned long *)sp))
parent = return_hooker;
+
+   ftrace_test_recursion_unlock(bit);
 out:
return parent;
 }
-- 
2.34.1



[PATCH v1 1/4] powerpc/ftrace: Also save r1 in ftrace_caller()

2022-02-15 Thread Christophe Leroy
Also save r1 in ftrace_caller()

r1 is needed during unwinding when the function_graph tracer
is active.

Reported-by: Naveen N. Rao 
Fixes: 830213786c49 ("powerpc/ftrace: directly call of function graph tracer by 
ftrace caller")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/trace/ftrace_mprofile.S | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/trace/ftrace_mprofile.S 
b/arch/powerpc/kernel/trace/ftrace_mprofile.S
index 56da60e98327..8443902d5a05 100644
--- a/arch/powerpc/kernel/trace/ftrace_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_mprofile.S
@@ -173,6 +173,10 @@ _GLOBAL(ftrace_caller)
beq ftrace_no_trace
 #endif
 
+   /* Save previous stack pointer (r1) */
+   addir8, r1, SWITCH_FRAME_SIZE
+   PPC_STL r8, GPR1(r1)
+
/* Get the _mcount() call site out of LR */
mflrr7
PPC_STL r7, _NIP(r1)
-- 
2.34.1



Re: [PATCH v2 09/13] powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2022-02-15 Thread Naveen N. Rao

Steven Rostedt wrote:

On Tue, 15 Feb 2022 19:06:48 +0530
"Naveen N. Rao"  wrote:

As I understand it, the reason ftrace_get_regs() was introduced was to 
be able to only return the pt_regs, if _all_ registers were saved into 
it, which we don't do when coming in through ftrace_caller(). See the 
x86 implementation (commit 02a474ca266a47 ("ftrace/x86: Allow for 
arguments to be passed in to ftrace_regs by default"), which returns 
pt_regs conditionally.


I can give you the history of ftrace_caller and ftrace_regs_caller.

ftrace_caller saved just enough, as dictated by the gcc mcount trampolines.
The new fentry, which fires at the start of the function (whereas mcount
fires after the stack frame is set up), may change the rules on some
architectures.

As for ftrace_regs_caller, that was created for kprobes. As the majority of
kprobes were added at the start of the function, it made sense to hook into
ftrace as the ftrace trampoline call is much faster than taking a
breakpoint interrupt. But to keep compatibility with breakpoint
interrupts, we needed to fill in all the registers, and make it act just
like a breakpoint interrupt.

I've been wanting to record function parameters, and because the ftrace
trampoline must at a minimum save the function parameters before calling
the ftrace callbacks, all the information for those parameters was being
saved but was never exposed to the ftrace callbacks. I created
DYNAMIC_FTRACE_WITH_ARGS to expose them. I first just used pt_regs with
just the parameters filled in, but that was criticized as confusing: the
unfilled pt_regs fields might be read and mistaken for legitimate values.
So I created ftrace_regs, which gives you just the function arguments (if
DYNAMIC_FTRACE_WITH_ARGS is defined), or a full pt_regs if the caller came
from ftrace_regs_caller. If not, it gives you a NULL pointer.

The first user to use the args was live kernel patching, as they only need
that and the return pointer.


Thanks, that helps.

- Naveen



Re: [PATCH v2 09/13] powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2022-02-15 Thread Naveen N. Rao

Christophe Leroy wrote:

+ S390 people

Le 15/02/2022 à 15:28, Christophe Leroy a écrit :



Le 15/02/2022 à 14:36, Naveen N. Rao a écrit :

Michael Ellerman wrote:

Christophe Leroy  writes:

Le 14/02/2022 à 16:25, Naveen N. Rao a écrit :

Christophe Leroy wrote:

Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. It accelerates the call
of livepatching.

Also note that powerpc being the last one to convert to
CONFIG_DYNAMIC_FTRACE_WITH_ARGS, it will now be possible to remove
klp_arch_set_pc() on all architectures.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/ftrace.h    | 17 +
 arch/powerpc/include/asm/livepatch.h |  4 +---
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cdac2115eb00..e2b1792b2aae 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -210,6 +210,7 @@ config PPC
 select HAVE_DEBUG_KMEMLEAK
 select HAVE_DEBUG_STACKOVERFLOW
 select HAVE_DYNAMIC_FTRACE
+    select HAVE_DYNAMIC_FTRACE_WITH_ARGS    if MPROFILE_KERNEL || 
PPC32
 select HAVE_DYNAMIC_FTRACE_WITH_REGS    if MPROFILE_KERNEL || 
PPC32

 select HAVE_EBPF_JIT
 select HAVE_EFFICIENT_UNALIGNED_ACCESS    if 
!(CPU_LITTLE_ENDIAN && POWER7_CPU)
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h

index b3f6184f77ea..45c3d6f11daa 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -22,6 +22,23 @@ static inline unsigned long 
ftrace_call_adjust(unsigned long addr)

 struct dyn_arch_ftrace {
 struct module *mod;
 };
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
+struct ftrace_regs {
+    struct pt_regs regs;
+};
+
+static __always_inline struct pt_regs 
*arch_ftrace_get_regs(struct ftrace_regs *fregs)

+{
+    return &fregs->regs;
+}


I think this is wrong. We need to differentiate between 
ftrace_caller() and ftrace_regs_caller() here, and only return 
pt_regs if coming in through ftrace_regs_caller() (i.e., 
FL_SAVE_REGS is set).


Not sure I follow you.

This is based on 5740a7c71ab6 ("s390/ftrace: add 
HAVE_DYNAMIC_FTRACE_WITH_ARGS support")


That's the whole point of HAVE_DYNAMIC_FTRACE_WITH_ARGS: to have the regs 
also with ftrace_caller().


Sure you only have the params, but that's the same on s390, so what 
did I miss ?


Steven has explained the rationale for this in his other response:
https://lore.kernel.org/all/20220215093849.556d5...@gandalf.local.home/



It looks like s390 is special since it apparently saves all registers 
even for ftrace_caller: 
https://lore.kernel.org/all/YbipdU5X4HNDWIni@osiris/


It is not what I understand from their code, see 
https://elixir.bootlin.com/linux/v5.17-rc3/source/arch/s390/kernel/mcount.S#L37 



They have a common macro called with argument 'allregs' which is set to 
0 for ftrace_caller() and 1 for ftrace_regs_caller().

When allregs == 1, the macro seems to save more.

But ok, I can do like x86, but I need a trick to know whether 
FL_SAVE_REGS is set or not, like they do with fregs->regs.cs

Any idea what the condition can be for powerpc ?


We'll need to explicitly zero-out something in pt_regs in 
ftrace_caller(). We can probably use regs->msr since we don't expect it 
to be zero when saved from ftrace_regs_caller().
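For illustration, a minimal sketch of the idea being converged on here (it matches the
separate patch "powerpc/ftrace: Have arch_ftrace_get_regs() return NULL unless
FL_SAVE_REGS is set" elsewhere in this digest): ftrace_caller() stores 0 into the saved
MSR slot, so a non-zero regs.msr marks the full-regs path.

static __always_inline struct pt_regs *arch_ftrace_get_regs(struct ftrace_regs *fregs)
{
	/* ftrace_caller() zeroes the saved MSR, ftrace_regs_caller() saves the real one */
	return fregs->regs.msr ? &fregs->regs : NULL;
}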






Finally, it looks like this change was done via commit 894979689d3a 
("s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller 
implementations"), four hours later on the same day, after the implementation of 
arch_ftrace_get_regs().


They may have forgotten to change arch_ftrace_get_regs() which was added 
in commit 5740a7c71ab6 ("s390/ftrace: add HAVE_DYNAMIC_FTRACE_WITH_ARGS 
support") with the assumption that ftrace_caller and ftrace_regs_caller 
were identical.


Indeed, good find!


Thanks,
Naveen



Re: [PATCH v4 00/13] Fix LKDTM for PPC64/IA64/PARISC v4

2022-02-15 Thread Kees Cook
On Tue, Feb 15, 2022 at 01:40:55PM +0100, Christophe Leroy wrote:
> PPC64/IA64/PARISC have function descriptors. LKDTM doesn't work
> on those three architectures because LKDTM messes up function
> descriptors with functions.
> 
> This series does some cleanup in the three architectures and
> refactors function descriptors so that it can then easily use it
> in a generic way in LKDTM.

Thanks for doing this! It looks good to me. :)

-Kees

> 
> Changes in v4:
> - Added patch 1 which Fixes 'sparse' for powerpc64le after wrong report on 
> previous series, refer 
> https://github.com/ruscur/linux-ci/actions/runs/1351427671
> - Exported dereference_function_descriptor() to modules
> - Addressed other received comments
> - Rebased on latest powerpc/next (5a72345e6a78120368fcc841b570331b6c5a50da)
> 
> Changes in v3:
> - Addressed received comments
> - Swapped some of the powerpc patches to keep func_descr_t renamed as struct 
> func_desc and remove 'struct ppc64_opd_entry'
> - Changed HAVE_FUNCTION_DESCRIPTORS macro to a config item 
> CONFIG_HAVE_FUNCTION_DESCRIPTORS
> - Dropped patch 11 ("Fix lkdtm_EXEC_RODATA()")
> 
> Changes in v2:
> - Addressed received comments
> - Moved dereference_[kernel]_function_descriptor() out of line
> - Added patches to remove func_descr_t and func_desc_t in powerpc
> - Using func_desc_t instead of funct_descr_t
> - Renamed HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR to HAVE_FUNCTION_DESCRIPTORS
> - Added a new lkdtm test to check protection of function descriptors
> 
> Christophe Leroy (13):
>   powerpc: Fix 'sparse' checking on PPC64le
>   powerpc: Move and rename func_descr_t
>   powerpc: Use 'struct func_desc' instead of 'struct ppc64_opd_entry'
>   powerpc: Remove 'struct ppc64_opd_entry'
>   powerpc: Prepare func_desc_t for refactorisation
>   ia64: Rename 'ip' to 'addr' in 'struct fdesc'
>   asm-generic: Define CONFIG_HAVE_FUNCTION_DESCRIPTORS
>   asm-generic: Define 'func_desc_t' to commonly describe function
> descriptors
>   asm-generic: Refactor dereference_[kernel]_function_descriptor()
>   lkdtm: Force do_nothing() out of line
>   lkdtm: Really write into kernel text in WRITE_KERN
>   lkdtm: Fix execute_[user]_location()
>   lkdtm: Add a test for function descriptors protection
> 
>  arch/Kconfig |  3 +
>  arch/ia64/Kconfig|  1 +
>  arch/ia64/include/asm/elf.h  |  2 +-
>  arch/ia64/include/asm/sections.h | 24 +---
>  arch/ia64/kernel/module.c|  6 +-
>  arch/parisc/Kconfig  |  1 +
>  arch/parisc/include/asm/sections.h   | 16 ++
>  arch/parisc/kernel/process.c | 21 ---
>  arch/powerpc/Kconfig |  1 +
>  arch/powerpc/Makefile|  2 +-
>  arch/powerpc/include/asm/code-patching.h |  2 +-
>  arch/powerpc/include/asm/elf.h   |  6 ++
>  arch/powerpc/include/asm/sections.h  | 29 ++
>  arch/powerpc/include/asm/types.h |  6 --
>  arch/powerpc/include/uapi/asm/elf.h  |  8 ---
>  arch/powerpc/kernel/module_64.c  | 42 ++
>  arch/powerpc/kernel/ptrace/ptrace.c  |  6 ++
>  arch/powerpc/kernel/signal_64.c  |  8 +--
>  drivers/misc/lkdtm/core.c|  1 +
>  drivers/misc/lkdtm/lkdtm.h   |  1 +
>  drivers/misc/lkdtm/perms.c   | 71 +++-
>  include/asm-generic/sections.h   | 15 -
>  include/linux/kallsyms.h |  2 +-
>  kernel/extable.c | 24 +++-
>  tools/testing/selftests/lkdtm/tests.txt  |  1 +
>  25 files changed, 155 insertions(+), 144 deletions(-)
> 
> -- 
> 2.34.1
> 

-- 
Kees Cook


Re: [PATCH v4 01/13] powerpc: Fix 'sparse' checking on PPC64le

2022-02-15 Thread Kees Cook
On Tue, Feb 15, 2022 at 01:40:56PM +0100, Christophe Leroy wrote:
> 'sparse' is architecture agnostic and knows nothing about ELF ABI
> version.
> 
> Just like it gets arch and powerpc type and endian from Makefile,
> it also needs to get _CALL_ELF from there, otherwise it won't set
> PPC64_ELF_ABI_v2 macro for PPC64le and won't check the correct code.
> 
> Signed-off-by: Christophe Leroy 

Reviewed-by: Kees Cook 

-- 
Kees Cook


Re: [PATCH v2 09/13] powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2022-02-15 Thread Christophe Leroy

+ S390 people

Le 15/02/2022 à 15:28, Christophe Leroy a écrit :



Le 15/02/2022 à 14:36, Naveen N. Rao a écrit :

Michael Ellerman wrote:

Christophe Leroy  writes:

Le 14/02/2022 à 16:25, Naveen N. Rao a écrit :

Christophe Leroy wrote:

Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. It accelerates the call
of livepatching.

Also note that powerpc being the last one to convert to
CONFIG_DYNAMIC_FTRACE_WITH_ARGS, it will now be possible to remove
klp_arch_set_pc() on all architectures.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/ftrace.h    | 17 +
 arch/powerpc/include/asm/livepatch.h |  4 +---
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cdac2115eb00..e2b1792b2aae 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -210,6 +210,7 @@ config PPC
 select HAVE_DEBUG_KMEMLEAK
 select HAVE_DEBUG_STACKOVERFLOW
 select HAVE_DYNAMIC_FTRACE
+    select HAVE_DYNAMIC_FTRACE_WITH_ARGS    if MPROFILE_KERNEL || 
PPC32
 select HAVE_DYNAMIC_FTRACE_WITH_REGS    if MPROFILE_KERNEL || 
PPC32

 select HAVE_EBPF_JIT
 select HAVE_EFFICIENT_UNALIGNED_ACCESS    if 
!(CPU_LITTLE_ENDIAN && POWER7_CPU)
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h

index b3f6184f77ea..45c3d6f11daa 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -22,6 +22,23 @@ static inline unsigned long 
ftrace_call_adjust(unsigned long addr)

 struct dyn_arch_ftrace {
 struct module *mod;
 };
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
+struct ftrace_regs {
+    struct pt_regs regs;
+};
+
+static __always_inline struct pt_regs 
*arch_ftrace_get_regs(struct ftrace_regs *fregs)

+{
+    return &fregs->regs;
+}


I think this is wrong. We need to differentiate between 
ftrace_caller() and ftrace_regs_caller() here, and only return 
pt_regs if coming in through ftrace_regs_caller() (i.e., 
FL_SAVE_REGS is set).


Not sure I follow you.

This is based on 5740a7c71ab6 ("s390/ftrace: add 
HAVE_DYNAMIC_FTRACE_WITH_ARGS support")


That's the whole point of HAVE_DYNAMIC_FTRACE_WITH_ARGS: to have the regs 
also with ftrace_caller().


Sure you only have the params, but that's the same on s390, so what 
did I miss ?


It looks like s390 is special since it apparently saves all registers 
even for ftrace_caller: 
https://lore.kernel.org/all/YbipdU5X4HNDWIni@osiris/


It is not what I understand from their code, see 
https://elixir.bootlin.com/linux/v5.17-rc3/source/arch/s390/kernel/mcount.S#L37 



They have a common macro called with argument 'allregs' which is set to 
0 for ftrace_caller() and 1 for ftrace_regs_caller().

When allregs == 1, the macro seems to save more.

But ok, I can do like x86, but I need a trick to know whether 
FL_SAVE_REGS is set or not, like they do with fregs->regs.cs

Any idea what the condition can be for powerpc ?



Finally, it looks like this change was done via commit 894979689d3a 
("s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller 
implementations"), four hours later on the same day, after the implementation of 
arch_ftrace_get_regs().


They may have forgotten to change arch_ftrace_get_regs() which was added 
in commit 5740a7c71ab6 ("s390/ftrace: add HAVE_DYNAMIC_FTRACE_WITH_ARGS 
support") with the assumption that ftrace_caller and ftrace_regs_caller 
were identical.


Christophe


Re: [PATCH v2 09/13] powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2022-02-15 Thread Steven Rostedt
On Tue, 15 Feb 2022 19:06:48 +0530
"Naveen N. Rao"  wrote:

> As I understand it, the reason ftrace_get_regs() was introduced was to 
> be able to only return the pt_regs, if _all_ registers were saved into 
> it, which we don't do when coming in through ftrace_caller(). See the 
> x86 implementation (commit 02a474ca266a47 ("ftrace/x86: Allow for 
> arguments to be passed in to ftrace_regs by default"), which returns 
> pt_regs conditionally.

I can give you the history of ftrace_caller and ftrace_regs_caller.

ftrace_caller saved just enough, as dictated by the gcc mcount trampolines.
The new fentry, which fires at the start of the function (whereas mcount
fires after the stack frame is set up), may change the rules on some
architectures.

As for ftrace_regs_caller, that was created for kprobes. As the majority of
kprobes were added at the start of the function, it made sense to hook into
ftrace as the ftrace trampoline call is much faster than taking a
breakpoint interrupt. But to keep compatibility with breakpoint
interrupts, we needed to fill in all the registers, and make it act just
like a breakpoint interrupt.

I've been wanting to record function parameters, and because the ftrace
trampoline must at a minimum save the function parameters before calling
the ftrace callbacks, all the information for those parameters was being
saved but was never exposed to the ftrace callbacks. I created
DYNAMIC_FTRACE_WITH_ARGS to expose them. I first just used pt_regs with
just the parameters filled in, but that was criticized as confusing: the
unfilled pt_regs fields might be read and mistaken for legitimate values.
So I created ftrace_regs, which gives you just the function arguments (if
DYNAMIC_FTRACE_WITH_ARGS is defined), or a full pt_regs if the caller came
from ftrace_regs_caller. If not, it gives you a NULL pointer.

The first user to use the args was live kernel patching, as they only need
that and the return pointer.
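As a purely hypothetical illustration of that contract (not taken from any patch in this
thread; the callback name is invented), a callback could look roughly like this:

#include <linux/ftrace.h>
#include <linux/printk.h>

static void my_trace_callback(unsigned long ip, unsigned long parent_ip,
			      struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	struct pt_regs *regs = ftrace_get_regs(fregs);

	if (regs) {
		/* came through ftrace_regs_caller: the full register set is valid */
		pr_debug("traced %pS, full pt_regs at %px\n", (void *)ip, regs);
	} else {
		/* args-only path (ftrace_caller): only the argument registers are valid */
		pr_debug("traced %pS (args only)\n", (void *)ip);
	}
}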

-- Steve


Re: [PATCH v2 09/13] powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2022-02-15 Thread Christophe Leroy


Le 15/02/2022 à 14:36, Naveen N. Rao a écrit :
> Michael Ellerman wrote:
>> Christophe Leroy  writes:
>>> Le 14/02/2022 à 16:25, Naveen N. Rao a écrit :
 Christophe Leroy wrote:
> Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. It accelerates the call
> of livepatching.
>
> Also note that powerpc being the last one to convert to
> CONFIG_DYNAMIC_FTRACE_WITH_ARGS, it will now be possible to remove
> klp_arch_set_pc() on all architectures.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/Kconfig |  1 +
>  arch/powerpc/include/asm/ftrace.h    | 17 +
>  arch/powerpc/include/asm/livepatch.h |  4 +---
>  3 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index cdac2115eb00..e2b1792b2aae 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -210,6 +210,7 @@ config PPC
>  select HAVE_DEBUG_KMEMLEAK
>  select HAVE_DEBUG_STACKOVERFLOW
>  select HAVE_DYNAMIC_FTRACE
> +    select HAVE_DYNAMIC_FTRACE_WITH_ARGS    if MPROFILE_KERNEL || 
> PPC32
>  select HAVE_DYNAMIC_FTRACE_WITH_REGS    if MPROFILE_KERNEL || 
> PPC32
>  select HAVE_EBPF_JIT
>  select HAVE_EFFICIENT_UNALIGNED_ACCESS    if 
> !(CPU_LITTLE_ENDIAN && POWER7_CPU)
> diff --git a/arch/powerpc/include/asm/ftrace.h 
> b/arch/powerpc/include/asm/ftrace.h
> index b3f6184f77ea..45c3d6f11daa 100644
> --- a/arch/powerpc/include/asm/ftrace.h
> +++ b/arch/powerpc/include/asm/ftrace.h
> @@ -22,6 +22,23 @@ static inline unsigned long 
> ftrace_call_adjust(unsigned long addr)
>  struct dyn_arch_ftrace {
>  struct module *mod;
>  };
> +
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
> +struct ftrace_regs {
> +    struct pt_regs regs;
> +};
> +
> +static __always_inline struct pt_regs *arch_ftrace_get_regs(struct 
> ftrace_regs *fregs)
> +{
> +    return &fregs->regs;
> +}

 I think this is wrong. We need to differentiate between 
 ftrace_caller() and ftrace_regs_caller() here, and only return 
 pt_regs if coming in through ftrace_regs_caller() (i.e., 
 FL_SAVE_REGS is set).
>>>
>>> Not sure I follow you.
>>>
>>> This is based on 5740a7c71ab6 ("s390/ftrace: add 
>>> HAVE_DYNAMIC_FTRACE_WITH_ARGS support")
>>>
>>> That's the whole point of HAVE_DYNAMIC_FTRACE_WITH_ARGS: to have the regs 
>>> also with ftrace_caller().
>>>
>>> Sure you only have the params, but that's the same on s390, so what 
>>> did I miss ?
> 
> It looks like s390 is special since it apparently saves all registers 
> even for ftrace_caller: 
> https://lore.kernel.org/all/YbipdU5X4HNDWIni@osiris/

It is not what I understand from their code, see 
https://elixir.bootlin.com/linux/v5.17-rc3/source/arch/s390/kernel/mcount.S#L37

They have a common macro called with argument 'allregs' which is set to 
0 for ftrace_caller() and 1 for ftrace_regs_caller().
When allregs == 1, the macro seems to save more.

But ok, I can do like x86, but I need a trick to know whether 
FL_SAVE_REGS is set or not, like they do with fregs->regs.cs
Any idea what the condition can be for powerpc ?

Thanks
Christophe

Re: [PATCH v2 09/13] powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2022-02-15 Thread Naveen N. Rao

Michael Ellerman wrote:

Christophe Leroy  writes:

Le 14/02/2022 à 16:25, Naveen N. Rao a écrit :

Christophe Leroy wrote:

Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. It accelerates the call
of livepatching.

Also note that powerpc being the last one to convert to
CONFIG_DYNAMIC_FTRACE_WITH_ARGS, it will now be possible to remove
klp_arch_set_pc() on all architectures.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/ftrace.h    | 17 +
 arch/powerpc/include/asm/livepatch.h |  4 +---
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cdac2115eb00..e2b1792b2aae 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -210,6 +210,7 @@ config PPC
 select HAVE_DEBUG_KMEMLEAK
 select HAVE_DEBUG_STACKOVERFLOW
 select HAVE_DYNAMIC_FTRACE
+    select HAVE_DYNAMIC_FTRACE_WITH_ARGS    if MPROFILE_KERNEL || PPC32
 select HAVE_DYNAMIC_FTRACE_WITH_REGS    if MPROFILE_KERNEL || PPC32
 select HAVE_EBPF_JIT
 select HAVE_EFFICIENT_UNALIGNED_ACCESS    if !(CPU_LITTLE_ENDIAN 
&& POWER7_CPU)
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h

index b3f6184f77ea..45c3d6f11daa 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -22,6 +22,23 @@ static inline unsigned long 
ftrace_call_adjust(unsigned long addr)

 struct dyn_arch_ftrace {
 struct module *mod;
 };
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
+struct ftrace_regs {
+    struct pt_regs regs;
+};
+
+static __always_inline struct pt_regs *arch_ftrace_get_regs(struct 
ftrace_regs *fregs)

+{
+    return &fregs->regs;
+}


I think this is wrong. We need to differentiate between ftrace_caller() 
and ftrace_regs_caller() here, and only return pt_regs if coming in 
through ftrace_regs_caller() (i.e., FL_SAVE_REGS is set).


Not sure I follow you.

This is based on 5740a7c71ab6 ("s390/ftrace: add 
HAVE_DYNAMIC_FTRACE_WITH_ARGS support")


That's the whole point of HAVE_DYNAMIC_FTRACE_WITH_ARGS: to have the regs also 
with ftrace_caller().


Sure you only have the params, but that's the same on s390, so what did 
I miss ?


It looks like s390 is special since it apparently saves all registers 
even for ftrace_caller: 
https://lore.kernel.org/all/YbipdU5X4HNDWIni@osiris/


As I understand it, the reason ftrace_get_regs() was introduced was to 
be able to only return the pt_regs, if _all_ registers were saved into 
it, which we don't do when coming in through ftrace_caller(). See the 
x86 implementation (commit 02a474ca266a47 ("ftrace/x86: Allow for 
arguments to be passed in to ftrace_regs by default"), which returns 
pt_regs conditionally.




I already have this series in next, I can pull it out, but I'd rather
not.


Yeah, I'm sorry about the late review on this one.



I'll leave it in for now, hopefully you two can agree overnight my time
whether this is a big problem or something we can fix with a fixup
patch.


I think changes to this particular patch can be added as an incremental 
patch. If anything, pt_regs won't have all valid registers, but no one 
should depend on it without also setting FL_SAVE_REGS anyway.
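(For completeness, a hedged sketch of how a callback opts into the full-regs path in the
first place; the ops and callback names are invented:)

static void my_callback(unsigned long ip, unsigned long parent_ip,
			struct ftrace_ops *op, struct ftrace_regs *fregs);

static struct ftrace_ops my_ftrace_ops = {
	.func	= my_callback,
	.flags	= FTRACE_OPS_FL_SAVE_REGS,	/* route through ftrace_regs_caller() */
};
/* ... register_ftrace_function(&my_ftrace_ops); ... */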


I was concerned about patch 8 though, where we fail to save r1 
into pt_regs. That gets used in patch 11, and will be used during 
unwinding when the function_graph tracer is active. But, this should 
still just result in us being unable to unwind the stack, so I think 
that can also be an incremental patch.



Thanks,
Naveen


RE: [PATCH 09/14] m68k: drop custom __access_ok()

2022-02-15 Thread David Laight
From: Arnd Bergmann
> Sent: 15 February 2022 10:02
> 
> On Tue, Feb 15, 2022 at 8:13 AM Al Viro  wrote:
> > On Tue, Feb 15, 2022 at 07:29:42AM +0100, Christoph Hellwig wrote:
> > > On Tue, Feb 15, 2022 at 12:37:41AM +, Al Viro wrote:
> > > > Perhaps simply wrap that sucker into #ifdef 
> > > > CONFIG_CPU_HAS_ADDRESS_SPACES
> > > > (and trim the comment down to "coldfire and 68000 will pick generic
> > > > variant")?
> > >
> > > I wonder if we should invert 
> > > CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE,
> > > select the separate address space config for s390, sparc64, non-coldfire
> > > m68k and mips with EVA and then just have one single access_ok for
> > > overlapping address space (as added by Arnd) and non-overlapping ones
> > > (always return true).
> >
> > parisc is also such...  How about
> >
> > select ALTERNATE_SPACE_USERLAND
> >
> > for that bunch?
> 
> Either of those works for me. My current version has this keyed off
> TASK_SIZE_MAX==ULONG_MAX, but a CONFIG_ symbol does
> look more descriptive.
> 
> >  While we are at it, how many unusual access_ok() instances are
> > left after this series?  arm64, itanic, um, anything else?
> 
> x86 adds a WARN_ON_IN_IRQ() check in there.

It is a noop unless CONFIG_DEBUG_ATOMIC_SLEEP is set.
I doubt that is often enabled.
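For reference, the x86 check is roughly the following (paraphrased from
arch/x86/include/asm/uaccess.h; the exact form may differ between kernel versions):

#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
# define WARN_ON_IN_IRQ()	WARN_ON_ONCE(!in_task())
#else
# define WARN_ON_IN_IRQ()
#endif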

> This could be
> made generic, but it's not obvious what exactly the exceptions are
> that other architectures need. The arm64 tagged pointers could
> probably also get integrated into the generic version.
> 
> > FWIW, sparc32 has a slightly unusual instance (see uaccess_32.h there); it's
> > obviously cheaper than generic and I wonder if the trick is legitimate (and
> > applicable elsewhere, perhaps)...
> 
> Right, a few others have the same, but I wasn't convinced that this
> is actually safe for call possible cases: it's trivial to construct a caller
> that works on other architectures but not this one, if you pass a large
> enough size value and don't access the contents in sequence.

You'd need code that did an access_ok() check and then read from
a large offset from the address - unlikely.
It's not like the access_ok() check for read/write is done on syscall
entry and then everything underneath assumes it is valid.

Hasn't (almost) everything been checked for function calls between
user_access_begin() and the actual accesses?
And access_ok() is done by/at the same time as user_access_begin()?

You do need an unmapped page above the address that is tested.

> Also, like the ((addr | (addr + size)) & MASK) check on some other
> architectures, it is less portable because it makes assumptions about
> the actual layout beyond a fixed address limit.

Isn't that test broken without a separate bound check on size?

I also seem to remember that access_ok(xxx, 0) is always 'ok'
and some of the 'fast' tests give a false negative if the user
buffer ends with the last byte of user address space.

So you may need:
size < TASK_SIZE && (addr < (TASK_SIZE - size - 1) || !size)
(sprinkled with [un]likely())
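A literal transcription of that suggestion into a helper, just to make the shape concrete
(illustrative only, not a validated replacement for the generic check; the helper name is
invented):

#include <linux/uaccess.h>

static inline int suggested_access_ok(const void __user *ptr, unsigned long size)
{
	unsigned long addr = (unsigned long)ptr;

	return likely(size < TASK_SIZE) &&
	       (likely(addr < TASK_SIZE - size - 1) || !size);
}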

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH 05/14] uaccess: add generic __{get,put}_kernel_nofault

2022-02-15 Thread Arnd Bergmann
On Tue, Feb 15, 2022 at 1:31 AM Al Viro  wrote:
>
> On Mon, Feb 14, 2022 at 05:34:43PM +0100, Arnd Bergmann wrote:
> > From: Arnd Bergmann 
> >
> > All architectures that don't provide __{get,put}_kernel_nofault() yet
> > can implement this on top of __{get,put}_user.
> >
> > Add a generic version that lets everything use the normal
> > copy_{from,to}_kernel_nofault() code based on these, removing the last
> > use of get_fs()/set_fs() from architecture-independent code.
>
> I'd put the list of those architectures (AFAICS, that's alpha, ia64,
> microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa) into commit
> message - it's not that hard to find out, but...

done.

> And AFAICS, you've missed nios2 - see
> #define __put_user(x, ptr) put_user(x, ptr)
> in there.  nds32 oddities are dealt with earlier in the series, this
> one is not...

Ok, fixed my bug in nios2 __put_user() as well now. This one is not nearly
as bad as nds32, at least without my patches it should work as expected.

Unfortunately I also noticed that __get_user() on microblaze and nios2
is completely broken for 64-bit arguments: they copy eight bytes
into a four-byte buffer. I'll try to come up with a fix for this as well.
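The class of bug described above looks roughly like this (hypothetical illustration, not
the actual microblaze/nios2 code): the arch __get_user() funnels every size through a
fixed 32-bit temporary, so reading a u64 copies eight bytes into a four-byte local.

#define BROKEN_GET_USER(x, ptr)						\
({									\
	unsigned long __gu_val;	/* only four bytes on a 32-bit arch */	\
	long __gu_err = raw_copy_from_user(&__gu_val, (ptr),		\
					   sizeof(*(ptr)));		\
	(x) = (__typeof__(*(ptr)))__gu_val;				\
	__gu_err ? -EFAULT : 0;						\
})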

 Arnd


[PATCH v4 04/13] powerpc: Remove 'struct ppc64_opd_entry'

2022-02-15 Thread Christophe Leroy
'struct ppc64_opd_entry' doesn't belong to uapi/asm/elf.h

It was initially in module_64.c, and commit 2d291e902791 ("Fix compile
failure with non modular builds") moved it into asm/elf.h.

But it was added outside of the __KERNEL__ section by mistake, so
commit c3617f72036c ("UAPI: (Scripted) Disintegrate
arch/powerpc/include/asm") moved it to uapi/asm/elf.h.

Now that it is not used anymore by the kernel, remove it.

Reviewed-by: Kees Cook 
Reviewed-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/uapi/asm/elf.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/elf.h 
b/arch/powerpc/include/uapi/asm/elf.h
index 860c59291bfc..308857123a08 100644
--- a/arch/powerpc/include/uapi/asm/elf.h
+++ b/arch/powerpc/include/uapi/asm/elf.h
@@ -289,12 +289,4 @@ typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
 /* Keep this the last entry.  */
 #define R_PPC64_NUM253
 
-/* There's actually a third entry here, but it's unused */
-struct ppc64_opd_entry
-{
-   unsigned long funcaddr;
-   unsigned long r2;
-};
-
-
 #endif /* _UAPI_ASM_POWERPC_ELF_H */
-- 
2.34.1



[PATCH v4 05/13] powerpc: Prepare func_desc_t for refactorisation

2022-02-15 Thread Christophe Leroy
In preparation for making func_desc_t generic, change the ELFv2
version to a struct containing an 'addr' element.

This allows using single helpers common to ELFv1 and ELFv2 and
reduces the number of #ifdefs.

Signed-off-by: Christophe Leroy 
Acked-by: Nicholas Piggin 
Reviewed-by: Kees Cook 
---
 arch/powerpc/kernel/module_64.c | 36 -
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 46e8eeb7c432..ff93ef4cb5a2 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -33,19 +33,17 @@
 #ifdef PPC64_ELF_ABI_v2
 
 /* An address is simply the address of the function. */
-typedef unsigned long func_desc_t;
+typedef struct {
+   unsigned long addr;
+} func_desc_t;
 
 static func_desc_t func_desc(unsigned long addr)
 {
-   return addr;
-}
-static unsigned long func_addr(unsigned long addr)
-{
-   return addr;
-}
-static unsigned long stub_func_addr(func_desc_t func)
-{
-   return func;
+   func_desc_t desc = {
+   .addr = addr,
+   };
+
+   return desc;
 }
 
 /* PowerPC64 specific values for the Elf64_Sym st_other field.  */
@@ -70,14 +68,6 @@ static func_desc_t func_desc(unsigned long addr)
 {
return *(struct func_desc *)addr;
 }
-static unsigned long func_addr(unsigned long addr)
-{
-   return func_desc(addr).addr;
-}
-static unsigned long stub_func_addr(func_desc_t func)
-{
-   return func.addr;
-}
 static unsigned int local_entry_offset(const Elf64_Sym *sym)
 {
return 0;
@@ -93,6 +83,16 @@ void *dereference_module_function_descriptor(struct module 
*mod, void *ptr)
 }
 #endif
 
+static unsigned long func_addr(unsigned long addr)
+{
+   return func_desc(addr).addr;
+}
+
+static unsigned long stub_func_addr(func_desc_t func)
+{
+   return func.addr;
+}
+
 #define STUB_MAGIC 0x73747562 /* stub */
 
 /* Like PPC32, we need little trampolines to do > 24-bit jumps (into
-- 
2.34.1



[PATCH v4 02/13] powerpc: Move and rename func_descr_t

2022-02-15 Thread Christophe Leroy
There are three architectures with function descriptors, so try to
have common names for the address they contain in order to
refactor some functions into generic functions later.

powerpc has 'entry'
ia64 has 'ip'
parisc has 'addr'

Vote for 'addr' and update 'func_descr_t' accordingly.

Move it to asm/elf.h to have it at the same place on all
three architectures, remove the typedef which hides its real
type, and give it the more natural name 'struct func_desc'.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
Reviewed-by: Kees Cook 
---
 arch/powerpc/include/asm/code-patching.h | 2 +-
 arch/powerpc/include/asm/elf.h   | 6 ++
 arch/powerpc/include/asm/types.h | 6 --
 arch/powerpc/kernel/signal_64.c  | 8 
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index e26080539c31..409483b2d0ce 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -118,7 +118,7 @@ static inline unsigned long ppc_function_entry(void *func)
 * function's descriptor. The first entry in the descriptor is the
 * address of the function text.
 */
-   return ((func_descr_t *)func)->entry;
+   return ((struct func_desc *)func)->addr;
 #else
return (unsigned long)func;
 #endif
diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
index b8425e3cfd81..971589a21bc0 100644
--- a/arch/powerpc/include/asm/elf.h
+++ b/arch/powerpc/include/asm/elf.h
@@ -176,4 +176,10 @@ do {   
\
 /* Relocate the kernel image to @final_address */
 void relocate(unsigned long final_address);
 
+struct func_desc {
+   unsigned long addr;
+   unsigned long toc;
+   unsigned long env;
+};
+
 #endif /* _ASM_POWERPC_ELF_H */
diff --git a/arch/powerpc/include/asm/types.h b/arch/powerpc/include/asm/types.h
index f1630c553efe..97da77bc48c9 100644
--- a/arch/powerpc/include/asm/types.h
+++ b/arch/powerpc/include/asm/types.h
@@ -23,12 +23,6 @@
 
 typedef __vector128 vector128;
 
-typedef struct {
-   unsigned long entry;
-   unsigned long toc;
-   unsigned long env;
-} func_descr_t;
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_TYPES_H */
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index d1e1fc0acbea..73d483b07ff3 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -936,11 +936,11 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t 
*set,
 * descriptor is the entry address of signal and the second
 * entry is the TOC value we need to use.
 */
-   func_descr_t __user *funct_desc_ptr =
-   (func_descr_t __user *) ksig->ka.sa.sa_handler;
+   struct func_desc __user *ptr =
+   (struct func_desc __user *)ksig->ka.sa.sa_handler;
 
-   err |= get_user(regs->ctr, &funct_desc_ptr->entry);
-   err |= get_user(regs->gpr[2], &funct_desc_ptr->toc);
+   err |= get_user(regs->ctr, &ptr->addr);
+   err |= get_user(regs->gpr[2], &ptr->toc);
}
 
/* enter the signal handler in native-endian mode */
-- 
2.34.1



[PATCH v4 09/13] asm-generic: Refactor dereference_[kernel]_function_descriptor()

2022-02-15 Thread Christophe Leroy
dereference_function_descriptor() and
dereference_kernel_function_descriptor() are identical on the
three architectures implementing them.

Make them common and put them out-of-line in kernel/extable.c
which is one of the users and has similar type of functions.

Reviewed-by: Kees Cook 
Reviewed-by: Arnd Bergmann 
Signed-off-by: Christophe Leroy 
Acked-by: Helge Deller 
---
 arch/ia64/include/asm/sections.h| 19 ---
 arch/parisc/include/asm/sections.h  |  9 -
 arch/parisc/kernel/process.c| 21 -
 arch/powerpc/include/asm/sections.h | 23 ---
 include/asm-generic/sections.h  |  2 ++
 kernel/extable.c| 23 ++-
 6 files changed, 24 insertions(+), 73 deletions(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index 3abe0562b01a..8e0875cf6071 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -30,23 +30,4 @@ extern char __start_gate_brl_fsys_bubble_down_patchlist[], 
__end_gate_brl_fsys_b
 extern char __start_unwind[], __end_unwind[];
 extern char __start_ivt_text[], __end_ivt_text[];
 
-#undef dereference_function_descriptor
-static inline void *dereference_function_descriptor(void *ptr)
-{
-   struct fdesc *desc = ptr;
-   void *p;
-
-   if (!get_kernel_nofault(p, (void *)&desc->addr))
-   ptr = p;
-   return ptr;
-}
-
-#undef dereference_kernel_function_descriptor
-static inline void *dereference_kernel_function_descriptor(void *ptr)
-{
-   if (ptr < (void *)__start_opd || ptr >= (void *)__end_opd)
-   return ptr;
-   return dereference_function_descriptor(ptr);
-}
-
 #endif /* _ASM_IA64_SECTIONS_H */
diff --git a/arch/parisc/include/asm/sections.h 
b/arch/parisc/include/asm/sections.h
index ace1d4047a0b..33df42b5cc6d 100644
--- a/arch/parisc/include/asm/sections.h
+++ b/arch/parisc/include/asm/sections.h
@@ -12,13 +12,4 @@ typedef Elf64_Fdesc func_desc_t;
 
 extern char __alt_instructions[], __alt_instructions_end[];
 
-#ifdef CONFIG_64BIT
-
-#undef dereference_function_descriptor
-void *dereference_function_descriptor(void *);
-
-#undef dereference_kernel_function_descriptor
-void *dereference_kernel_function_descriptor(void *);
-#endif
-
 #endif
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index ea3d83b6fb62..2030c77592d3 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -263,27 +263,6 @@ __get_wchan(struct task_struct *p)
return 0;
 }
 
-#ifdef CONFIG_64BIT
-void *dereference_function_descriptor(void *ptr)
-{
-   Elf64_Fdesc *desc = ptr;
-   void *p;
-
-   if (!get_kernel_nofault(p, (void *)&desc->addr))
-   ptr = p;
-   return ptr;
-}
-
-void *dereference_kernel_function_descriptor(void *ptr)
-{
-   if (ptr < (void *)__start_opd ||
-   ptr >= (void *)__end_opd)
-   return ptr;
-
-   return dereference_function_descriptor(ptr);
-}
-#endif
-
 static inline unsigned long brk_rnd(void)
 {
return (get_random_int() & BRK_RND_MASK) << PAGE_SHIFT;
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index fddfb3937868..8be2c491c733 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -58,29 +58,6 @@ static inline int overlaps_kernel_text(unsigned long start, 
unsigned long end)
(unsigned long)_stext < end;
 }
 
-#ifdef PPC64_ELF_ABI_v1
-
-#undef dereference_function_descriptor
-static inline void *dereference_function_descriptor(void *ptr)
-{
-   struct func_desc *desc = ptr;
-   void *p;
-
-   if (!get_kernel_nofault(p, (void *)&desc->addr))
-   ptr = p;
-   return ptr;
-}
-
-#undef dereference_kernel_function_descriptor
-static inline void *dereference_kernel_function_descriptor(void *ptr)
-{
-   if (ptr < (void *)__start_opd || ptr >= (void *)__end_opd)
-   return ptr;
-
-   return dereference_function_descriptor(ptr);
-}
-#endif /* PPC64_ELF_ABI_v1 */
-
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index bbf97502470c..d0f7bdd2fdf2 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -60,6 +60,8 @@ extern __visible const void __nosave_begin, __nosave_end;
 
 /* Function descriptor handling (if any).  Override in asm/sections.h */
 #ifdef CONFIG_HAVE_FUNCTION_DESCRIPTORS
+void *dereference_function_descriptor(void *ptr);
+void *dereference_kernel_function_descriptor(void *ptr);
 #else
 #define dereference_function_descriptor(p) ((void *)(p))
 #define dereference_kernel_function_descriptor(p) ((void *)(p))
diff --git a/kernel/extable.c b/kernel/extable.c
index b6f330f0fe74..394c39b86e38 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -3,6 +3,7 @@
Copyright (C) 2001 Rusty Russell, 2002 Rusty Russell IBM.
 
 */

[PATCH v4 00/13] Fix LKDTM for PPC64/IA64/PARISC v4

2022-02-15 Thread Christophe Leroy
PPC64/IA64/PARISC have function descriptors. LKDTM doesn't work
on those three architectures because LKDTM messes up function
descriptors with functions.

This series does some cleanup in the three architectures and
refactors function descriptors so that it can then easily use it
in a generic way in LKDTM.

Changes in v4:
- Added patch 1 which Fixes 'sparse' for powerpc64le after wrong report on 
previous series, refer 
https://github.com/ruscur/linux-ci/actions/runs/1351427671
- Exported dereference_function_descriptor() to modules
- Addressed other received comments
- Rebased on latest powerpc/next (5a72345e6a78120368fcc841b570331b6c5a50da)

Changes in v3:
- Addressed received comments
- Swapped some of the powerpc patches to keep func_descr_t renamed as struct 
func_desc and remove 'struct ppc64_opd_entry'
- Changed HAVE_FUNCTION_DESCRIPTORS macro to a config item 
CONFIG_HAVE_FUNCTION_DESCRIPTORS
- Dropped patch 11 ("Fix lkdtm_EXEC_RODATA()")

Changes in v2:
- Addressed received comments
- Moved dereference_[kernel]_function_descriptor() out of line
- Added patches to remove func_descr_t and func_desc_t in powerpc
- Using func_desc_t instead of funct_descr_t
- Renamed HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR to HAVE_FUNCTION_DESCRIPTORS
- Added a new lkdtm test to check protection of function descriptors

Christophe Leroy (13):
  powerpc: Fix 'sparse' checking on PPC64le
  powerpc: Move and rename func_descr_t
  powerpc: Use 'struct func_desc' instead of 'struct ppc64_opd_entry'
  powerpc: Remove 'struct ppc64_opd_entry'
  powerpc: Prepare func_desc_t for refactorisation
  ia64: Rename 'ip' to 'addr' in 'struct fdesc'
  asm-generic: Define CONFIG_HAVE_FUNCTION_DESCRIPTORS
  asm-generic: Define 'func_desc_t' to commonly describe function
descriptors
  asm-generic: Refactor dereference_[kernel]_function_descriptor()
  lkdtm: Force do_nothing() out of line
  lkdtm: Really write into kernel text in WRITE_KERN
  lkdtm: Fix execute_[user]_location()
  lkdtm: Add a test for function descriptors protection

 arch/Kconfig |  3 +
 arch/ia64/Kconfig|  1 +
 arch/ia64/include/asm/elf.h  |  2 +-
 arch/ia64/include/asm/sections.h | 24 +---
 arch/ia64/kernel/module.c|  6 +-
 arch/parisc/Kconfig  |  1 +
 arch/parisc/include/asm/sections.h   | 16 ++
 arch/parisc/kernel/process.c | 21 ---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/Makefile|  2 +-
 arch/powerpc/include/asm/code-patching.h |  2 +-
 arch/powerpc/include/asm/elf.h   |  6 ++
 arch/powerpc/include/asm/sections.h  | 29 ++
 arch/powerpc/include/asm/types.h |  6 --
 arch/powerpc/include/uapi/asm/elf.h  |  8 ---
 arch/powerpc/kernel/module_64.c  | 42 ++
 arch/powerpc/kernel/ptrace/ptrace.c  |  6 ++
 arch/powerpc/kernel/signal_64.c  |  8 +--
 drivers/misc/lkdtm/core.c|  1 +
 drivers/misc/lkdtm/lkdtm.h   |  1 +
 drivers/misc/lkdtm/perms.c   | 71 +++-
 include/asm-generic/sections.h   | 15 -
 include/linux/kallsyms.h |  2 +-
 kernel/extable.c | 24 +++-
 tools/testing/selftests/lkdtm/tests.txt  |  1 +
 25 files changed, 155 insertions(+), 144 deletions(-)

-- 
2.34.1



[PATCH v4 07/13] asm-generic: Define CONFIG_HAVE_FUNCTION_DESCRIPTORS

2022-02-15 Thread Christophe Leroy
Replace HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR by a config option
named CONFIG_HAVE_FUNCTION_DESCRIPTORS and use it instead of
'dereference_function_descriptor' macro to know whether an
arch has function descriptors.

To limit churn in one of the following patches, use
an #ifdef/#else construct with an empty first part
instead of an #ifndef in asm-generic/sections.h.

On powerpc, make sure the config option matches the ABI used
by the compiler with a BUILD_BUG_ON(), and add the missing _CALL_ELF=2
when calling 'sparse' so that sparse sees the same piece of
code as GCC.

Also include a helper to check whether an arch has function
descriptors or not: have_function_descriptors().
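A sketch of the shape such a helper can take (assumed; the corresponding asm-generic
hunk is not fully shown in this digest):

static inline bool have_function_descriptors(void)
{
	return IS_ENABLED(CONFIG_HAVE_FUNCTION_DESCRIPTORS);
}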

Reviewed-by: Kees Cook 
Reviewed-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
Acked-by: Helge Deller 
---
 arch/Kconfig| 3 +++
 arch/ia64/Kconfig   | 1 +
 arch/ia64/include/asm/sections.h| 2 --
 arch/parisc/Kconfig | 1 +
 arch/parisc/include/asm/sections.h  | 2 --
 arch/powerpc/Kconfig| 1 +
 arch/powerpc/include/asm/sections.h | 2 --
 arch/powerpc/kernel/ptrace/ptrace.c | 6 ++
 include/asm-generic/sections.h  | 8 +++-
 include/linux/kallsyms.h| 2 +-
 10 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 678a80713b21..fe24174cb63c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -205,6 +205,9 @@ config HAVE_FUNCTION_ERROR_INJECTION
 config HAVE_NMI
bool
 
+config HAVE_FUNCTION_DESCRIPTORS
+   bool
+
 config TRACE_IRQFLAGS_SUPPORT
bool
 
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index a7e01573abd8..da85c3b23b16 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -35,6 +35,7 @@ config IA64
select HAVE_SETUP_PER_CPU_AREA
select TTY
select HAVE_ARCH_TRACEHOOK
+   select HAVE_FUNCTION_DESCRIPTORS
select HAVE_VIRT_CPU_ACCOUNTING
select HUGETLB_PAGE_SIZE_VARIABLE if HUGETLB_PAGE
select VIRT_TO_BUS
diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index 35f24e52149a..2460d365a057 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -27,8 +27,6 @@ extern char __start_gate_brl_fsys_bubble_down_patchlist[], 
__end_gate_brl_fsys_b
 extern char __start_unwind[], __end_unwind[];
 extern char __start_ivt_text[], __end_ivt_text[];
 
-#define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1
-
 #undef dereference_function_descriptor
 static inline void *dereference_function_descriptor(void *ptr)
 {
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 43c1c880def6..82e7ab1a9764 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -69,6 +69,7 @@ config PARISC
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_SOFTIRQ_ON_OWN_STACK if IRQSTACKS
select TRACE_IRQFLAGS_SUPPORT
+   select HAVE_FUNCTION_DESCRIPTORS if 64BIT
 
help
  The PA-RISC microprocessor is designed by Hewlett-Packard and used
diff --git a/arch/parisc/include/asm/sections.h 
b/arch/parisc/include/asm/sections.h
index bb52aea0cb21..c8092e4d94de 100644
--- a/arch/parisc/include/asm/sections.h
+++ b/arch/parisc/include/asm/sections.h
@@ -9,8 +9,6 @@ extern char __alt_instructions[], __alt_instructions_end[];
 
 #ifdef CONFIG_64BIT
 
-#define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1
-
 #undef dereference_function_descriptor
 void *dereference_function_descriptor(void *);
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 28e4047e99e8..23ce71367467 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -207,6 +207,7 @@ config PPC
select HAVE_EFFICIENT_UNALIGNED_ACCESS  if !(CPU_LITTLE_ENDIAN && 
POWER7_CPU)
select HAVE_FAST_GUP
select HAVE_FTRACE_MCOUNT_RECORD
+   select HAVE_FUNCTION_DESCRIPTORSif PPC64 && !CPU_LITTLE_ENDIAN
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index baca39f4c6d3..7728a7a146c3 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -56,8 +56,6 @@ static inline int overlaps_kernel_text(unsigned long start, 
unsigned long end)
 
 #ifdef PPC64_ELF_ABI_v1
 
-#define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1
-
 #undef dereference_function_descriptor
 static inline void *dereference_function_descriptor(void *ptr)
 {
diff --git a/arch/powerpc/kernel/ptrace/ptrace.c 
b/arch/powerpc/kernel/ptrace/ptrace.c
index c43f77e2ac31..1212a812a7ab 100644
--- a/arch/powerpc/kernel/ptrace/ptrace.c
+++ b/arch/powerpc/kernel/ptrace/ptrace.c
@@ -445,4 +445,10 @@ void __init pt_regs_check(void)
 * real registers.
 */
BUILD_BUG_ON(PT_DSCR < sizeof(struct user_pt_regs) / sizeof(unsigned 
long));
+
+#ifdef PPC64_ELF_ABI_v1
+   

[PATCH v4 13/13] lkdtm: Add a test for function descriptors protection

2022-02-15 Thread Christophe Leroy
Add WRITE_OPD to check that you can't modify function
descriptors.

Gives the following result when function descriptors are
not protected:

lkdtm: Performing direct entry WRITE_OPD
lkdtm: attempting bad 16 bytes write at c269b358
lkdtm: FAIL: survived bad write
lkdtm: do_nothing was hijacked!

Looks like a standard compiler barrier() is not enough to force
GCC to use the modified function descriptor. Had to add a fake empty
inline assembly to force GCC to reload the function descriptor.

Signed-off-by: Christophe Leroy 
Acked-by: Kees Cook 
---
 drivers/misc/lkdtm/core.c   |  1 +
 drivers/misc/lkdtm/lkdtm.h  |  1 +
 drivers/misc/lkdtm/perms.c  | 22 ++
 tools/testing/selftests/lkdtm/tests.txt |  1 +
 4 files changed, 25 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index f69b964b9952..e2228b6fc09b 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -149,6 +149,7 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
CRASHTYPE(WRITE_KERN),
+   CRASHTYPE(WRITE_OPD),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
CRASHTYPE(REFCOUNT_INC_NOT_ZERO_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index d6137c70ebbe..305fc2ec3f25 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -106,6 +106,7 @@ void __init lkdtm_perms_init(void);
 void lkdtm_WRITE_RO(void);
 void lkdtm_WRITE_RO_AFTER_INIT(void);
 void lkdtm_WRITE_KERN(void);
+void lkdtm_WRITE_OPD(void);
 void lkdtm_EXEC_DATA(void);
 void lkdtm_EXEC_STACK(void);
 void lkdtm_EXEC_KMALLOC(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 1cf24c4a79e9..2c6aba3ff32b 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -44,6 +44,11 @@ static noinline void do_overwritten(void)
return;
 }
 
+static noinline void do_almost_nothing(void)
+{
+   pr_info("do_nothing was hijacked!\n");
+}
+
 static void *setup_function_descriptor(func_desc_t *fdesc, void *dst)
 {
if (!have_function_descriptors())
@@ -144,6 +149,23 @@ void lkdtm_WRITE_KERN(void)
do_overwritten();
 }
 
+void lkdtm_WRITE_OPD(void)
+{
+   size_t size = sizeof(func_desc_t);
+   void (*func)(void) = do_nothing;
+
+   if (!have_function_descriptors()) {
+   pr_info("XFAIL: Platform doesn't use function descriptors.\n");
+   return;
+   }
+   pr_info("attempting bad %zu bytes write at %px\n", size, do_nothing);
+   memcpy(do_nothing, do_almost_nothing, size);
+   pr_err("FAIL: survived bad write\n");
+
+   asm("" : "=m"(func));
+   func();
+}
+
 void lkdtm_EXEC_DATA(void)
 {
execute_location(data_area, CODE_WRITE);
diff --git a/tools/testing/selftests/lkdtm/tests.txt 
b/tools/testing/selftests/lkdtm/tests.txt
index 6b36b7f5dcf9..243c781f0780 100644
--- a/tools/testing/selftests/lkdtm/tests.txt
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -44,6 +44,7 @@ ACCESS_NULL
 WRITE_RO
 WRITE_RO_AFTER_INIT
 WRITE_KERN
+WRITE_OPD
 REFCOUNT_INC_OVERFLOW
 REFCOUNT_ADD_OVERFLOW
 REFCOUNT_INC_NOT_ZERO_OVERFLOW
-- 
2.34.1



[PATCH v4 06/13] ia64: Rename 'ip' to 'addr' in 'struct fdesc'

2022-02-15 Thread Christophe Leroy
There are three architectures with function descriptors, so try to
have common names for the address they contain in order to
refactor some functions into generic functions later.

powerpc has 'entry'
ia64 has 'ip'
parisc has 'addr'

Vote for 'addr' and update 'struct fdesc' accordingly.

Reviewed-by: Kees Cook 
Signed-off-by: Christophe Leroy 
---
 arch/ia64/include/asm/elf.h  | 2 +-
 arch/ia64/include/asm/sections.h | 2 +-
 arch/ia64/kernel/module.c| 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/include/asm/elf.h b/arch/ia64/include/asm/elf.h
index 6629301a2620..2ef5f9966ad1 100644
--- a/arch/ia64/include/asm/elf.h
+++ b/arch/ia64/include/asm/elf.h
@@ -226,7 +226,7 @@ struct got_entry {
  * Layout of the Function Descriptor
  */
 struct fdesc {
-   uint64_t ip;
+   uint64_t addr;
uint64_t gp;
 };
 
diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index 3a033d2008b3..35f24e52149a 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -35,7 +35,7 @@ static inline void *dereference_function_descriptor(void *ptr)
struct fdesc *desc = ptr;
void *p;
 
-   if (!get_kernel_nofault(p, (void *)&desc->ip))
+   if (!get_kernel_nofault(p, (void *)&desc->addr))
ptr = p;
return ptr;
 }
diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 360f36b0eb3f..8f62cf97f691 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -602,15 +602,15 @@ get_fdesc (struct module *mod, uint64_t value, int *okp)
return value;
 
/* Look for existing function descriptor. */
-   while (fdesc->ip) {
-   if (fdesc->ip == value)
+   while (fdesc->addr) {
+   if (fdesc->addr == value)
return (uint64_t)fdesc;
if ((uint64_t) ++fdesc >= mod->arch.opd->sh_addr + 
mod->arch.opd->sh_size)
BUG();
}
 
/* Create new one */
-   fdesc->ip = value;
+   fdesc->addr = value;
fdesc->gp = mod->arch.gp;
return (uint64_t) fdesc;
 }
-- 
2.34.1



[PATCH v4 10/13] lkdtm: Force do_nothing() out of line

2022-02-15 Thread Christophe Leroy
LKDTM tests report that they run do_nothing() at a given
address, but in reality do_nothing() is inlined into the
caller.

Force it out of line so that it really runs text at the
displayed address.

Acked-by: Kees Cook 
Signed-off-by: Christophe Leroy 
---
 drivers/misc/lkdtm/perms.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 2dede2ef658f..60b3b2fe929d 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -21,7 +21,7 @@
 /* This is non-const, so it will end up in the .data section. */
 static u8 data_area[EXEC_SIZE];
 
-/* This is cost, so it will end up in the .rodata section. */
+/* This is const, so it will end up in the .rodata section. */
 static const unsigned long rodata = 0xAA55AA55;
 
 /* This is marked __ro_after_init, so it should ultimately be .rodata. */
@@ -31,7 +31,7 @@ static unsigned long ro_after_init __ro_after_init = 
0x55AA5500;
  * This just returns to the caller. It is designed to be copied into
  * non-executable memory regions.
  */
-static void do_nothing(void)
+static noinline void do_nothing(void)
 {
return;
 }
-- 
2.34.1



[PATCH v4 08/13] asm-generic: Define 'func_desc_t' to commonly describe function descriptors

2022-02-15 Thread Christophe Leroy
We have three architectures using function descriptors, each with its
own type and name.

Add a common typedef that can be used in generic code.

Also add a stub typedef for architectures without function descriptors,
to avoid a forest of #ifdefs.

It replaces the similar 'func_desc_t' previously defined in
arch/powerpc/kernel/module_64.c
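
As an aside (not part of the patch), a minimal sketch of what the common
typedef buys generic code, assuming the 'addr' field name this series
converges on:

/*
 * Sketch only: func_desc_t is a real descriptor struct on ia64, parisc
 * and powerpc ELFv1, and the stub { unsigned long addr; } elsewhere, so
 * generic code can read an entry point without per-arch #ifdefs.
 */
static unsigned long func_desc_entry(func_desc_t desc)
{
	return desc.addr;
}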

Reviewed-by: Kees Cook 
Acked-by: Arnd Bergmann 
Signed-off-by: Christophe Leroy 
Acked-by: Helge Deller 
---
 arch/ia64/include/asm/sections.h| 3 +++
 arch/parisc/include/asm/sections.h  | 5 +
 arch/powerpc/include/asm/sections.h | 4 
 arch/powerpc/kernel/module_64.c | 8 
 include/asm-generic/sections.h  | 5 +
 5 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index 2460d365a057..3abe0562b01a 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -9,6 +9,9 @@
 
 #include 
 #include 
+
+typedef struct fdesc func_desc_t;
+
 #include 
 
 extern char __phys_per_cpu_start[];
diff --git a/arch/parisc/include/asm/sections.h 
b/arch/parisc/include/asm/sections.h
index c8092e4d94de..ace1d4047a0b 100644
--- a/arch/parisc/include/asm/sections.h
+++ b/arch/parisc/include/asm/sections.h
@@ -2,6 +2,11 @@
 #ifndef _PARISC_SECTIONS_H
 #define _PARISC_SECTIONS_H
 
+#ifdef CONFIG_HAVE_FUNCTION_DESCRIPTORS
+#include 
+typedef Elf64_Fdesc func_desc_t;
+#endif
+
 /* nothing to see, move along */
 #include 
 
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 7728a7a146c3..fddfb3937868 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -6,6 +6,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_HAVE_FUNCTION_DESCRIPTORS
+typedef struct func_desc func_desc_t;
+#endif
+
 #include 
 
 extern char __head_end[];
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index ff93ef4cb5a2..9abc0e783e36 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -32,11 +32,6 @@
 
 #ifdef PPC64_ELF_ABI_v2
 
-/* An address is simply the address of the function. */
-typedef struct {
-   unsigned long addr;
-} func_desc_t;
-
 static func_desc_t func_desc(unsigned long addr)
 {
func_desc_t desc = {
@@ -61,9 +56,6 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym)
 }
 #else
 
-/* An address is address of the OPD entry, which contains address of fn. */
-typedef struct func_desc func_desc_t;
-
 static func_desc_t func_desc(unsigned long addr)
 {
return *(struct func_desc *)addr;
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 3ef83e1aebee..bbf97502470c 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -63,6 +63,11 @@ extern __visible const void __nosave_begin, __nosave_end;
 #else
 #define dereference_function_descriptor(p) ((void *)(p))
 #define dereference_kernel_function_descriptor(p) ((void *)(p))
+
+/* An address is simply the address of the function. */
+typedef struct {
+   unsigned long addr;
+} func_desc_t;
 #endif
 
 static inline bool have_function_descriptors(void)
-- 
2.34.1



[PATCH v4 12/13] lkdtm: Fix execute_[user]_location()

2022-02-15 Thread Christophe Leroy
execute_location() and execute_user_location() intend
to copy do_nothing() text and execute it at a new location.
However, at the moment they don't copy the do_nothing() function
but the do_nothing() function descriptor, which still points to the
original text. So in the end they still execute do_nothing() at
its original location, although using a copied function descriptor.

So fix that by really copying the do_nothing() text and building a new
function descriptor by copying the do_nothing() function descriptor and
updating the target address with the new location.

Also fix the displayed addresses by dereferencing the do_nothing()
function descriptor.
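
To illustrate the distinction (a sketch only, not the LKDTM code itself):

/*
 * On a function-descriptor ABI a C "function pointer" designates a
 * descriptor, not instructions, so copying through the pointer copies
 * the descriptor.  dereference_function_descriptor() yields the text
 * address that actually has to be copied.
 */
static void copy_do_nothing_text(void *dst)
{
	void *text = dereference_function_descriptor(do_nothing);

	memcpy(dst, text, EXEC_SIZE);	/* copies the instructions */
	/* memcpy(dst, do_nothing, EXEC_SIZE) would copy the descriptor */
}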

Signed-off-by: Christophe Leroy 
Acked-by: Kees Cook 
---
 drivers/misc/lkdtm/perms.c | 37 -
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 035fcca441f0..1cf24c4a79e9 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -44,19 +44,34 @@ static noinline void do_overwritten(void)
return;
 }
 
+static void *setup_function_descriptor(func_desc_t *fdesc, void *dst)
+{
+   if (!have_function_descriptors())
+   return dst;
+
+   memcpy(fdesc, do_nothing, sizeof(*fdesc));
+   fdesc->addr = (unsigned long)dst;
+   barrier();
+
+   return fdesc;
+}
+
 static noinline void execute_location(void *dst, bool write)
 {
-   void (*func)(void) = dst;
+   void (*func)(void);
+   func_desc_t fdesc;
+   void *do_nothing_text = dereference_function_descriptor(do_nothing);
 
-   pr_info("attempting ok execution at %px\n", do_nothing);
+   pr_info("attempting ok execution at %px\n", do_nothing_text);
do_nothing();
 
if (write == CODE_WRITE) {
-   memcpy(dst, do_nothing, EXEC_SIZE);
+   memcpy(dst, do_nothing_text, EXEC_SIZE);
flush_icache_range((unsigned long)dst,
   (unsigned long)dst + EXEC_SIZE);
}
-   pr_info("attempting bad execution at %px\n", func);
+   pr_info("attempting bad execution at %px\n", dst);
+   func = setup_function_descriptor(&fdesc, dst);
func();
pr_err("FAIL: func returned\n");
 }
@@ -66,16 +81,19 @@ static void execute_user_location(void *dst)
int copied;
 
/* Intentionally crossing kernel/user memory boundary. */
-   void (*func)(void) = dst;
+   void (*func)(void);
+   func_desc_t fdesc;
+   void *do_nothing_text = dereference_function_descriptor(do_nothing);
 
-   pr_info("attempting ok execution at %px\n", do_nothing);
+   pr_info("attempting ok execution at %px\n", do_nothing_text);
do_nothing();
 
-   copied = access_process_vm(current, (unsigned long)dst, do_nothing,
+   copied = access_process_vm(current, (unsigned long)dst, do_nothing_text,
   EXEC_SIZE, FOLL_WRITE);
if (copied < EXEC_SIZE)
return;
-   pr_info("attempting bad execution at %px\n", func);
+   pr_info("attempting bad execution at %px\n", dst);
+   func = setup_function_descriptor(&fdesc, dst);
func();
pr_err("FAIL: func returned\n");
 }
@@ -153,7 +171,8 @@ void lkdtm_EXEC_VMALLOC(void)
 
 void lkdtm_EXEC_RODATA(void)
 {
-   execute_location(lkdtm_rodata_do_nothing, CODE_AS_IS);
+   
execute_location(dereference_function_descriptor(lkdtm_rodata_do_nothing),
+CODE_AS_IS);
 }
 
 void lkdtm_EXEC_USERSPACE(void)
-- 
2.34.1



[PATCH v4 01/13] powerpc: Fix 'sparse' checking on PPC64le

2022-02-15 Thread Christophe Leroy
'sparse' is architecture agnostic and knows nothing about ELF ABI
version.

Just like it gets the arch, powerpc type and endianness from the Makefile,
it also needs to get _CALL_ELF from there, otherwise it won't set the
PPC64_ELF_ABI_v2 macro for PPC64le and won't check the correct code.
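
For context, the macro sparse needs is derived from _CALL_ELF roughly as
follows (an approximation of the check in arch/powerpc/include/asm/types.h,
not part of this patch):

#ifdef __powerpc64__
#if defined(_CALL_ELF) && _CALL_ELF == 2
#define PPC64_ELF_ABI_v2
#else
#define PPC64_ELF_ABI_v1
#endif
#endif

so without -D_CALL_ELF=2, sparse checks little-endian builds as if they
were ELFv1.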

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index ddc5a706760a..4d4d8175f4a1 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -213,7 +213,7 @@ CHECKFLAGS  += -m$(BITS) -D__powerpc__ -D__powerpc$(BITS)__
 ifdef CONFIG_CPU_BIG_ENDIAN
 CHECKFLAGS += -D__BIG_ENDIAN__
 else
-CHECKFLAGS += -D__LITTLE_ENDIAN__
+CHECKFLAGS += -D__LITTLE_ENDIAN__ -D_CALL_ELF=2
 endif
 
 ifdef CONFIG_476FPE_ERR46
-- 
2.34.1



[PATCH v4 03/13] powerpc: Use 'struct func_desc' instead of 'struct ppc64_opd_entry'

2022-02-15 Thread Christophe Leroy
'struct ppc64_opd_entry' is somewhat redundant with 'struct func_desc';
the latter is more correct/complete as it includes the third
field, which is unused.

So use 'struct func_desc' instead of 'struct ppc64_opd_entry'.

Reviewed-by: Kees Cook 
Reviewed-by: Daniel Axtens 
Reviewed-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/sections.h |  4 ++--
 arch/powerpc/kernel/module_64.c | 10 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 38f79e42bf3c..baca39f4c6d3 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -61,10 +61,10 @@ static inline int overlaps_kernel_text(unsigned long start, 
unsigned long end)
 #undef dereference_function_descriptor
 static inline void *dereference_function_descriptor(void *ptr)
 {
-   struct ppc64_opd_entry *desc = ptr;
+   struct func_desc *desc = ptr;
void *p;
 
-   if (!get_kernel_nofault(p, (void *)&desc->funcaddr))
+   if (!get_kernel_nofault(p, (void *)&desc->addr))
ptr = p;
return ptr;
 }
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 6a45e6ddbe58..46e8eeb7c432 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -64,19 +64,19 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym)
 #else
 
 /* An address is address of the OPD entry, which contains address of fn. */
-typedef struct ppc64_opd_entry func_desc_t;
+typedef struct func_desc func_desc_t;
 
 static func_desc_t func_desc(unsigned long addr)
 {
-   return *(struct ppc64_opd_entry *)addr;
+   return *(struct func_desc *)addr;
 }
 static unsigned long func_addr(unsigned long addr)
 {
-   return func_desc(addr).funcaddr;
+   return func_desc(addr).addr;
 }
 static unsigned long stub_func_addr(func_desc_t func)
 {
-   return func.funcaddr;
+   return func.addr;
 }
 static unsigned int local_entry_offset(const Elf64_Sym *sym)
 {
@@ -187,7 +187,7 @@ static int relacmp(const void *_x, const void *_y)
 static unsigned long get_stubs_size(const Elf64_Ehdr *hdr,
const Elf64_Shdr *sechdrs)
 {
-   /* One extra reloc so it's always 0-funcaddr terminated */
+   /* One extra reloc so it's always 0-addr terminated */
unsigned long relocs = 1;
unsigned i;
 
-- 
2.34.1



[PATCH v4 11/13] lkdtm: Really write into kernel text in WRITE_KERN

2022-02-15 Thread Christophe Leroy
WRITE_KERN is supposed to overwrite some kernel text, namely
the do_overwritten() function.

But at the moment it overwrites the do_overwritten() function
descriptor, not the function text.

Fix it by dereferencing the function descriptor to obtain the
function text pointer. Export dereference_function_descriptor()
for when LKDTM is built as a module.

And make do_overwritten() noinline so that it is really
do_overwritten() which is called by lkdtm_WRITE_KERN().
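
The size calculation relies on do_overwritten() being laid out right after
do_nothing(); a sketch of the idea (not the patch itself):

static size_t do_nothing_text_size(void)
{
	unsigned long start = (unsigned long)dereference_function_descriptor(do_nothing);
	unsigned long end = (unsigned long)dereference_function_descriptor(do_overwritten);

	/* Both symbols must be resolved to text addresses first; on a
	 * descriptor ABI the raw symbols would measure the distance
	 * between two descriptors instead. */
	return end - start;
}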

Acked-by: Kees Cook 
Signed-off-by: Christophe Leroy 
---
 drivers/misc/lkdtm/perms.c | 8 +---
 kernel/extable.c   | 1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 60b3b2fe929d..035fcca441f0 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Whether or not to fill the target memory area with do_nothing(). */
 #define CODE_WRITE true
@@ -37,7 +38,7 @@ static noinline void do_nothing(void)
 }
 
 /* Must immediately follow do_nothing for size calculuations to work out. */
-static void do_overwritten(void)
+static noinline void do_overwritten(void)
 {
pr_info("do_overwritten wasn't overwritten!\n");
return;
@@ -113,8 +114,9 @@ void lkdtm_WRITE_KERN(void)
size_t size;
volatile unsigned char *ptr;
 
-   size = (unsigned long)do_overwritten - (unsigned long)do_nothing;
-   ptr = (unsigned char *)do_overwritten;
+   size = (unsigned long)dereference_function_descriptor(do_overwritten) -
+  (unsigned long)dereference_function_descriptor(do_nothing);
+   ptr = dereference_function_descriptor(do_overwritten);
 
pr_info("attempting bad %zu byte write at %px\n", size, ptr);
memcpy((void *)ptr, (unsigned char *)do_nothing, size);
diff --git a/kernel/extable.c b/kernel/extable.c
index 394c39b86e38..bda5e9761541 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -149,6 +149,7 @@ void *dereference_function_descriptor(void *ptr)
ptr = p;
return ptr;
 }
+EXPORT_SYMBOL_GPL(dereference_function_descriptor);
 
 void *dereference_kernel_function_descriptor(void *ptr)
 {
-- 
2.34.1



[PATCH] powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit

2022-02-15 Thread Michael Ellerman
When CONFIG_GENERIC_CPU=y (true for all our defconfigs) we pass
-mcpu=powerpc64 to the compiler, even when we're building a 32-bit
kernel.

This happens because we have an ifdef CONFIG_PPC_BOOK3S_64/else block in
the Makefile that was written before 32-bit supported GENERIC_CPU. Prior
to that the else block only applied to 64-bit Book3E.

The GCC man page says -mcpu=powerpc64 "[specifies] a pure ... 64-bit big
endian PowerPC ... architecture machine [type], with an appropriate,
generic processor model assumed for scheduling purposes."

It's unclear how that interacts with -m32, which we are also passing,
although obviously -m32 is taking precedence in some sense, as the
32-bit kernel only contains 32-bit instructions.

This was noticed by inspection, not via any bug reports, but it does
affect code generation. Comparing before/after code generation, there
are some changes to instruction scheduling, and in the after case (with
-mcpu=powerpc64 removed) the compiler seems more keen to use r8.

Fix it by making the else case only apply to Book3E 64, which excludes
32-bit.

Fixes: 0e00a8c9fd92 ("powerpc: Allow CPU selection also on PPC32")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index ddc5a706760a..1e1ef4352f62 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -171,7 +171,7 @@ else
 CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power7,$(call 
cc-option,-mtune=power5))
 CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mcpu=power5,-mcpu=power4)
 endif
-else
+else ifdef CONFIG_PPC_BOOK3E_64
 CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=powerpc64
 endif
 
-- 
2.34.1



Re: [PATCH] powerpc/module_64: use module_init_section instead of patching names

2022-02-15 Thread Michael Ellerman
Wedson Almeida Filho  writes:
> Hi Michael,
>
> On Wed, 2 Feb 2022 at 05:53, Wedson Almeida Filho  wrote:
>>
>> Without this patch, module init sections are disabled by patching their
>> names in arch-specific code when they're loaded (which prevents code in
>> layout_sections from finding init sections). This patch uses the new
>> arch-specific module_init_section instead.
>>
>> This allows modules that have .init_array sections to have the
>> initialisers properly called (on load, before init). Without this patch,
>> the initialisers are not called because .init_array is renamed to
>> _init_array, and thus isn't found by code in find_module_sections().
>>
>> Signed-off-by: Wedson Almeida Filho 
>> ---
>>  arch/powerpc/kernel/module_64.c | 11 ++-
>>  1 file changed, 6 insertions(+), 5 deletions(-)
...
>
> Would any additional clarification from my part be helpful here?

Just more patience ;)

> I got an email saying it was under review (and checks passed) but
> nothing appears to have happened since.

I actually put it in next late last week, but the emails got delayed due
to various gremlins.

cheers


Re: [PATCH 08/14] arm64: simplify access_ok()

2022-02-15 Thread Mark Rutland
On Tue, Feb 15, 2022 at 09:30:41AM +, David Laight wrote:
> From: Ard Biesheuvel
> > Sent: 15 February 2022 08:18
> > 
> > On Mon, 14 Feb 2022 at 17:37, Arnd Bergmann  wrote:
> > >
> > > From: Arnd Bergmann 
> > >
> > > arm64 has an inline asm implementation of access_ok() that is derived from
> > > the 32-bit arm version and optimized for the case that both the limit and
> > > the size are variable. With set_fs() gone, the limit is always constant,
> > > and the size usually is as well, so just using the default implementation
> > > reduces the check into a comparison against a constant that can be
> > > scheduled by the compiler.
> > >
> > > On a defconfig build, this saves over 28KB of .text.
> > >
> > > Signed-off-by: Arnd Bergmann 
> > > ---
> > >  arch/arm64/include/asm/uaccess.h | 28 +---
> > >  1 file changed, 5 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/uaccess.h 
> > > b/arch/arm64/include/asm/uaccess.h
> > > index 357f7bd9c981..e8dce0cc5eaa 100644
> > > --- a/arch/arm64/include/asm/uaccess.h
> > > +++ b/arch/arm64/include/asm/uaccess.h
> > > @@ -26,6 +26,8 @@
> > >  #include 
> > >  #include 
> > >
> > > +static inline int __access_ok(const void __user *ptr, unsigned long 
> > > size);
> > > +
> > >  /*
> > >   * Test whether a block of memory is a valid user space address.
> > >   * Returns 1 if the range is valid, 0 otherwise.
> > > @@ -33,10 +35,8 @@
> > >   * This is equivalent to the following test:
> > >   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
> > >   */
> > > -static inline unsigned long __access_ok(const void __user *addr, 
> > > unsigned long size)
> > > +static inline int access_ok(const void __user *addr, unsigned long size)
> > >  {
> > > -   unsigned long ret, limit = TASK_SIZE_MAX - 1;
> > > -
> > > /*
> > >  * Asynchronous I/O running in a kernel thread does not have the
> > >  * TIF_TAGGED_ADDR flag of the process owning the mm, so always 
> > > untag
> > > @@ -46,27 +46,9 @@ static inline unsigned long __access_ok(const void 
> > > __user *addr, unsigned long s
> > > (current->flags & PF_KTHREAD || 
> > > test_thread_flag(TIF_TAGGED_ADDR)))
> > > addr = untagged_addr(addr);
> > >
> > > -   __chk_user_ptr(addr);
> > > -   asm volatile(
> > > -   // A + B <= C + 1 for all A,B,C, in four easy steps:
> > > -   // 1: X = A + B; X' = X % 2^64
> > > -   "   adds%0, %3, %2\n"
> > > -   // 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4
> > > -   "   csel%1, xzr, %1, hi\n"
> > > -   // 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X'
> > > -   //to compensate for the carry flag being set in step 4. For
> > > -   //X > 2^64, X' merely has to remain nonzero, which it does.
> > > -   "   csinv   %0, %0, xzr, cc\n"
> > > -   // 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1
> > > -   //comes from the carry in being clear. Otherwise, we are
> > > -   //testing X' - C == 0, subject to the previous adjustments.
> > > -   "   sbcsxzr, %0, %1\n"
> > > -   "   cset%0, ls\n"
> > > -   : "=" (ret), "+r" (limit) : "Ir" (size), "0" (addr) : "cc");
> > > -
> > > -   return ret;
> > > +   return likely(__access_ok(addr, size));
> > >  }
> > > -#define __access_ok __access_ok
> > > +#define access_ok access_ok
> > >
> > >  #include 
> > >
> > > --
> > > 2.29.2
> > >
> > 
> > With set_fs() out of the picture, wouldn't it be sufficient to check
> > that bit #55 is clear? (the bit that selects between TTBR0 and TTBR1)
> > That would also remove the need to strip the tag from the address.
> > 
> > Something like
> > 
> > asm goto("tbnz  %0, #55, %2 \n"
> >  "tbnz  %1, #55, %2 \n"
> >  :: "r"(addr), "r"(addr + size - 1) :: notok);
> > return 1;
> > notok:
> > return 0;
> > 
> > with an additional sanity check on the size which the compiler could
> > eliminate for compile-time constant values.
> 
> Is there are reason not to just use:
>   size < 1u << 48 && !((addr | (addr + size - 1)) & 1u << 55)

That has a few problems, including being an ABI change for tasks not using the
relaxed tag ABI and not working for 52-bit VAs.

If we really want to relax the tag checking aspect, there are simpler options,
including variations on Ard's approach above.

> Ugg, is arm64 addressing as horrid as it looks - with the 'kernel'
> bit in the middle of the virtual address space?

It's just sign-extension/canonical addressing, except bits [63:56] are
configurable between a few uses, so the achitecture says bit 55 is the one to
look at in all configurations to figure out if an address is high/low (in
addition to checking the remaining bits are canonical).
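
As a concrete illustration of the bit-55 point (a sketch only, not a
proposed change):

/* Bit 55 selects between the TTBR0 (user) and TTBR1 (kernel) halves of
 * the VA space whatever the configured VA size, which is why it is the
 * bit to test; the remaining bits still have to be canonical (or valid
 * tag bits) to preserve the current ABI. */
static inline bool in_user_half(unsigned long addr)
{
	return !(addr & (1UL << 55));
}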

Thanks,
Mark.


Re: [PATCH 08/14] arm64: simplify access_ok()

2022-02-15 Thread Mark Rutland
On Mon, Feb 14, 2022 at 05:34:46PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> arm64 has an inline asm implementation of access_ok() that is derived from
> the 32-bit arm version and optimized for the case that both the limit and
> the size are variable. With set_fs() gone, the limit is always constant,
> and the size usually is as well, so just using the default implementation
> reduces the check into a comparison against a constant that can be
> scheduled by the compiler.
> 
> On a defconfig build, this saves over 28KB of .text.
> 
> Signed-off-by: Arnd Bergmann 

I had a play around with this and a number of alternative options that had
previously been discussed (e.g. using uint128_t for the check to allow the
compiler to use the carry flag), and:

* Any sequences which were significantly simpler involved an ABI change (e.g. not
  checking tags for tasks not using the relaxed tag ABI), or didn't interact
  well with the uaccess pointer masking we do for speculation hardening.

* For all constant-size cases, this was joint-best for codegen.

* For variable-size cases the difference between options (which did not change
  ABI or break pointer masking) fell in the noise and really depended on what
  you were optimizing for.

This patch itself is clear, I believe the logic is sound and does not result in
a behavioural change, so for this as-is:

Acked-by: Mark Rutland 

As on other replies, I think that if we want to make further changes to this,
we should do that as follow-ups, since there are a number of subtleties in this
area w.r.t. tag management and speculation with potential ABI implications.

Thanks,
Mark.

> ---
>  arch/arm64/include/asm/uaccess.h | 28 +---
>  1 file changed, 5 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/uaccess.h 
> b/arch/arm64/include/asm/uaccess.h
> index 357f7bd9c981..e8dce0cc5eaa 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -26,6 +26,8 @@
>  #include 
>  #include 
>  
> +static inline int __access_ok(const void __user *ptr, unsigned long size);
> +
>  /*
>   * Test whether a block of memory is a valid user space address.
>   * Returns 1 if the range is valid, 0 otherwise.
> @@ -33,10 +35,8 @@
>   * This is equivalent to the following test:
>   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
>   */
> -static inline unsigned long __access_ok(const void __user *addr, unsigned 
> long size)
> +static inline int access_ok(const void __user *addr, unsigned long size)
>  {
> - unsigned long ret, limit = TASK_SIZE_MAX - 1;
> -
>   /*
>* Asynchronous I/O running in a kernel thread does not have the
>* TIF_TAGGED_ADDR flag of the process owning the mm, so always untag
> @@ -46,27 +46,9 @@ static inline unsigned long __access_ok(const void __user 
> *addr, unsigned long s
>   (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
>   addr = untagged_addr(addr);
>  
> - __chk_user_ptr(addr);
> - asm volatile(
> - // A + B <= C + 1 for all A,B,C, in four easy steps:
> - // 1: X = A + B; X' = X % 2^64
> - "   adds%0, %3, %2\n"
> - // 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4
> - "   csel%1, xzr, %1, hi\n"
> - // 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X'
> - //to compensate for the carry flag being set in step 4. For
> - //X > 2^64, X' merely has to remain nonzero, which it does.
> - "   csinv   %0, %0, xzr, cc\n"
> - // 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1
> - //comes from the carry in being clear. Otherwise, we are
> - //testing X' - C == 0, subject to the previous adjustments.
> - "   sbcsxzr, %0, %1\n"
> - "   cset%0, ls\n"
> - : "=" (ret), "+r" (limit) : "Ir" (size), "0" (addr) : "cc");
> -
> - return ret;
> + return likely(__access_ok(addr, size));
>  }
> -#define __access_ok __access_ok
> +#define access_ok access_ok
>  
>  #include 
>  
> -- 
> 2.29.2
> 


Re: [PATCH v2 09/13] powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2022-02-15 Thread Michael Ellerman
Christophe Leroy  writes:
> On 14/02/2022 at 16:25, Naveen N. Rao wrote:
>> Christophe Leroy wrote:
>>> Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. It accelerates the call
>>> of livepatching.
>>>
>>> Also note that powerpc being the last one to convert to
>>> CONFIG_DYNAMIC_FTRACE_WITH_ARGS, it will now be possible to remove
>>> klp_arch_set_pc() on all architectures.
>>>
>>> Signed-off-by: Christophe Leroy 
>>> ---
>>>  arch/powerpc/Kconfig |  1 +
>>>  arch/powerpc/include/asm/ftrace.h    | 17 +
>>>  arch/powerpc/include/asm/livepatch.h |  4 +---
>>>  3 files changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index cdac2115eb00..e2b1792b2aae 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -210,6 +210,7 @@ config PPC
>>>  select HAVE_DEBUG_KMEMLEAK
>>>  select HAVE_DEBUG_STACKOVERFLOW
>>>  select HAVE_DYNAMIC_FTRACE
>>> +    select HAVE_DYNAMIC_FTRACE_WITH_ARGS    if MPROFILE_KERNEL || PPC32
>>>  select HAVE_DYNAMIC_FTRACE_WITH_REGS    if MPROFILE_KERNEL || PPC32
>>>  select HAVE_EBPF_JIT
>>>  select HAVE_EFFICIENT_UNALIGNED_ACCESS    if !(CPU_LITTLE_ENDIAN 
>>> && POWER7_CPU)
>>> diff --git a/arch/powerpc/include/asm/ftrace.h 
>>> b/arch/powerpc/include/asm/ftrace.h
>>> index b3f6184f77ea..45c3d6f11daa 100644
>>> --- a/arch/powerpc/include/asm/ftrace.h
>>> +++ b/arch/powerpc/include/asm/ftrace.h
>>> @@ -22,6 +22,23 @@ static inline unsigned long 
>>> ftrace_call_adjust(unsigned long addr)
>>>  struct dyn_arch_ftrace {
>>>  struct module *mod;
>>>  };
>>> +
>>> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
>>> +struct ftrace_regs {
>>> +    struct pt_regs regs;
>>> +};
>>> +
>>> +static __always_inline struct pt_regs *arch_ftrace_get_regs(struct 
>>> ftrace_regs *fregs)
>>> +{
>>> +    return &fregs->regs;
>>> +}
>> 
>> I think this is wrong. We need to differentiate between ftrace_caller() 
>> and ftrace_regs_caller() here, and only return pt_regs if coming in 
>> through ftrace_regs_caller() (i.e., FL_SAVE_REGS is set).
>
> Not sure I follow you.
>
> This is based on 5740a7c71ab6 ("s390/ftrace: add 
> HAVE_DYNAMIC_FTRACE_WITH_ARGS support")
>
> That's the whole point of HAVE_DYNAMIC_FTRACE_WITH_ARGS: having the regs also 
> with ftrace_caller().
>
> Sure you only have the params, but that's the same on s390, so what did 
> I miss ?

I already have this series in next, I can pull it out, but I'd rather
not.

I'll leave it in for now, hopefully you two can agree overnight my time
whether this is a big problem or something we can fix with a fixup
patch.

>>> +static __always_inline void ftrace_instruction_pointer_set(struct 
>>> ftrace_regs *fregs,
>>> +   unsigned long ip)
>>> +{
>>> +    regs_set_return_ip(&fregs->regs, ip);
>> 
>> Should we use that helper here? regs_set_return_ip() also updates some 
>> other state related to taking interrupts and I don't think it makes 
>> sense for use with ftrace.
>
>
> Today we have:
>
>   static inline void klp_arch_set_pc(struct ftrace_regs *fregs, unsigned 
> long ip)
>   {
>   struct pt_regs *regs = ftrace_get_regs(fregs);
>
>   regs_set_return_ip(regs, ip);
>   }
>
>
> Which like x86 and s390 becomes:
>
>   static inline void klp_arch_set_pc(struct ftrace_regs *fregs, unsigned 
> long ip)
>   {
>   ftrace_instruction_pointer_set(fregs, ip);
>   }
>
>
>
> That's the reason why I've been using regs_set_return_ip(). Do you think 
> it was wrong to use regs_set_return_ip() in klp_arch_set_pc() ?
>
> That was added by 59dc5bfca0cb ("powerpc/64s: avoid reloading (H)SRR 
> registers if they are still valid")

It's not wrong, but I think it's unnecessary. We need to use
regs_set_return_ip() if we're changing the regs->ip of an interrupt
frame, so that the interrupt return code will reload it.

But AIUI in this case we're not doing that, we're changing the regs->ip
of a pt_regs provided by ftrace, which shouldn't ever be an interrupt
frame.

So it's not a bug to use regs_set_return_ip(), but it is unnecessary and
means we'll reload the interrupt state unnecessarily on the next
interrupt return.
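
In other words (a sketch only, assuming powerpc pt_regs field names):

static inline void ftrace_set_ip(struct pt_regs *regs, unsigned long ip)
{
	/* For a pt_regs block that ftrace assembled itself there is no
	 * saved SRR0/SRR1 state to invalidate, so a plain store would be
	 * enough; regs_set_return_ip() is only required for real
	 * interrupt frames. */
	regs->nip = ip;
}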

cheers


Re: [PATCH 07/14] uaccess: generalize access_ok()

2022-02-15 Thread Mark Rutland
On Mon, Feb 14, 2022 at 05:34:45PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> There are many different ways that access_ok() is defined across
> architectures, but in the end, they all just compare against the
> user_addr_max() value or they accept anything.
> 
> Provide one definition that works for most architectures, checking
> against TASK_SIZE_MAX for user processes or skipping the check inside
> of uaccess_kernel() sections.
> 
> For architectures without CONFIG_SET_FS(), this should be the fastest
> check, as it comes down to a single comparison of a pointer against a
> compile-time constant, while the architecture specific versions tend to
> do something more complex for historic reasons or get something wrong.
> 
> Type checking for __user annotations is handled inconsistently across
> architectures, but this is easily simplified as well by using an inline
> function that takes a 'const void __user *' argument. A handful of
> callers need an extra __user annotation for this.
> 
> Some architectures had trick to use 33-bit or 65-bit arithmetic on the
> addresses to calculate the overflow, however this simpler version uses
> fewer registers, which means it can produce better object code in the
> end despite needing a second (statically predicted) branch.
> 
> Signed-off-by: Arnd Bergmann 

As discussed over IRC, the generic sequence looks good to me, and likewise for
the arm64 change, so:

Acked-by: Mark Rutland  [arm64, asm-generic]

Thanks,
Mark.

> ---
>  arch/alpha/include/asm/uaccess.h  | 34 +++
>  arch/arc/include/asm/uaccess.h| 29 -
>  arch/arm/include/asm/uaccess.h| 20 +
>  arch/arm/kernel/swp_emulate.c |  2 +-
>  arch/arm/kernel/traps.c   |  2 +-
>  arch/arm64/include/asm/uaccess.h  |  5 ++-
>  arch/csky/include/asm/uaccess.h   |  8 
>  arch/csky/kernel/signal.c |  2 +-
>  arch/hexagon/include/asm/uaccess.h| 25 
>  arch/ia64/include/asm/uaccess.h   |  5 +--
>  arch/m68k/include/asm/uaccess.h   |  5 ++-
>  arch/microblaze/include/asm/uaccess.h |  8 +---
>  arch/mips/include/asm/uaccess.h   | 29 +
>  arch/nds32/include/asm/uaccess.h  |  7 +---
>  arch/nios2/include/asm/uaccess.h  | 11 +
>  arch/nios2/kernel/signal.c| 20 +
>  arch/openrisc/include/asm/uaccess.h   | 19 +
>  arch/parisc/include/asm/uaccess.h | 10 +++--
>  arch/powerpc/include/asm/uaccess.h| 11 +
>  arch/powerpc/lib/sstep.c  |  4 +-
>  arch/riscv/include/asm/uaccess.h  | 31 +-
>  arch/riscv/kernel/perf_callchain.c|  2 +-
>  arch/s390/include/asm/uaccess.h   | 11 ++---
>  arch/sh/include/asm/uaccess.h | 22 +-
>  arch/sparc/include/asm/uaccess.h  |  3 --
>  arch/sparc/include/asm/uaccess_32.h   | 18 ++--
>  arch/sparc/include/asm/uaccess_64.h   | 35 
>  arch/sparc/kernel/signal_32.c |  2 +-
>  arch/um/include/asm/uaccess.h |  5 ++-
>  arch/x86/include/asm/uaccess.h| 14 +--
>  arch/xtensa/include/asm/uaccess.h | 10 +
>  include/asm-generic/access_ok.h   | 59 +++
>  include/asm-generic/uaccess.h | 21 +-
>  include/linux/uaccess.h   |  7 
>  34 files changed, 130 insertions(+), 366 deletions(-)
>  create mode 100644 include/asm-generic/access_ok.h
> 
> diff --git a/arch/alpha/include/asm/uaccess.h 
> b/arch/alpha/include/asm/uaccess.h
> index 1b6f25efa247..82c5743fc9cd 100644
> --- a/arch/alpha/include/asm/uaccess.h
> +++ b/arch/alpha/include/asm/uaccess.h
> @@ -20,28 +20,7 @@
>  #define get_fs()  (current_thread_info()->addr_limit)
>  #define set_fs(x) (current_thread_info()->addr_limit = (x))
>  
> -#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
> -
> -/*
> - * Is a address valid? This does a straightforward calculation rather
> - * than tests.
> - *
> - * Address valid if:
> - *  - "addr" doesn't have any high-bits set
> - *  - AND "size" doesn't have any high-bits set
> - *  - AND "addr+size-(size != 0)" doesn't have any high-bits set
> - *  - OR we are in kernel mode.
> - */
> -#define __access_ok(addr, size) ({   \
> - unsigned long __ao_a = (addr), __ao_b = (size); \
> - unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;\
> - (get_fs().seg & (__ao_a | __ao_b | __ao_end)) == 0; })
> -
> -#define access_ok(addr, size)\
> -({   \
> - __chk_user_ptr(addr);   \
> - __access_ok(((unsigned long)(addr)), (size));   \
> -})
> +#include 
>  
>  /*
>   * These are the main single-value transfer routines.  They automatically
> @@ -105,7 +84,7 @@ extern void __get_user_unknown(void);
>   long __gu_err = -EFAULT;\
>   unsigned long __gu_val = 0;   

Re: [PATCH 08/14] arm64: simplify access_ok()

2022-02-15 Thread Mark Rutland
On Tue, Feb 15, 2022 at 10:39:46AM +0100, Arnd Bergmann wrote:
> On Tue, Feb 15, 2022 at 10:21 AM Ard Biesheuvel  wrote:
> > On Tue, 15 Feb 2022 at 10:13, Arnd Bergmann  wrote:
> >
> > arm64 also has this leading up to the range check, and I think we'd no
> > longer need it:
> >
> > if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> > (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
> > addr = untagged_addr(addr);
> 
> I suspect the expensive part here is checking the two flags, as 
> untagged_addr()
> seems to always just add a sbfx instruction. Would this work?
> 
> #ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
> #define access_ok(ptr, size) __access_ok(untagged_addr(ptr), (size))
> #else // the else path is the default, this can be left out.
> #define access_ok(ptr, size) __access_ok((ptr), (size))
> #endif

This would be an ABI change, e.g. for tasks without TIF_TAGGED_ADDR.

I don't think we should change this as part of this series.

Thanks,
Mark.


Re: [PATCH 08/14] arm64: simplify access_ok()

2022-02-15 Thread Mark Rutland
On Tue, Feb 15, 2022 at 10:21:16AM +0100, Ard Biesheuvel wrote:
> On Tue, 15 Feb 2022 at 10:13, Arnd Bergmann  wrote:
> >
> > On Tue, Feb 15, 2022 at 9:17 AM Ard Biesheuvel  wrote:
> > > On Mon, 14 Feb 2022 at 17:37, Arnd Bergmann  wrote:
> > > > From: Arnd Bergmann 
> > > >
> > >
> > > With set_fs() out of the picture, wouldn't it be sufficient to check
> > > that bit #55 is clear? (the bit that selects between TTBR0 and TTBR1)
> > > That would also remove the need to strip the tag from the address.
> > >
> > > Something like
> > >
> > > asm goto("tbnz  %0, #55, %2 \n"
> > >  "tbnz  %1, #55, %2 \n"
> > >  :: "r"(addr), "r"(addr + size - 1) :: notok);
> > > return 1;
> > > notok:
> > > return 0;
> > >
> > > with an additional sanity check on the size which the compiler could
> > > eliminate for compile-time constant values.
> >
> > That should work, but I don't see it as a clear enough advantage to
> > have a custom implementation. For the constant-size case, it probably
> > isn't better than a compiler-scheduled comparison against a
> > constant limit, but it does hurt maintainability when the next person
> > wants to change the behavior of access_ok() globally.
> >
> 
> arm64 also has this leading up to the range check, and I think we'd no
> longer need it:
> 
> if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
> addr = untagged_addr(addr);
> 

ABI-wise, we aim to *reject* tagged pointers unless the task is using the
tagged addr ABI, so we need to retain both the untagging logic and the full
pointer check (to actually check the tag bits) unless we relax that ABI
decision generally (or go context-switch the TCR_EL1.TBI* bits).

Since that has subtle ABI implications, I don't think we should change that
within this series.

If we *did* relax things, we could just check bit 55 here, and unconditionally
clear that in uaccess_mask_ptr(), since LDTR/STTR should fault on kernel memory.
On parts with meltdown those might not fault until committed, and so we need
masking to avoid speculative access to a kernel pointer, and that requires the
prior explicit check.

Thanks,
Mark.


Re: [PATCH 03/14] nds32: fix access_ok() checks in get/put_user

2022-02-15 Thread Greg KH
On Tue, Feb 15, 2022 at 10:18:15AM +0100, Arnd Bergmann wrote:
> On Mon, Feb 14, 2022 at 6:01 PM Christoph Hellwig  wrote:
> >
> > On Mon, Feb 14, 2022 at 05:34:41PM +0100, Arnd Bergmann wrote:
> > > From: Arnd Bergmann 
> > >
> > > The get_user()/put_user() functions are meant to check for
> > > access_ok(), while the __get_user()/__put_user() functions
> > > don't.
> > >
> > > This broke in 4.19 for nds32, when it gained an extraneous
> > > check in __get_user(), but lost the check it needs in
> > > __put_user().
> >
> > Can we follow the lead of MIPS (which this was originally copied
> > from I think) and kill the pointless __get/put_user_check wrapper
> > that just obfuscates the code?
> 
> I had another look, but I think that would be a bigger change than
> I want to have in a fix for stable backports, as nds32 also uses
> the _check versions in __{get,put}_user_error.

Don't worry about stable backports first, get it correct and merged and
then worry about them if you really have to.

If someone cares about nds32 for stable kernels, they can do the
backport work :)

thanks,

greg k-h


Re: [PATCH 09/14] m68k: drop custom __access_ok()

2022-02-15 Thread Arnd Bergmann
On Tue, Feb 15, 2022 at 8:13 AM Al Viro  wrote:
> On Tue, Feb 15, 2022 at 07:29:42AM +0100, Christoph Hellwig wrote:
> > On Tue, Feb 15, 2022 at 12:37:41AM +, Al Viro wrote:
> > > Perhaps simply wrap that sucker into #ifdef CONFIG_CPU_HAS_ADDRESS_SPACES
> > > (and trim the comment down to "coldfire and 68000 will pick generic
> > > variant")?
> >
> > I wonder if we should invert CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE,
> > select the separate address space config for s390, sparc64, non-coldfire
> > m68k and mips with EVA and then just have one single access_ok for
> > overlapping address space (as added by Arnd) and non-overlapping ones
> > (always return true).
>
> parisc is also such...  How about
>
> select ALTERNATE_SPACE_USERLAND
>
> for that bunch?

Either of those works for me. My current version has this keyed off
TASK_SIZE_MAX==ULONG_MAX, but a CONFIG_ symbol does
look more descriptive.

>  While we are at it, how many unusual access_ok() instances are
> left after this series?  arm64, itanic, um, anything else?

x86 adds a WARN_ON_IN_IRQ() check in there. This could be
made generic, but it's not obvious what exactly the exceptions are
that other architectures need. The arm64 tagged pointers could
probably also get integrated into the generic version.

> FWIW, sparc32 has a slightly unusual instance (see uaccess_32.h there); it's
> obviously cheaper than generic and I wonder if the trick is legitimate (and
> applicable elsewhere, perhaps)...

Right, a few others have the same, but I wasn't convinced that this
is actually safe for all possible cases: it's trivial to construct a caller
that works on other architectures but not this one, if you pass a large
enough size value and don't access the contents in sequence.

Also, like the ((addr | (addr + size)) & MASK) check on some other
architectures, it is less portable because it makes assumptions about
the actual layout beyond a fixed address limit.
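
For reference, a sketch of the overflow-safe comparison the generic
version relies on (assuming TASK_SIZE_MAX as the limit; not a verbatim
copy of the header):

static inline int generic_access_ok(const void __user *ptr, unsigned long size)
{
	unsigned long limit = TASK_SIZE_MAX;
	unsigned long addr = (unsigned long)ptr;

	/* both comparisons are needed so that addr + size cannot wrap */
	return size <= limit && addr <= limit - size;
}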

Arnd


Re: [PATCH 08/14] arm64: simplify access_ok()

2022-02-15 Thread Arnd Bergmann
On Tue, Feb 15, 2022 at 10:21 AM Ard Biesheuvel  wrote:
> On Tue, 15 Feb 2022 at 10:13, Arnd Bergmann  wrote:
>
> arm64 also has this leading up to the range check, and I think we'd no
> longer need it:
>
> if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
> addr = untagged_addr(addr);

I suspect the expensive part here is checking the two flags, as untagged_addr()
seems to always just add a sbfx instruction. Would this work?

#ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
#define access_ok(ptr, size) __access_ok(untagged_addr(ptr), (size))
#else // the else path is the default, this can be left out.
#define access_ok(ptr, size) __access_ok((ptr), (size))
#endif

   Arnd


Re: [PATCH v2] i2c: pasemi: Drop I2C classes from platform driver variant

2022-02-15 Thread Wolfram Sang
On Fri, Feb 04, 2022 at 10:59:14AM +0100, Martin Povišer wrote:
> Drop I2C device-probing classes from platform variant of the PASemi
> controller as it is only used on platforms where I2C devices should
> be instantiated in devicetree. (The I2C_CLASS_DEPRECATED flag is not
> raised as up to this point no devices relied on the old behavior.)
> 
> Fixes: d88ae2932df0 ("i2c: pasemi: Add Apple platform driver")
> Signed-off-by: Martin Povišer 

Applied to for-next, thanks!



signature.asc
Description: PGP signature


RE: [PATCH 08/14] arm64: simplify access_ok()

2022-02-15 Thread David Laight
From: Ard Biesheuvel
> Sent: 15 February 2022 08:18
> 
> On Mon, 14 Feb 2022 at 17:37, Arnd Bergmann  wrote:
> >
> > From: Arnd Bergmann 
> >
> > arm64 has an inline asm implementation of access_ok() that is derived from
> > the 32-bit arm version and optimized for the case that both the limit and
> > the size are variable. With set_fs() gone, the limit is always constant,
> > and the size usually is as well, so just using the default implementation
> > reduces the check into a comparison against a constant that can be
> > scheduled by the compiler.
> >
> > On a defconfig build, this saves over 28KB of .text.
> >
> > Signed-off-by: Arnd Bergmann 
> > ---
> >  arch/arm64/include/asm/uaccess.h | 28 +---
> >  1 file changed, 5 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/uaccess.h 
> > b/arch/arm64/include/asm/uaccess.h
> > index 357f7bd9c981..e8dce0cc5eaa 100644
> > --- a/arch/arm64/include/asm/uaccess.h
> > +++ b/arch/arm64/include/asm/uaccess.h
> > @@ -26,6 +26,8 @@
> >  #include 
> >  #include 
> >
> > +static inline int __access_ok(const void __user *ptr, unsigned long size);
> > +
> >  /*
> >   * Test whether a block of memory is a valid user space address.
> >   * Returns 1 if the range is valid, 0 otherwise.
> > @@ -33,10 +35,8 @@
> >   * This is equivalent to the following test:
> >   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
> >   */
> > -static inline unsigned long __access_ok(const void __user *addr, unsigned 
> > long size)
> > +static inline int access_ok(const void __user *addr, unsigned long size)
> >  {
> > -   unsigned long ret, limit = TASK_SIZE_MAX - 1;
> > -
> > /*
> >  * Asynchronous I/O running in a kernel thread does not have the
> >  * TIF_TAGGED_ADDR flag of the process owning the mm, so always 
> > untag
> > @@ -46,27 +46,9 @@ static inline unsigned long __access_ok(const void 
> > __user *addr, unsigned long s
> > (current->flags & PF_KTHREAD || 
> > test_thread_flag(TIF_TAGGED_ADDR)))
> > addr = untagged_addr(addr);
> >
> > -   __chk_user_ptr(addr);
> > -   asm volatile(
> > -   // A + B <= C + 1 for all A,B,C, in four easy steps:
> > -   // 1: X = A + B; X' = X % 2^64
> > -   "   adds%0, %3, %2\n"
> > -   // 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4
> > -   "   csel%1, xzr, %1, hi\n"
> > -   // 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X'
> > -   //to compensate for the carry flag being set in step 4. For
> > -   //X > 2^64, X' merely has to remain nonzero, which it does.
> > -   "   csinv   %0, %0, xzr, cc\n"
> > -   // 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1
> > -   //comes from the carry in being clear. Otherwise, we are
> > -   //testing X' - C == 0, subject to the previous adjustments.
> > -   "   sbcsxzr, %0, %1\n"
> > -   "   cset%0, ls\n"
> > -   : "=" (ret), "+r" (limit) : "Ir" (size), "0" (addr) : "cc");
> > -
> > -   return ret;
> > +   return likely(__access_ok(addr, size));
> >  }
> > -#define __access_ok __access_ok
> > +#define access_ok access_ok
> >
> >  #include 
> >
> > --
> > 2.29.2
> >
> 
> With set_fs() out of the picture, wouldn't it be sufficient to check
> that bit #55 is clear? (the bit that selects between TTBR0 and TTBR1)
> That would also remove the need to strip the tag from the address.
> 
> Something like
> 
> asm goto("tbnz  %0, #55, %2 \n"
>  "tbnz  %1, #55, %2 \n"
>  :: "r"(addr), "r"(addr + size - 1) :: notok);
> return 1;
> notok:
> return 0;
> 
> with an additional sanity check on the size which the compiler could
> eliminate for compile-time constant values.

Is there are reason not to just use:
size < 1u << 48 && !((addr | (addr + size - 1)) & 1u << 55)

(The -1 can be removed if the last user page is never mapped)

Ugg, is arm64 addressing as horrid as it looks - with the 'kernel'
bit in the middle of the virtual address space?
It seems to be:

Although I found some references to 44 bit VA and to code using the
'ignored' bits as tags - relying on the hardware ignoring them.
There might be some feature that uses the top 4 bits as well.

Another option is assuming that accesses are 'reasonably sequential',
removing the length check and ensuring there is an unmapped page
between valid user and kernel addresses.
That probably requires an unmapped page at the bottom of kernel space
which may not be achievable.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH 08/14] arm64: simplify access_ok()

2022-02-15 Thread Ard Biesheuvel
On Tue, 15 Feb 2022 at 10:13, Arnd Bergmann  wrote:
>
> On Tue, Feb 15, 2022 at 9:17 AM Ard Biesheuvel  wrote:
> > On Mon, 14 Feb 2022 at 17:37, Arnd Bergmann  wrote:
> > > From: Arnd Bergmann 
> > >
> >
> > With set_fs() out of the picture, wouldn't it be sufficient to check
> > that bit #55 is clear? (the bit that selects between TTBR0 and TTBR1)
> > That would also remove the need to strip the tag from the address.
> >
> > Something like
> >
> > asm goto("tbnz  %0, #55, %2 \n"
> >  "tbnz  %1, #55, %2 \n"
> >  :: "r"(addr), "r"(addr + size - 1) :: notok);
> > return 1;
> > notok:
> > return 0;
> >
> > with an additional sanity check on the size which the compiler could
> > eliminate for compile-time constant values.
>
> That should work, but I don't see it as a clear enough advantage to
> have a custom implementation. For the constant-size case, it probably
> isn't better than a compiler-scheduled comparison against a
> constant limit, but it does hurt maintainability when the next person
> wants to change the behavior of access_ok() globally.
>

arm64 also has this leading up to the range check, and I think we'd no
longer need it:

if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
(current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
addr = untagged_addr(addr);

> If we want to get into micro-optimizing uaccess, I think a better target
> would be a CONFIG_CC_HAS_ASM_GOTO_OUTPUT version
> of __get_user()/__put_user as we have on x86 and powerpc.
>
>  Arnd


Re: [PATCH 03/14] nds32: fix access_ok() checks in get/put_user

2022-02-15 Thread Arnd Bergmann
On Mon, Feb 14, 2022 at 6:01 PM Christoph Hellwig  wrote:
>
> On Mon, Feb 14, 2022 at 05:34:41PM +0100, Arnd Bergmann wrote:
> > From: Arnd Bergmann 
> >
> > The get_user()/put_user() functions are meant to check for
> > access_ok(), while the __get_user()/__put_user() functions
> > don't.
> >
> > This broke in 4.19 for nds32, when it gained an extraneous
> > check in __get_user(), but lost the check it needs in
> > __put_user().
>
> Can we follow the lead of MIPS (which this was originally copied
> from I think) and kill the pointless __get/put_user_check wrapper
> that just obfuscates the code?

I had another look, but I think that would be a bigger change than
I want to have in a fix for stable backports, as nds32 also uses
the _check versions in __{get,put}_user_error.

If we instead clean it up in a separate patch, it should be done for
all eight architectures that do the same thing, but at that point,
the time seems better spent at coming up with a new set of
calling conventions that work with asm-goto.

 Arnd


Re: [PATCH 08/14] arm64: simplify access_ok()

2022-02-15 Thread Arnd Bergmann
On Tue, Feb 15, 2022 at 9:17 AM Ard Biesheuvel  wrote:
> On Mon, 14 Feb 2022 at 17:37, Arnd Bergmann  wrote:
> > From: Arnd Bergmann 
> >
>
> With set_fs() out of the picture, wouldn't it be sufficient to check
> that bit #55 is clear? (the bit that selects between TTBR0 and TTBR1)
> That would also remove the need to strip the tag from the address.
>
> Something like
>
> asm goto("tbnz  %0, #55, %2 \n"
>  "tbnz  %1, #55, %2 \n"
>  :: "r"(addr), "r"(addr + size - 1) :: notok);
> return 1;
> notok:
> return 0;
>
> with an additional sanity check on the size which the compiler could
> eliminate for compile-time constant values.

That should work, but I don't see it as a clear enough advantage to
have a custom implementation. For the constant-size case, it probably
isn't better than a compiler-scheduled comparison against a
constant limit, but it does hurt maintainability when the next person
wants to change the behavior of access_ok() globally.

If we want to get into micro-optimizing uaccess, I think a better target
would be a CONFIG_CC_HAS_ASM_GOTO_OUTPUT version
of __get_user()/__put_user as we have on x86 and powerpc.

 Arnd


Re: [PATCH v2 12/13] powerpc/ftrace: Prepare ftrace_64_mprofile.S for reuse by PPC32

2022-02-15 Thread Christophe Leroy


On 14/02/2022 at 18:51, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>> PPC64 mprofile versions and PPC32 are very similar.
>>
>> Modify PPC64 version so that it can be reused for PPC32.
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>>  .../powerpc/kernel/trace/ftrace_64_mprofile.S | 73 +--
>>  1 file changed, 51 insertions(+), 22 deletions(-)
> 
> While I agree that ppc32 and -mprofile-kernel ftrace code are very 
> similar, I think this patch adds way too many #ifdefs. IMHO, this
> makes the resultant code quite difficult to follow.

Ok, I can introduce some GAS macros for a few of them in a followup patch.

Christophe

Re: [PATCH 08/14] arm64: simplify access_ok()

2022-02-15 Thread Ard Biesheuvel
On Mon, 14 Feb 2022 at 17:37, Arnd Bergmann  wrote:
>
> From: Arnd Bergmann 
>
> arm64 has an inline asm implementation of access_ok() that is derived from
> the 32-bit arm version and optimized for the case that both the limit and
> the size are variable. With set_fs() gone, the limit is always constant,
> and the size usually is as well, so just using the default implementation
> reduces the check into a comparison against a constant that can be
> scheduled by the compiler.
>
> On a defconfig build, this saves over 28KB of .text.
>
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/arm64/include/asm/uaccess.h | 28 +---
>  1 file changed, 5 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/include/asm/uaccess.h 
> b/arch/arm64/include/asm/uaccess.h
> index 357f7bd9c981..e8dce0cc5eaa 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -26,6 +26,8 @@
>  #include 
>  #include 
>
> +static inline int __access_ok(const void __user *ptr, unsigned long size);
> +
>  /*
>   * Test whether a block of memory is a valid user space address.
>   * Returns 1 if the range is valid, 0 otherwise.
> @@ -33,10 +35,8 @@
>   * This is equivalent to the following test:
>   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
>   */
> -static inline unsigned long __access_ok(const void __user *addr, unsigned 
> long size)
> +static inline int access_ok(const void __user *addr, unsigned long size)
>  {
> -   unsigned long ret, limit = TASK_SIZE_MAX - 1;
> -
> /*
>  * Asynchronous I/O running in a kernel thread does not have the
>  * TIF_TAGGED_ADDR flag of the process owning the mm, so always untag
> @@ -46,27 +46,9 @@ static inline unsigned long __access_ok(const void __user 
> *addr, unsigned long s
> (current->flags & PF_KTHREAD || 
> test_thread_flag(TIF_TAGGED_ADDR)))
> addr = untagged_addr(addr);
>
> -   __chk_user_ptr(addr);
> -   asm volatile(
> -   // A + B <= C + 1 for all A,B,C, in four easy steps:
> -   // 1: X = A + B; X' = X % 2^64
> -   "   adds%0, %3, %2\n"
> -   // 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4
> -   "   csel%1, xzr, %1, hi\n"
> -   // 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X'
> -   //to compensate for the carry flag being set in step 4. For
> -   //X > 2^64, X' merely has to remain nonzero, which it does.
> -   "   csinv   %0, %0, xzr, cc\n"
> -   // 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1
> -   //comes from the carry in being clear. Otherwise, we are
> -   //testing X' - C == 0, subject to the previous adjustments.
> -   "   sbcsxzr, %0, %1\n"
> -   "   cset%0, ls\n"
> -   : "=" (ret), "+r" (limit) : "Ir" (size), "0" (addr) : "cc");
> -
> -   return ret;
> +   return likely(__access_ok(addr, size));
>  }
> -#define __access_ok __access_ok
> +#define access_ok access_ok
>
>  #include 
>
> --
> 2.29.2
>

With set_fs() out of the picture, wouldn't it be sufficient to check
that bit #55 is clear? (the bit that selects between TTBR0 and TTBR1)
That would also remove the need to strip the tag from the address.

Something like

asm goto("tbnz  %0, #55, %2 \n"
 "tbnz  %1, #55, %2 \n"
 :: "r"(addr), "r"(addr + size - 1) :: notok);
return 1;
notok:
return 0;

with an additional sanity check on the size which the compiler could
eliminate for compile-time constant values.


Re: [PATCH 14/14] uaccess: drop set_fs leftovers

2022-02-15 Thread Arnd Bergmann
On Tue, Feb 15, 2022 at 8:46 AM Helge Deller  wrote:
>
> On 2/15/22 04:03, Al Viro wrote:
> > On Mon, Feb 14, 2022 at 05:34:52PM +0100, Arnd Bergmann wrote:
> >> diff --git a/arch/parisc/include/asm/futex.h 
> >> b/arch/parisc/include/asm/futex.h
> >> index b5835325d44b..2f4a1b1ef387 100644
> >> --- a/arch/parisc/include/asm/futex.h
> >> +++ b/arch/parisc/include/asm/futex.h
> >> @@ -99,7 +99,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user 
> >> *uaddr,
> >>  /* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
> >>   * our gateway page, and causes no end of trouble...
> >>   */
> >> -if (uaccess_kernel() && !uaddr)
> >> +if (!uaddr)
> >>  return -EFAULT;
> >
> >   Huh?  uaccess_kernel() is removed since it becomes always false now,
> > so this looks odd.
> >
> >   AFAICS, the comment above that check refers to futex_detect_cmpxchg()
> > -> cmpxchg_futex_value_locked() -> futex_atomic_cmpxchg_inatomic() call 
> > chain.
> > Which had been gone since commit 3297481d688a (futex: Remove futex_cmpxchg
> > detection).  The comment *and* the check should've been killed off back
> > then.
> >   Let's make sure to get both now...
>
> Right. Arnd, can you drop this if() and the comment above it?

Done.

   Arnd


Re: [PATCH v2 09/13] powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS

2022-02-15 Thread Christophe Leroy


On 14/02/2022 at 16:25, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>> Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. It accelerates the call
>> of livepatching.
>>
>> Also note that powerpc being the last one to convert to
>> CONFIG_DYNAMIC_FTRACE_WITH_ARGS, it will now be possible to remove
>> klp_arch_set_pc() on all architectures.
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>>  arch/powerpc/Kconfig |  1 +
>>  arch/powerpc/include/asm/ftrace.h    | 17 +
>>  arch/powerpc/include/asm/livepatch.h |  4 +---
>>  3 files changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index cdac2115eb00..e2b1792b2aae 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -210,6 +210,7 @@ config PPC
>>  select HAVE_DEBUG_KMEMLEAK
>>  select HAVE_DEBUG_STACKOVERFLOW
>>  select HAVE_DYNAMIC_FTRACE
>> +    select HAVE_DYNAMIC_FTRACE_WITH_ARGS    if MPROFILE_KERNEL || PPC32
>>  select HAVE_DYNAMIC_FTRACE_WITH_REGS    if MPROFILE_KERNEL || PPC32
>>  select HAVE_EBPF_JIT
>>  select HAVE_EFFICIENT_UNALIGNED_ACCESS    if !(CPU_LITTLE_ENDIAN 
>> && POWER7_CPU)
>> diff --git a/arch/powerpc/include/asm/ftrace.h 
>> b/arch/powerpc/include/asm/ftrace.h
>> index b3f6184f77ea..45c3d6f11daa 100644
>> --- a/arch/powerpc/include/asm/ftrace.h
>> +++ b/arch/powerpc/include/asm/ftrace.h
>> @@ -22,6 +22,23 @@ static inline unsigned long 
>> ftrace_call_adjust(unsigned long addr)
>>  struct dyn_arch_ftrace {
>>  struct module *mod;
>>  };
>> +
>> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
>> +struct ftrace_regs {
>> +    struct pt_regs regs;
>> +};
>> +
>> +static __always_inline struct pt_regs *arch_ftrace_get_regs(struct 
>> ftrace_regs *fregs)
>> +{
>> +    return &fregs->regs;
>> +}
> 
> I think this is wrong. We need to differentiate between ftrace_caller() 
> and ftrace_regs_caller() here, and only return pt_regs if coming in 
> through ftrace_regs_caller() (i.e., FL_SAVE_REGS is set).

Not sure I follow you.

This is based on 5740a7c71ab6 ("s390/ftrace: add 
HAVE_DYNAMIC_FTRACE_WITH_ARGS support")

That's the whole point of HAVE_DYNAMIC_FTRACE_WITH_ARGS: having the regs also 
with ftrace_caller().

Sure you only have the params, but that's the same on s390, so what did 
I miss ?


> 
>> +
>> +static __always_inline void ftrace_instruction_pointer_set(struct 
>> ftrace_regs *fregs,
>> +   unsigned long ip)
>> +{
>> +    regs_set_return_ip(&fregs->regs, ip);
> 
> Should we use that helper here? regs_set_return_ip() also updates some 
> other state related to taking interrupts and I don't think it makes 
> sense for use with ftrace.
> 


Today we have:

static inline void klp_arch_set_pc(struct ftrace_regs *fregs, unsigned 
long ip)
{
struct pt_regs *regs = ftrace_get_regs(fregs);

regs_set_return_ip(regs, ip);
}


Which like x86 and s390 becomes:

static inline void klp_arch_set_pc(struct ftrace_regs *fregs, unsigned 
long ip)
{
ftrace_instruction_pointer_set(fregs, ip);
}



That's the reason why I've been using regs_set_return_ip(). Do you think 
it was wrong to use regs_set_return_ip() in klp_arch_set_pc() ?

That was added by 59dc5bfca0cb ("powerpc/64s: avoid reloading (H)SRR 
registers if they are still valid")

Christophe