Re: [PATCH v3 1/2] drivers/perf: xgene_pmu: Fix uninitialized resource struct

2020-09-18 Thread Will Deacon
On Tue, 15 Sep 2020 16:41:09 -0400, Mark Salter wrote:
> This splat was reported on newer Fedora kernels booting on certain
> X-gene based machines:
> 
>  xgene-pmu APMC0D83:00: X-Gene PMU version 3
>  Unable to handle kernel read from unreadable memory at virtual \
>  address 4006
>  ...
>  Call trace:
>   string+0x50/0x100
>   vsnprintf+0x160/0x750
>   devm_kvasprintf+0x5c/0xb4
>   devm_kasprintf+0x54/0x60
>   __devm_ioremap_resource+0xdc/0x1a0
>   devm_ioremap_resource+0x14/0x20
>   acpi_get_pmu_hw_inf.isra.0+0x84/0x15c
>   acpi_pmu_dev_add+0xbc/0x21c
>   acpi_ns_walk_namespace+0x16c/0x1e4
>   acpi_walk_namespace+0xb4/0xfc
>   xgene_pmu_probe_pmu_dev+0x7c/0xe0
>   xgene_pmu_probe.part.0+0x2c0/0x310
>   xgene_pmu_probe+0x54/0x64
>   platform_drv_probe+0x60/0xb4
>   really_probe+0xe8/0x4a0
>   driver_probe_device+0xe4/0x100
>   device_driver_attach+0xcc/0xd4
>   __driver_attach+0xb0/0x17c
>   bus_for_each_dev+0x6c/0xb0
>   driver_attach+0x30/0x40
>   bus_add_driver+0x154/0x250
>   driver_register+0x84/0x140
>   __platform_driver_register+0x54/0x60
>   xgene_pmu_driver_init+0x28/0x34
>   do_one_initcall+0x40/0x204
>   do_initcalls+0x104/0x144
>   kernel_init_freeable+0x198/0x210
>   kernel_init+0x20/0x12c
>   ret_from_fork+0x10/0x18
>  Code: 91000400 110004e1 eb08009f 54c0 (38646846)
>  ---[ end trace f08c10566496a703 ]---
> 
> [...]

Applied to will (for-next/perf), thanks!

[1/2] drivers/perf: xgene_pmu: Fix uninitialized resource struct
  https://git.kernel.org/will/c/a76b8236edcf
[2/2] drivers/perf: thunderx2_pmu: Fix memory resource error handling
  https://git.kernel.org/will/c/688494a407d1

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH v2] arm64: Enable PCI write-combine resources under sysfs

2020-09-18 Thread Will Deacon
On Fri, 18 Sep 2020 03:33:12 +, Clint Sbisa wrote:
> This change exposes write-combine mappings under sysfs for
> prefetchable PCI resources on arm64.
> 
> Originally, the usage of "write combine" here was driven by the x86
> definition of write combine. This definition is specific to x86 and
> does not generalize to other architectures. However, the usage of WC
> has mutated to "write combine" semantics, which is implemented
> differently on each arch.
> 
> [...]

Applied to arm64 (for-next/pci), thanks!

[1/1] arm64: Enable PCI write-combine resources under sysfs
  https://git.kernel.org/arm64/c/5fd39dc22027

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH v3 0/4] kselftests/arm64: add PAuth tests

2020-09-18 Thread Will Deacon
On Fri, 18 Sep 2020 11:47:11 +0100, Boyan Karatotev wrote:
> Pointer Authentication (PAuth) is a security feature introduced in ARMv8.3.
> It introduces instructions to sign addresses and later check for potential
> corruption using a second modifier value and one of a set of keys. The
> signature, in the form of the Pointer Authentication Code (PAC), is stored
> in some of the top unused bits of the virtual address (e.g. [54: 49] if
> TBID0 is enabled and TnSZ is set to use a 48 bit VA space). A set of
> controls are present to enable/disable groups of instructions (which use
> certain keys) for compatibility with libraries that do not utilize the
> feature. PAuth is used to verify the integrity of return addresses on the
> stack with less memory than the stack canary.
> 
> [...]

Applied to arm64 (for-next/selftests), thanks!

[1/4] kselftests/arm64: add a basic Pointer Authentication test
  https://git.kernel.org/arm64/c/e74e1d557285
[2/4] kselftests/arm64: add nop checks for PAuth tests
  https://git.kernel.org/arm64/c/766d95b1ed93
[3/4] kselftests/arm64: add PAuth test for whether exec() changes keys
  https://git.kernel.org/arm64/c/806a15b2545e
[4/4] kselftests/arm64: add PAuth tests for single threaded consistency and differently initialized keys
  https://git.kernel.org/arm64/c/d21435e9670b

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH v3 0/3] arm64: Convert to ARCH_STACKWALK

2020-09-18 Thread Will Deacon
On Mon, 14 Sep 2020 16:34:06 +0100, Mark Brown wrote:
> This series updates the arm64 stacktrace code to use the newer and much
> simpler arch_stack_walk() interface, the main benefit being a single
> entry point to the arch code with no need for the arch code to worry
> about skipping frames. Along the way I noticed that the reliable
> parameter to the arch_stack_walk() callback appears to be redundant
> so there's also a patch here removing that from the existing code to
> simplify the interface.
> 
> [...]

Applied to arm64 (for-next/stacktrace), thanks!

[1/3] stacktrace: Remove reliable argument from arch_stack_walk() callback
  https://git.kernel.org/arm64/c/264c03a245de
[2/3] arm64: stacktrace: Make stack walk callback consistent with generic code
  https://git.kernel.org/arm64/c/baa2cd417053
[3/3] arm64: stacktrace: Convert to ARCH_STACKWALK
  https://git.kernel.org/arm64/c/5fc57df2f6fd

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH] arm64: ipi_teardown() should depend on HOTPLUG_CPU

2020-09-18 Thread Will Deacon
On Sat, Sep 19, 2020 at 12:35:48AM +0900, Sergey Senozhatsky wrote:
> ipi_teardown() is used only when CONFIG_HOTPLUG_CPU is set.
> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  arch/arm64/kernel/smp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

I think this is a duplicate of:

  https://lore.kernel.org/r/20200918123318.23764-1-yuehaib...@huawei.com

which Marc is aware of (and I'm assuming he'll fix it in his series).

Will


Re: [PATCH v3] KVM: arm64: fix doc warnings in mmu code

2020-09-18 Thread Will Deacon
On Thu, Sep 17, 2020 at 09:47:49AM +0800, Xiaofei Tan wrote:
> Fix following warnings caused by mismatch between function parameters
> and comments.
> arch/arm64/kvm/mmu.c:128: warning: Function parameter or member 'mmu' not 
> described in '__unmap_stage2_range'
> arch/arm64/kvm/mmu.c:128: warning: Function parameter or member 'may_block' 
> not described in '__unmap_stage2_range'
> arch/arm64/kvm/mmu.c:128: warning: Excess function parameter 'kvm' 
> description in '__unmap_stage2_range'
> arch/arm64/kvm/mmu.c:499: warning: Function parameter or member 'writable' 
> not described in 'kvm_phys_addr_ioremap'
> arch/arm64/kvm/mmu.c:538: warning: Function parameter or member 'mmu' not 
> described in 'stage2_wp_range'
> arch/arm64/kvm/mmu.c:538: warning: Excess function parameter 'kvm' 
> description in 'stage2_wp_range'
> 
> Signed-off-by: Xiaofei Tan 
> ---
>  arch/arm64/kvm/mmu.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH v2] Documentation: Chinese translation of Documentation/arm64/amu.rst

2020-09-18 Thread Will Deacon
On Fri, Sep 18, 2020 at 02:21:36AM -0700, Bailu Lin wrote:
> This is a Chinese translated version of Documentation/arm64/amu.rst
> 
> Signed-off-by: Bailu Lin 
> ---
> Changes in v2:
>  - Add index to arm64 directoy.
>  - Fix a document format error.
>  - Correct email encoding format.
> ---
>  Documentation/arm64/amu.rst   |   4 +
>  .../translations/zh_CN/arm64/amu.rst  | 102 ++
>  .../translations/zh_CN/arm64/index.rst|   2 +
>  3 files changed, 108 insertions(+)
>  create mode 100644 Documentation/translations/zh_CN/arm64/amu.rst

I'm supportive of translations for our documentation, but I can't really
review this! Assuming it doesn't say anything rude, then I'll leave it
for Jon to pick up.

Will


Re: [PATCH v3] doc: zh_CN: index files in arm64 subdirectory

2020-09-18 Thread Will Deacon
On Fri, Sep 18, 2020 at 01:11:26AM -0700, Bailu Lin wrote:
> Add arm64 subdirectory into the table of Contents for zh_CN,
> then add other translations in arm64 conveniently.
> 
> Signed-off-by: Bailu Lin 
> ---
> Changes in v3:
>  - Correct email encoding format.
> Changes in v2:
>  - Fix patch description.
> ---
>  Documentation/arm64/index.rst|  4 
>  Documentation/translations/zh_CN/arm64/index.rst | 16 
>  Documentation/translations/zh_CN/index.rst   |  1 +
>  3 files changed, 21 insertions(+)
>  create mode 100644 Documentation/translations/zh_CN/arm64/index.rst

Acked-by: Will Deacon 

I'm assuming Jon will pick this one up.

Cheers,

Will


Re: [PATCH v3 10/11] kvm: arm64: Set up hyp percpu data for nVHE

2020-09-18 Thread Will Deacon
On Wed, Sep 16, 2020 at 06:34:38PM +0100, David Brazdil wrote:
> Add hyp percpu section to linker script and rename the corresponding ELF
> sections of hyp/nvhe object files. This moves all nVHE-specific percpu
> variables to the new hyp percpu section.
> 
> Allocate sufficient amount of memory for all percpu hyp regions at global KVM
> init time and create corresponding hyp mappings.
> 
> The base addresses of hyp percpu regions are kept in a dynamically allocated
> array in the kernel.
> 
> Add NULL checks in PMU event-reset code as it may run before KVM memory is
> initialized.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_asm.h  | 19 +--
>  arch/arm64/include/asm/sections.h |  1 +
>  arch/arm64/kernel/vmlinux.lds.S   |  9 +
>  arch/arm64/kvm/arm.c  | 56 +--
>  arch/arm64/kvm/hyp/nvhe/hyp.lds.S |  6 
>  arch/arm64/kvm/pmu.c  |  5 ++-
>  6 files changed, 90 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index abc03f386b40..e0e1e404f6eb 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -66,8 +66,23 @@
>  #define CHOOSE_VHE_SYM(sym)  sym
>  #define CHOOSE_NVHE_SYM(sym) kvm_nvhe_sym(sym)
>  
> -#define this_cpu_ptr_nvhe(sym)   this_cpu_ptr(&kvm_nvhe_sym(sym))
> -#define per_cpu_ptr_nvhe(sym, cpu)   per_cpu_ptr(&kvm_nvhe_sym(sym), cpu)
> +/* Array of percpu base addresses. Length of the array is nr_cpu_ids. */
> +extern unsigned long *kvm_arm_hyp_percpu_base;
> +
> +/*
> + * Compute pointer to a symbol defined in nVHE percpu region.
> + * Returns NULL if percpu memory has not been allocated yet.
> + */
> +#define this_cpu_ptr_nvhe(sym)   per_cpu_ptr_nvhe(sym, 
> smp_processor_id())

Don't you run into similar problems here with the pcpu accessors when
CONFIG_DEBUG_PREEMPT=y? I'm worried we can end up with an unresolved
reference to debug_smp_processor_id().

> +#define per_cpu_ptr_nvhe(sym, cpu)   \
> + ({  \
> + unsigned long base, off;\
> + base = kvm_arm_hyp_percpu_base  \
> + ? kvm_arm_hyp_percpu_base[cpu] : 0; \
> + off = (unsigned long)&kvm_nvhe_sym(sym) -   \
> +   (unsigned long)&kvm_nvhe_sym(__per_cpu_start);\
> + base ? (typeof(kvm_nvhe_sym(sym))*)(base + off) : NULL; \
> + })
>  
>  #ifndef __KVM_NVHE_HYPERVISOR__
>  /*
> diff --git a/arch/arm64/include/asm/sections.h 
> b/arch/arm64/include/asm/sections.h
> index 3994169985ef..5062553a6847 100644
> --- a/arch/arm64/include/asm/sections.h
> +++ b/arch/arm64/include/asm/sections.h
> @@ -18,5 +18,6 @@ extern char __exittext_begin[], __exittext_end[];
>  extern char __irqentry_text_start[], __irqentry_text_end[];
>  extern char __mmuoff_data_start[], __mmuoff_data_end[];
>  extern char __entry_tramp_text_start[], __entry_tramp_text_end[];
> +extern char __kvm_nvhe___per_cpu_start[], __kvm_nvhe___per_cpu_end[];
>  
>  #endif /* __ASM_SECTIONS_H */
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 0d543570e811..d52e6b5dbfd3 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -9,6 +9,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -27,8 +28,15 @@ jiffies = jiffies_64;
>   __start___kvm_ex_table = .; \
>   *(__kvm_ex_table)   \
>   __stop___kvm_ex_table = .;
> +
> +#define HYPERVISOR_PERCPU_SECTION\
> + . = ALIGN(PAGE_SIZE);   \
> + HYP_SECTION_NAME(.data..percpu) : { \
> + *(HYP_SECTION_NAME(.data..percpu))  \
> + }
>  #else /* CONFIG_KVM */
>  #define HYPERVISOR_EXTABLE
> +#define HYPERVISOR_PERCPU_SECTION
>  #endif
>  
>  #define HYPERVISOR_TEXT  \
> @@ -194,6 +202,7 @@ SECTIONS
>   }
>  
>   PERCPU_SECTION(L1_CACHE_BYTES)
> + HYPERVISOR_PERCPU_SECTION
>  
>   .rela.dyn : ALIGN(8) {
>   *(.rela .rela*)
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 58d7d614519b..8293319a32e7 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
>  __asm__(".arch_extension virt");
>  #endif
>  
>  static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> +unsigned long *kvm_arm_hyp_percpu_base;
>  
>  /* The VMID used in the VTTBR */
>  static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
> @@ -1258,6 +1259,15 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   }
>  }
>  
> +#define kvm_hyp_percpu_base(cpu) ((unsigned 
> long)per_cpu_ptr_nvhe(__per_cpu_start, cpu))


Re: [PATCH v3 09/11] kvm: arm64: Mark hyp stack pages reserved

2020-09-18 Thread Will Deacon
On Wed, Sep 16, 2020 at 06:34:37PM +0100, David Brazdil wrote:
> In preparation for unmapping hyp pages from host stage-2, allocate/free hyp
> stack using new helpers which automatically mark the pages reserved.

Given that this series doesn't get us to the point of having a stage-2 for
the host, I don't see why we need this part yet.

Will


Re: [PATCH v3 07/11] kvm: arm64: Duplicate arm64_ssbd_callback_required for nVHE hyp

2020-09-18 Thread Will Deacon
On Wed, Sep 16, 2020 at 06:34:35PM +0100, David Brazdil wrote:
> Hyp keeps track of which cores require SSBD callback by accessing a
> kernel-proper global variable. Create an nVHE symbol of the same name
> and copy the value from kernel proper to nVHE at KVM init time.
> 
> Done in preparation for separating percpu memory owned by kernel
> proper and nVHE.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_mmu.h | 10 +++---
>  arch/arm64/kernel/image-vars.h   |  1 -
>  arch/arm64/kvm/arm.c |  2 +-
>  arch/arm64/kvm/hyp/nvhe/switch.c |  3 +++
>  4 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index 189839c3706a..9db93da35606 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -529,23 +529,27 @@ static inline int kvm_map_vectors(void)
>  
>  #ifdef CONFIG_ARM64_SSBD
>  DECLARE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required);
> +DECLARE_KVM_NVHE_PER_CPU(u64, arm64_ssbd_callback_required);
>  
> -static inline int hyp_map_aux_data(void)
> +static inline int hyp_init_aux_data(void)
>  {
>   int cpu, err;
>  
>   for_each_possible_cpu(cpu) {
>   u64 *ptr;
>  
> - ptr = per_cpu_ptr(&arm64_ssbd_callback_required, cpu);
> + ptr = per_cpu_ptr_nvhe(arm64_ssbd_callback_required, cpu);
>   err = create_hyp_mappings(ptr, ptr + 1, PAGE_HYP);
>   if (err)
>   return err;
> +
> + /* Copy value from kernel to hyp. */
> + *ptr = per_cpu(arm64_ssbd_callback_required, cpu);

Hmm. Is this correct for late arriving CPUs, where we don't know whether
a callback is required at the point we do the copy?

That sounds fiddly to resolve, but this _might_ all be moot because I'm
about to post a series that allows us to remove the hyp mapping of this
variable entirely. So leave this for now, but maybe stick a comment in
that it doesn't work for late cpus.
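
Something along these lines above the copy would do (wording is only a
suggestion):

		/*
		 * Snapshot the kernel's view of whether the callback is
		 * required. This is taken once at KVM init time, so it
		 * does not cover late-onlining CPUs.
		 */
		*ptr = per_cpu(arm64_ssbd_callback_required, cpu);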

Will


Re: [PATCH v3 08/11] kvm: arm64: Create separate instances of kvm_host_data for VHE/nVHE

2020-09-18 Thread Will Deacon
On Wed, Sep 16, 2020 at 06:34:36PM +0100, David Brazdil wrote:
> Host CPU context is stored in a global per-cpu variable `kvm_host_data`.
> In preparation for introducing independent per-CPU region for nVHE hyp,
> create two separate instances of `kvm_host_data`, one for VHE and one
> for nVHE.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_host.h | 2 +-
>  arch/arm64/kernel/image-vars.h| 1 -
>  arch/arm64/kvm/arm.c  | 5 ++---
>  arch/arm64/kvm/hyp/nvhe/switch.c  | 3 +++
>  arch/arm64/kvm/hyp/vhe/switch.c   | 3 +++
>  arch/arm64/kvm/pmu.c  | 8 
>  6 files changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 905c2b87e05a..5d8c63f5e97e 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -565,7 +565,7 @@ void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>  
> -DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data);
> +DECLARE_KVM_HYP_PER_CPU(kvm_host_data_t, kvm_host_data);
>  
>  static inline void kvm_init_host_cpu_context(struct kvm_cpu_context 
> *cpu_ctxt)
>  {
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index 59d12a0b4622..80da861b8180 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -67,7 +67,6 @@ KVM_NVHE_ALIAS(kvm_patch_vector_branch);
>  KVM_NVHE_ALIAS(kvm_update_va_mask);
>  
>  /* Global kernel state accessed by nVHE hyp code. */
> -KVM_NVHE_ALIAS(kvm_host_data);
>  KVM_NVHE_ALIAS(kvm_vgic_global_state);
>  
>  /* Kernel constant needed to compute idmap addresses. */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 3bdc2661d276..7af9809fa193 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -46,7 +46,6 @@
>  __asm__(".arch_extension virt");
>  #endif
>  
> -DEFINE_PER_CPU(kvm_host_data_t, kvm_host_data);
>  static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>  
>  /* The VMID used in the VTTBR */
> @@ -1308,7 +1307,7 @@ static void cpu_hyp_reset(void)
>  
>  static void cpu_hyp_reinit(void)
>  {
> - kvm_init_host_cpu_context(&this_cpu_ptr(&kvm_host_data)->host_ctxt);
> + kvm_init_host_cpu_context(&this_cpu_ptr_hyp(kvm_host_data)->host_ctxt);
>  
>   cpu_hyp_reset();
>  
> @@ -1543,7 +1542,7 @@ static int init_hyp_mode(void)
>   for_each_possible_cpu(cpu) {
>   kvm_host_data_t *cpu_data;
>  
> - cpu_data = per_cpu_ptr(&kvm_host_data, cpu);
> +     cpu_data = per_cpu_ptr_hyp(kvm_host_data, cpu);

I stand by my earlier comment to add _sym here, given that the ampersand
gets dropped from the argument.

So assuming you do that in the earlier patch:

Acked-by: Will Deacon 

Will


Re: [PATCH v3 06/11] kvm: arm64: Add helpers for accessing nVHE hyp per-cpu vars

2020-09-18 Thread Will Deacon
On Wed, Sep 16, 2020 at 06:34:34PM +0100, David Brazdil wrote:
> Defining a per-CPU variable in hyp/nvhe will result in its name being
> prefixed with __kvm_nvhe_. Add helpers for declaring these variables
> in kernel proper and accessing them with this_cpu_ptr and per_cpu_ptr.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_asm.h | 25 +++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index cf9456663289..abc03f386b40 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -54,9 +54,21 @@
>   DECLARE_KVM_VHE_SYM(sym);   \
>   DECLARE_KVM_NVHE_SYM(sym)
>  
> +#define DECLARE_KVM_VHE_PER_CPU(type, sym)   \
> + DECLARE_PER_CPU(type, sym)
> +#define DECLARE_KVM_NVHE_PER_CPU(type, sym)  \
> + DECLARE_PER_CPU(type, kvm_nvhe_sym(sym))
> +
> +#define DECLARE_KVM_HYP_PER_CPU(type, sym)   \
> + DECLARE_KVM_VHE_PER_CPU(type, sym); \
> + DECLARE_KVM_NVHE_PER_CPU(type, sym)
> +
>  #define CHOOSE_VHE_SYM(sym)  sym
>  #define CHOOSE_NVHE_SYM(sym) kvm_nvhe_sym(sym)
>  
> +#define this_cpu_ptr_nvhe(sym)   this_cpu_ptr(&kvm_nvhe_sym(sym))
> +#define per_cpu_ptr_nvhe(sym, cpu)   per_cpu_ptr(&kvm_nvhe_sym(sym), cpu)

nit: I'd probably stick a _sym suffix on these macros, to make it clear
that they're just munging the symbol name rather than doing some completely
different pcpu implementation.

That said, do you expect these to be used outside of the pcpu
implementation? If not, I suggest some underscores as a prefix as well.
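
i.e. something like this (names purely illustrative):

#define this_cpu_ptr_nvhe_sym(sym)	this_cpu_ptr(&kvm_nvhe_sym(sym))
#define per_cpu_ptr_nvhe_sym(sym, cpu)	per_cpu_ptr(&kvm_nvhe_sym(sym), cpu)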

>  #ifndef __KVM_NVHE_HYPERVISOR__
>  /*
>   * BIG FAT WARNINGS:
> @@ -69,12 +81,21 @@
>   * - Don't let the nVHE hypervisor have access to this, as it will
>   *   pick the *wrong* symbol (yes, it runs at EL2...).
>   */
> -#define CHOOSE_HYP_SYM(sym)  (is_kernel_in_hyp_mode() ? CHOOSE_VHE_SYM(sym) \
> +#define CHOOSE_HYP_SYM(sym)  (is_kernel_in_hyp_mode()\
> +? CHOOSE_VHE_SYM(sym)\
>  : CHOOSE_NVHE_SYM(sym))
> +#define this_cpu_ptr_hyp(sym)(is_kernel_in_hyp_mode()
> \
> +? this_cpu_ptr(&sym) \
> +: this_cpu_ptr_nvhe(sym))
> +#define per_cpu_ptr_hyp(sym, cpu)(is_kernel_in_hyp_mode()\
> +? per_cpu_ptr(&sym, cpu) \
> +: per_cpu_ptr_nvhe(sym, cpu))

is_kernel_in_hyp_mode() reads a system register to determine the current
exception level, so this doesn't seem like something we should be doing
everytime here. Perhaps is_kernel_in_hyp_mode() should avoid read_sysreg()
and instead use a non-volatile asm to allow the result to be cached by
the compiler. Hmm.
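
For illustration only (not something to do in this series), a cacheable
variant could look roughly like:

static inline bool is_kernel_in_hyp_mode(void)
{
	u64 el;

	/* Non-volatile asm, so the compiler is free to cache the result. */
	asm ("mrs %0, CurrentEL" : "=r" (el));
	return el == CurrentEL_EL2;
}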

But I think that can be tackled as a future patch, so with the naming nits
resolved:

Acked-by: Will Deacon 

Will


Re: [PATCH v3 05/11] kvm: arm64: Remove hyp_adr/ldr_this_cpu

2020-09-18 Thread Will Deacon
On Wed, Sep 16, 2020 at 06:34:33PM +0100, David Brazdil wrote:
> The hyp_adr/ldr_this_cpu helpers were introduced for use in hyp code
> because they always needed to use TPIDR_EL2 for base, while
> adr/ldr_this_cpu from kernel proper would select between TPIDR_EL2 and
> _EL1 based on VHE/nVHE.
> 
> Simplify this now that the hyp mode case can be handled using the
> __KVM_VHE/NVHE_HYPERVISOR__ macros.
> 
> Acked-by: Andrew Scull 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/assembler.h | 27 +--
>  arch/arm64/include/asm/kvm_asm.h   | 14 +-
>  arch/arm64/kvm/hyp/hyp-entry.S |  2 +-
>  3 files changed, 19 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/assembler.h 
> b/arch/arm64/include/asm/assembler.h
> index 54d181177656..f79231a0f949 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -218,6 +218,21 @@ lr   .reqx30 // link register
>   str \src, [\tmp, :lo12:\sym]
>   .endm
>  
> + /*
> +  * @dst: destination register (32 or 64 bit wide)
> +  */
> + .macro  this_cpu_offset, dst
> +#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
> + mrs \dst, tpidr_el2
> +#else
> +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> + mrs \dst, tpidr_el1
> +alternative_else
> + mrs \dst, tpidr_el2
> +alternative_endif
> +#endif

Cosmetic, but I think it would be cleaner just to define two variants of the
macro here:

#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
.macro  this_cpu_offset, dst
mrs \dst, tpidr_el2
.endm
#else
.macro  this_cpu_offset, dst
alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
mrs \dst, tpidr_el1
alternative_else
mrs \dst, tpidr_el2
alternative_endif
.endm
#endif

(and should we have a shorthand __HYPERVISOR define to avoid the NVHE || VHE
logic?)
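
e.g. (name is just a strawman):

#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
#define __KVM_HYP_CODE__
#endif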

With that:

Acked-by: Will Deacon 

Will


Re: [PATCH v3 03/11] kvm: arm64: Only define __kvm_ex_table for CONFIG_KVM

2020-09-18 Thread Will Deacon
On Wed, Sep 16, 2020 at 06:34:31PM +0100, David Brazdil wrote:
> Minor cleanup that only creates __kvm_ex_table ELF section and
> related symbols if CONFIG_KVM is enabled.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/kernel/vmlinux.lds.S | 4 
>  1 file changed, 4 insertions(+)

It's also useful because we're about to add another entry here, so:

Acked-by: Will Deacon 

Will


Re: [PATCH v3 04/11] kvm: arm64: Remove __hyp_this_cpu_read

2020-09-18 Thread Will Deacon
On Wed, Sep 16, 2020 at 06:34:32PM +0100, David Brazdil wrote:
> this_cpu_ptr is meant for use in kernel proper because it selects between
> TPIDR_EL1/2 based on nVHE/VHE. __hyp_this_cpu_ptr was used in hyp to always
> select TPIDR_EL2. Unify all users behind this_cpu_ptr and friends by
> selecting _EL2 register under __KVM_VHE/NVHE_HYPERVISOR__.
> 
> Under CONFIG_DEBUG_PREEMPT, the kernel helpers perform a preemption check
> which is omitted by the hyp helpers. Preserve the behavior for nVHE by
> overriding the corresponding macros under __KVM_NVHE_HYPERVISOR__. Extend
> the checks into VHE hyp code.
> 
> Acked-by: Andrew Scull 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_asm.h  | 20 --
>  arch/arm64/include/asm/percpu.h   | 33 +--
>  arch/arm64/kvm/hyp/include/hyp/debug-sr.h |  4 +--
>  arch/arm64/kvm/hyp/include/hyp/switch.h   |  8 +++---
>  arch/arm64/kvm/hyp/nvhe/switch.c  |  2 +-
>  arch/arm64/kvm/hyp/vhe/switch.c   |  2 +-
>  arch/arm64/kvm/hyp/vhe/sysreg-sr.c|  4 +--
>  7 files changed, 41 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index c085032e2e3e..c196eec25498 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -143,26 +143,6 @@ extern char 
> __smccc_workaround_1_smc[__SMCCC_WORKAROUND_1_SMC_SZ];
>   addr;   \
>   })
>  
> -/*
> - * Home-grown __this_cpu_{ptr,read} variants that always work at HYP,
> - * provided that sym is really a *symbol* and not a pointer obtained from
> - * a data structure. As for SHIFT_PERCPU_PTR(), the creative casting keeps
> - * sparse quiet.
> - */
> -#define __hyp_this_cpu_ptr(sym)  
> \
> - ({  \
> - void *__ptr;\
> - __verify_pcpu_ptr(&sym);\
> - __ptr = hyp_symbol_addr(sym);   \
> - __ptr += read_sysreg(tpidr_el2);\
> - (typeof(sym) __kernel __force *)__ptr;  \
> -  })
> -
> -#define __hyp_this_cpu_read(sym) \
> - ({  \
> - *__hyp_this_cpu_ptr(sym);   \
> -  })
> -
>  #define __KVM_EXTABLE(from, to)  
> \
>   "   .pushsection__kvm_ex_table, \"a\"\n"\
>   "   .align  3\n"\
> diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
> index 0b6409b89e5e..36f304401c38 100644
> --- a/arch/arm64/include/asm/percpu.h
> +++ b/arch/arm64/include/asm/percpu.h
> @@ -19,7 +19,21 @@ static inline void set_my_cpu_offset(unsigned long off)
>   :: "r" (off) : "memory");
>  }
>  
> -static inline unsigned long __my_cpu_offset(void)
> +static inline unsigned long __hyp_my_cpu_offset(void)
> +{
> + unsigned long off;
> +
> + /*
> +  * We want to allow caching the value, so avoid using volatile and
> +  * instead use a fake stack read to hazard against barrier().
> +  */

I don't think we need to copy/paste the comment...

> + asm("mrs %0, tpidr_el2" : "=r" (off) :
> + "Q" (*(const unsigned long *)current_stack_pointer));

... especially given that we're not preemptible at EL2 with nVHE, maybe
we don't need to play this trick at all because we're always going to be
on the same CPU. So we could actually just do:

return read_sysreg(tpidr_el2);

which is much better, and the comment should say something to that effect.
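
i.e. the whole helper could shrink to something like (sketch only):

static inline unsigned long __hyp_my_cpu_offset(void)
{
	/*
	 * Non-VHE hyp code is never preemptible, so there is no need for
	 * the stack-read hazard against barrier() that the kernel-proper
	 * helper plays.
	 */
	return read_sysreg(tpidr_el2);
}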

> +
> + return off;
> +}
> +
> +static inline unsigned long __kern_my_cpu_offset(void)
>  {
>   unsigned long off;
>  
> @@ -35,7 +49,12 @@ static inline unsigned long __my_cpu_offset(void)
>  
>   return off;
>  }
> -#define __my_cpu_offset __my_cpu_offset()
> +
> +#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
> +#define __my_cpu_offset __hyp_my_cpu_offset()

Why would VHE code need to use this? Especially in light of my preemption
comments above, shouldn't it now be using __kern_my_cpu_offset()?

Will


Re: [PATCH v3 02/11] kvm: arm64: Move nVHE hyp namespace macros to hyp_image.h

2020-09-18 Thread Will Deacon
On Wed, Sep 16, 2020 at 06:34:30PM +0100, David Brazdil wrote:
> Minor cleanup to move all macros related to prefixing nVHE hyp section
> and symbol names into one place: hyp_image.h.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/hyp_image.h | 12 
>  arch/arm64/include/asm/kvm_asm.h   |  8 +---
>  arch/arm64/kernel/image-vars.h |  2 --
>  3 files changed, 13 insertions(+), 9 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH v3 01/11] kvm: arm64: Partially link nVHE hyp code, simplify HYPCOPY

2020-09-18 Thread Will Deacon
On Wed, Sep 16, 2020 at 06:34:29PM +0100, David Brazdil wrote:
> Relying on objcopy to prefix the ELF section names of the nVHE hyp code
> is brittle and prevents us from using wildcards to match specific
> section names.
> 
> Improve the build rules by partially linking all '.nvhe.o' files and
> prefixing their ELF section names using a linker script. Continue using
> objcopy for prefixing ELF symbol names.
> 
> One immediate advantage of this approach is that all subsections
> matching a pattern can be merged into a single prefixed section, eg.
> .text and .text.* can be linked into a single '.hyp.text'. This removes
> the need for -fno-reorder-functions on GCC and will be useful in the
> future too: LTO builds use .text subsections, compilers routinely
> generate .rodata subsections, etc.
> 
> Partially linking all hyp code into a single object file also makes it
> easier to analyze.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/hyp_image.h | 24 
>  arch/arm64/kvm/hyp/nvhe/Makefile   | 60 --
>  arch/arm64/kvm/hyp/nvhe/hyp.lds.S  | 13 +++
>  3 files changed, 70 insertions(+), 27 deletions(-)
>  create mode 100644 arch/arm64/include/asm/hyp_image.h
>  create mode 100644 arch/arm64/kvm/hyp/nvhe/hyp.lds.S
> 
> diff --git a/arch/arm64/include/asm/hyp_image.h 
> b/arch/arm64/include/asm/hyp_image.h
> new file mode 100644
> index ..5b1e3b9ef376
> --- /dev/null
> +++ b/arch/arm64/include/asm/hyp_image.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2020 Google LLC.
> + * Written by David Brazdil 
> + */
> +
> +#ifndef __ARM64_HYP_IMAGE_H__
> +#define __ARM64_HYP_IMAGE_H__
> +
> +#ifdef LINKER_SCRIPT
> +
> +/*
> + * KVM nVHE ELF section names are prefixed with .hyp, to separate them
> + * from the kernel proper.
> + */
> +#define HYP_SECTION_NAME(NAME)   .hyp##NAME
> +
> +/* Defines an ELF hyp section from input section @NAME and its subsections. 
> */
> +#define HYP_SECTION(NAME) \
> + HYP_SECTION_NAME(NAME) : { *(NAME NAME##.[0-9a-zA-Z_]*) }

I still don't get why we can't just use NAME ## .* for the regex here. That
matches what we do elsewhere for linker script wildcarding, e.g. .rodata.*,
.init.text.* ...  in asm-generic/vmlinux.lds.h. Why is it different for
these sections?
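
i.e. (untested):

#define HYP_SECTION(NAME) \
	HYP_SECTION_NAME(NAME) : { *(NAME NAME##.*) }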

> +
> +#endif /* LINKER_SCRIPT */
> +
> +#endif /* __ARM64_HYP_IMAGE_H__ */
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile 
> b/arch/arm64/kvm/hyp/nvhe/Makefile
> index aef76487edc2..2b27b60182f9 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -10,40 +10,46 @@ obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o 
> hyp-init.o
>  obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
>../fpsimd.o ../hyp-entry.o
>  
> -obj-y := $(patsubst %.o,%.hyp.o,$(obj-y))
> -extra-y := $(patsubst %.hyp.o,%.hyp.tmp.o,$(obj-y))
> +##
> +## Build rules for compiling nVHE hyp code
> +## Output of this folder is `kvm_nvhe.o`, a partially linked object
> +## file containing all nVHE hyp code and data.
> +##
>  
> -$(obj)/%.hyp.tmp.o: $(src)/%.c FORCE
> +hyp-obj := $(patsubst %.o,%.nvhe.o,$(obj-y))
> +obj-y := kvm_nvhe.o
> +extra-y := $(hyp-obj) kvm_nvhe.tmp.o hyp.lds
> +
> +# 1) Compile all source files to `.nvhe.o` object files. The file extension
> +#avoids file name clashes for files shared with VHE.
> +$(obj)/%.nvhe.o: $(src)/%.c FORCE
>   $(call if_changed_rule,cc_o_c)
> -$(obj)/%.hyp.tmp.o: $(src)/%.S FORCE
> +$(obj)/%.nvhe.o: $(src)/%.S FORCE
>   $(call if_changed_rule,as_o_S)
> -$(obj)/%.hyp.o: $(obj)/%.hyp.tmp.o FORCE
> - $(call if_changed,hypcopy)
>  
> -# Disable reordering functions by GCC (enabled at -O2).
> -# This pass puts functions into '.text.*' sections to aid the linker
> -# in optimizing ELF layout. See HYPCOPY comment below for more info.
> -ccflags-y += $(call cc-option,-fno-reorder-functions)
> +# 2) Compile linker script.
> +$(obj)/hyp.lds: $(src)/hyp.lds.S FORCE
> + $(call if_changed_dep,cpp_lds_S)

You need a .gitignore file listing hyp.lds, otherwise some idiot will end
up committing it. I definitely didn't do that when playing around with this
series. Nope. Not at all.
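
i.e. an arch/arm64/kvm/hyp/nvhe/.gitignore along the lines of:

hyp.lds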

With that, and the regex resolved:

Acked-by: Will Deacon 

Will


Re: [PATCH v3] arm64: bpf: Fix branch offset in JIT

2020-09-17 Thread Will Deacon
On Thu, Sep 17, 2020 at 11:49:25AM +0300, Ilias Apalodimas wrote:
> Running the eBPF test_verifier leads to random errors looking like this:
> 
> [ 6525.735488] Unexpected kernel BRK exception at EL1
> [ 6525.735502] Internal error: ptrace BRK handler: f2000100 [#1] SMP
> [ 6525.741609] Modules linked in: nls_utf8 cifs libdes libarc4 dns_resolver 
> fscache binfmt_misc nls_ascii nls_cp437 vfat fat aes_ce_blk crypto_simd 
> cryptd aes_ce_cipher ghash_ce gf128mul efi_pstore sha2_ce sha256_arm64 
> sha1_ce evdev efivars efivarfs ip_tables x_tables autofs4 btrfs 
> blake2b_generic xor xor_neon zstd_compress raid6_pq libcrc32c crc32c_generic 
> ahci xhci_pci libahci xhci_hcd igb libata i2c_algo_bit nvme realtek usbcore 
> nvme_core scsi_mod t10_pi netsec mdio_devres of_mdio gpio_keys fixed_phy 
> libphy gpio_mb86s7x
> [ 6525.787760] CPU: 3 PID: 7881 Comm: test_verifier Tainted: GW   
>   5.9.0-rc1+ #47
> [ 6525.796111] Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS 
> build #1 Jun  6 2020
> [ 6525.804812] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--)
> [ 6525.810390] pc : bpf_prog_c3d01833289b6311_F+0xc8/0x9f4
> [ 6525.815613] lr : bpf_prog_d53bb52e3f4483f9_F+0x38/0xc8c
> [ 6525.820832] sp : 8000130cbb80
> [ 6525.824141] x29: 8000130cbbb0 x28: 
> [ 6525.829451] x27: 05ef6fcbf39b x26: 
> [ 6525.834759] x25: 8000130cbb80 x24: 800011dc7038
> [ 6525.840067] x23: 8000130cbd00 x22: 0008f624d080
> [ 6525.845375] x21: 0001 x20: 800011dc7000
> [ 6525.850682] x19:  x18: 
> [ 6525.855990] x17:  x16: 
> [ 6525.861298] x15:  x14: 
> [ 6525.866606] x13:  x12: 
> [ 6525.871913] x11: 0001 x10: 800a660c
> [ 6525.877220] x9 : 800010951810 x8 : 8000130cbc38
> [ 6525.882528] x7 :  x6 : 009864cfa881
> [ 6525.887836] x5 : 00ff x4 : 002880ba1a0b3e9f
> [ 6525.893144] x3 : 0018 x2 : 800a4374
> [ 6525.898452] x1 : 000a x0 : 0009
> [ 6525.903760] Call trace:
> [ 6525.906202]  bpf_prog_c3d01833289b6311_F+0xc8/0x9f4
> [ 6525.911076]  bpf_prog_d53bb52e3f4483f9_F+0x38/0xc8c
> [ 6525.915957]  bpf_dispatcher_xdp_func+0x14/0x20
> [ 6525.920398]  bpf_test_run+0x70/0x1b0
> [ 6525.923969]  bpf_prog_test_run_xdp+0xec/0x190
> [ 6525.928326]  __do_sys_bpf+0xc88/0x1b28
> [ 6525.932072]  __arm64_sys_bpf+0x24/0x30
> [ 6525.935820]  el0_svc_common.constprop.0+0x70/0x168
> [ 6525.940607]  do_el0_svc+0x28/0x88
> [ 6525.943920]  el0_sync_handler+0x88/0x190
> [ 6525.947838]  el0_sync+0x140/0x180
> [ 6525.951154] Code: d4202000 d4202000 d4202000 d4202000 (d4202000)
> [ 6525.957249] ---[ end trace cecc3f93b14927e2 ]---
> 
> The reason is the offset[] creation and later usage, while building
> the eBPF body. The code currently omits the first instruction, since
> build_insn() will increase our ctx->idx before saving it.
> That was fine up until bounded eBPF loops were introduced. After that
> introduction, offset[0] must be the offset of the end of prologue which
> is the start of the 1st insn while, offset[n] holds the
> offset of the end of n-th insn.
> 
> When "taken loop with back jump to 1st insn" test runs, it will
> eventually call bpf2a64_offset(-1, 2, ctx). Since negative indexing is
> permitted, the current outcome depends on the value stored in
> ctx->offset[-1], which has nothing to do with our array.
> If the value happens to be 0 the tests will work. If not this error
> triggers.
> 
> commit 7c2e988f400e ("bpf: fix x64 JIT code generation for jmp to 1st insn")
> fixed an identical bug on x86 when eBPF bounded loops were introduced.
> 
> So let's fix it by creating the ctx->offset[] differently. Track the
> beginning of instruction and account for the extra instruction while
> calculating the arm instruction offsets.
> 
> Fixes: 2589726d12a1 ("bpf: introduce bounded loops")
> Reported-by: Naresh Kamboju 
> Reported-by: Jiri Olsa 
> Co-developed-by: Jean-Philippe Brucker 
> Signed-off-by: Jean-Philippe Brucker 
> Co-developed-by: Yauheni Kaliuta 
> Signed-off-by: Yauheni Kaliuta 
> Signed-off-by: Ilias Apalodimas 

Acked-by: Will Deacon 

Catalin -- do you want to take this as a fix?

Will


Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-17 Thread Will Deacon
On Wed, Sep 16, 2020 at 08:32:20PM +0800, Hou Tao wrote:
> > Subject: locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count
> > From: Hou Tao 
> > Date: Tue, 15 Sep 2020 22:07:50 +0800
> > 
> > From: Hou Tao 
> > 
> > The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
> > that percpu-rwsem is a blocking primitive, should be just fine.
> > 
> > However, file_end_write() is used from IRQ context and will cause
> > load-store issues.
> > 
> > Fixing it by using the IRQ-safe this_cpu_*() for operations on
> > read_count. This will generate more expensive code on a number of
> > platforms, which might cause a performance regression for some of the
> > other percpu-rwsem users.
> > 
> > If any such is reported, we can consider alternative solutions.
> > 
> I have simply test the performance impact on both x86 and aarch64.
> 
> There is no degradation under x86 (2 sockets, 18 core per sockets, 2 threads 
> per core)
> 
> v5.8.9
> no writer, reader cn                                | 18        | 36        | 72
> the rate of down_read/up_read per second            | 231423957 | 230737381 | 109943028
> the rate of down_read/up_read per second (patched)  | 232864799 | 233555210 | 109768011
> 
> However, the performance degradation is huge under aarch64 (4 sockets, 24 cores
> per socket): nearly 60% is lost.
> 
> v4.19.111
> no writer, reader cn                                | 24        | 48        | 72        | 96
> the rate of down_read/up_read per second            | 166129572 | 166064100 | 165963448 | 165203565
> the rate of down_read/up_read per second (patched)  |  63863506 |  63842132 |  63757267 |  63514920
> 
> I will test the aarch64 host by using v5.8 tomorrow.

Thanks. We did improve the preempt_count() munging a bit since 4.19 (I
think), so maybe 5.8 will be a bit better. Please report back!

Will


Re: [PATCH v2 00/10] Independent per-CPU data section for nVHE

2020-09-16 Thread Will Deacon
On Wed, Sep 16, 2020 at 01:24:12PM +0100, David Brazdil wrote:
> > I was also wondering about another approach - using the PERCPU_SECTION macro
> > unchanged in the hyp linker script. It would lay out a single .data..percpu 
> > and
> > we would then prefix it with .hyp and the symbols with __kvm_nvhe_ as with
> > everything else. WDYT? Haven't tried that yet, could be a naive idea. 
> 
> Seems to work. Can't use PERCPU_SECTION directly because then we couldn't
> rename it in the same linker script, but if we just unwrap that one layer
> we can use PERCPU_INPUT. No global macro changes needed.
> 
> Let me know what you think.
> 
> --8<--
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 5904a4de9f40..9e6bf21268f1 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -195,11 +195,9 @@ SECTIONS
> PERCPU_SECTION(L1_CACHE_BYTES)
> 
> /* KVM nVHE per-cpu section */
> -   #undef PERCPU_SECTION_NAME
> -   #undef PERCPU_SYMBOL_NAME
> -   #define PERCPU_SECTION_NAME(suffix) CONCAT3(.hyp, 
> PERCPU_SECTION_BASE_NAME, suffix)
> -   #define PERCPU_SYMBOL_NAME(name)__kvm_nvhe_ ## name
> -   PERCPU_SECTION(L1_CACHE_BYTES)
> +   . = ALIGN(PAGE_SIZE);
> +   .hyp.data..percpu : { *(.hyp.data..percpu) }
> +   . = ALIGN(PAGE_SIZE);
> 
> .rela.dyn : ALIGN(8) {
> *(.rela .rela*)
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S 
> b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
> index 7d8c3fa004f4..1d8e4f7edc29 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
> @@ -4,6 +4,10 @@
>   * Written by David Brazdil 
>   */
> 
> +#include 
> +#include 
> +#include 
> +
>  /*
>   * Defines an ELF hyp section from input section @NAME and its subsections.
>   */
> @@ -11,9 +15,9 @@
> 
>  SECTIONS {
> HYP_SECTION(.text)
> -   HYP_SECTION(.data..percpu)
> -   HYP_SECTION(.data..percpu..first)
> -   HYP_SECTION(.data..percpu..page_aligned)
> -   HYP_SECTION(.data..percpu..read_mostly)
> -   HYP_SECTION(.data..percpu..shared_aligned)
> +
> +   .hyp..data..percpu : {

Too many '.'s here?

> +   __per_cpu_load = .;

I don't think we need this symbol.

Otherwise, idea looks good to me. Can you respin like this, but also
incorporating some of the cleanup in the diff I posted, please?

Will


Re: [PATCH] doc: zh_CN: index files in arm64 subdirectory

2020-09-16 Thread Will Deacon
On Tue, Sep 15, 2020 at 09:57:01PM -0700, Bailu Lin wrote:
> Add filesystems subdirectory into the table of Contents for zh_CN,
> all translations residing on it would be indexed conveniently.

Sorry, I don't understand the reference to "filesystems" here. Is this
a copy-paste of a different commit message, or am I just failing to read
it properly?

Will


Re: [PATCH v2] KVM: arm64: fix doc warnings in mmu code

2020-09-16 Thread Will Deacon
On Wed, Sep 16, 2020 at 10:00:39AM +0800, Xiaofei Tan wrote:
> Fix following warnings caused by mismatch between function parameters
> and comments.
> arch/arm64/kvm/mmu.c:128: warning: Function parameter or member 'mmu' not 
> described in '__unmap_stage2_range'
> arch/arm64/kvm/mmu.c:128: warning: Function parameter or member 'may_block' 
> not described in '__unmap_stage2_range'
> arch/arm64/kvm/mmu.c:128: warning: Excess function parameter 'kvm' 
> description in '__unmap_stage2_range'
> arch/arm64/kvm/mmu.c:499: warning: Function parameter or member 'writable' 
> not described in 'kvm_phys_addr_ioremap'
> arch/arm64/kvm/mmu.c:538: warning: Function parameter or member 'mmu' not 
> described in 'stage2_wp_range'
> arch/arm64/kvm/mmu.c:538: warning: Excess function parameter 'kvm' 
> description in 'stage2_wp_range'
> 
> Signed-off-by: Xiaofei Tan 
> ---
>  arch/arm64/kvm/mmu.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index e8a51799..909e995 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -114,9 +114,10 @@ static bool kvm_is_device_pfn(unsigned long pfn)
>   */
>  /**
>   * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
> - * @kvm:   The VM pointer
> + * @mmu:   pointer to mmu structure to operate on
>   * @start: The intermediate physical base address of the range to unmap
>   * @size:  The size of the area to unmap
> + * @may_block: The flag that if block is allowed here

Whether or not we are permitted to block.

>   *
>   * Clear a range of stage-2 mappings, lowering the various ref-counts.  Must
>   * be called while holding mmu_lock (unless for freeing the stage2 pgd before
> @@ -493,6 +494,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>   * @guest_ipa:   The IPA at which to insert the mapping
>   * @pa:  The physical address of the device
>   * @size:The size of the mapping
> + * @writable:   If it is writable here

Whether or not to create a writable mapping.

>   */
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> phys_addr_t pa, unsigned long size, bool writable)
> @@ -530,7 +532,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
> guest_ipa,
>  
>  /**
>   * stage2_wp_range() - write protect stage2 memory region range
> - * @kvm: The KVM pointer
> + * @mmu:pointer to mmu structure to operate on

The KVM stage-2 MMU pointer.
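
FWIW, with those tweaks the kernel-doc would read roughly as below (parameter
names from memory, so please double-check them against mmu.c):

/**
 * stage2_wp_range() - write protect stage2 memory region range
 * @mmu:	The KVM stage-2 MMU pointer
 * @addr:	Start address of range
 * @end:	End address of range
 */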

Will


Re: [PATCH 2/2] arm64/mm: Enable color zero pages

2020-09-16 Thread Will Deacon
On Wed, Sep 16, 2020 at 01:25:23PM +1000, Gavin Shan wrote:
> This enables color zero pages by allocating contiguous page frames
> for it. The number of pages for this is determined by L1 dCache
> (or iCache) size, which is probed from the hardware.
> 
>* Add cache_total_size() to return L1 dCache (or iCache) size
> 
>* Implement setup_zero_pages(), which is called after the page
>  allocator begins to work, to allocate the contiguous pages
>  needed by color zero page.
> 
>* Reworked ZERO_PAGE() and define __HAVE_COLOR_ZERO_PAGE.
> 
> Signed-off-by: Gavin Shan 
> ---
>  arch/arm64/include/asm/cache.h   | 22 
>  arch/arm64/include/asm/pgtable.h |  9 ++--
>  arch/arm64/kernel/cacheinfo.c| 34 +++
>  arch/arm64/mm/init.c | 35 
>  arch/arm64/mm/mmu.c  |  7 ---
>  5 files changed, 98 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index a4d1b5f771f6..420e9dde2c51 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -39,6 +39,27 @@
>  #define CLIDR_LOC(clidr) (((clidr) >> CLIDR_LOC_SHIFT) & 0x7)
>  #define CLIDR_LOUIS(clidr)   (((clidr) >> CLIDR_LOUIS_SHIFT) & 0x7)
>  
> +#define CSSELR_TND_SHIFT 4
> +#define CSSELR_TND_MASK  (UL(1) << CSSELR_TND_SHIFT)
> +#define CSSELR_LEVEL_SHIFT   1
> +#define CSSELR_LEVEL_MASK(UL(7) << CSSELR_LEVEL_SHIFT)
> +#define CSSELR_IND_SHIFT 0
> +#define CSSERL_IND_MASK  (UL(1) << CSSELR_IND_SHIFT)
> +
> +#define CCSIDR_64_LS_SHIFT   0
> +#define CCSIDR_64_LS_MASK(UL(7) << CCSIDR_64_LS_SHIFT)
> +#define CCSIDR_64_ASSOC_SHIFT3
> +#define CCSIDR_64_ASSOC_MASK (UL(0x1F) << CCSIDR_64_ASSOC_SHIFT)
> +#define CCSIDR_64_SET_SHIFT  32
> +#define CCSIDR_64_SET_MASK   (UL(0xFF) << CCSIDR_64_SET_SHIFT)
> +
> +#define CCSIDR_32_LS_SHIFT   0
> +#define CCSIDR_32_LS_MASK(UL(7) << CCSIDR_32_LS_SHIFT)
> +#define CCSIDR_32_ASSOC_SHIFT3
> +#define CCSIDR_32_ASSOC_MASK (UL(0x3FF) << CCSIDR_32_ASSOC_SHIFT)
> +#define CCSIDR_32_SET_SHIFT  13
> +#define CCSIDR_32_SET_MASK   (UL(0x7FFF) << CCSIDR_32_SET_SHIFT)

I don't think we should be inferring cache structure from these register
values. The Arm ARM helpfully says:

  | You cannot make any inference about the actual sizes of caches based
  | on these parameters.

so we need to take the topology information from elsewhere.

But before we get into that, can you justify why we need to do this at all,
please? Do you have data to show the benefit of adding this complexity?

Cheers,

Will


Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-16 Thread Will Deacon
On Tue, Sep 15, 2020 at 08:11:12PM +0200, pet...@infradead.org wrote:
> On Tue, Sep 15, 2020 at 05:11:23PM +0100, Will Deacon wrote:
> > On Tue, Sep 15, 2020 at 06:03:44PM +0200, pet...@infradead.org wrote:
> > > On Tue, Sep 15, 2020 at 05:51:50PM +0200, pet...@infradead.org wrote:
> > > 
> > > > Anyway, I'll rewrite the Changelog and stuff it in locking/urgent.
> > > 
> > > How's this?
> > > 
> > > ---
> > > Subject: locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count
> > > From: Hou Tao 
> > > Date: Tue, 15 Sep 2020 22:07:50 +0800
> > > 
> > > From: Hou Tao 
> > > 
> > > The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
> > > that percpu-rwsem is a blocking primitive, should be just fine.
> > > 
> > > However, file_end_write() is used from IRQ context and will cause
> > > load-store issues.
> > 
> > ... on architectures where the per-cpu accessors are not atomic.
> 
> That's not entirely accurate, on x86 for example the per-cpu ops are not
> atomic, but they are not susceptible to this problem due to them being a
> single instruction from the point of interrupts -- either they wholly
> happen or they don't.

Hey, the implication is still correct though ;)

> So I'd reformulate it like: "... on architectures where the per-cpu
> accessors are not natively irq-safe" ?

But yeah, that's better. Thanks.

Will


Re: [PATCH v2] arm64: bpf: Fix branch offset in JIT

2020-09-15 Thread Will Deacon
On Tue, Sep 15, 2020 at 04:53:44PM +0300, Ilias Apalodimas wrote:
> On Tue, Sep 15, 2020 at 02:11:03PM +0100, Will Deacon wrote:
> > Hi Ilias,
> > 
> > On Mon, Sep 14, 2020 at 07:03:55PM +0300, Ilias Apalodimas wrote:
> > > Running the eBPF test_verifier leads to random errors looking like this:
> > > 
> > > [ 6525.735488] Unexpected kernel BRK exception at EL1
> > > [ 6525.735502] Internal error: ptrace BRK handler: f2000100 [#1] SMP
> > 
> > Does this happen because we poison the BPF memory with BRK instructions?
> > Maybe we should look at using a special immediate so we can detect this,
> > rather than end up in the ptrace handler.
> 
> As discussed offline this is what aarch64_insn_gen_branch_imm() will return 
> for
> offsets > 128M and yes replacing the handler with a more suitable message 
> would 
> be good.

Can you give the diff below a shot, please? Hopefully printing a more useful
message will mean these things get triaged/debugged better in future.

Will

--->8

diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
index 840a35ed92ec..b15eb4a3e6b2 100644
--- a/arch/arm64/include/asm/extable.h
+++ b/arch/arm64/include/asm/extable.h
@@ -22,6 +22,15 @@ struct exception_table_entry
 
 #define ARCH_HAS_RELATIVE_EXTABLE
 
+static inline bool in_bpf_jit(struct pt_regs *regs)
+{
+   if (!IS_ENABLED(CONFIG_BPF_JIT))
+   return false;
+
+   return regs->pc >= BPF_JIT_REGION_START &&
+  regs->pc < BPF_JIT_REGION_END;
+}
+
 #ifdef CONFIG_BPF_JIT
 int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
  struct pt_regs *regs);
diff --git a/arch/arm64/kernel/debug-monitors.c 
b/arch/arm64/kernel/debug-monitors.c
index 7310a4f7f993..fa76151de6ff 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -384,7 +384,7 @@ void __init debug_traps_init(void)
hook_debug_fault_code(DBG_ESR_EVT_HWSS, single_step_handler, SIGTRAP,
  TRAP_TRACE, "single-step handler");
hook_debug_fault_code(DBG_ESR_EVT_BRK, brk_handler, SIGTRAP,
- TRAP_BRKPT, "ptrace BRK handler");
+ TRAP_BRKPT, "BRK handler");
 }
 
 /* Re-enable single step for syscall restarting. */
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 13ebd5ca2070..9f7fde99eda2 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -994,6 +995,21 @@ static struct break_hook bug_break_hook = {
.imm = BUG_BRK_IMM,
 };
 
+static int reserved_fault_handler(struct pt_regs *regs, unsigned int esr)
+{
+   pr_err("%s generated an invalid instruction at %pS!\n",
+   in_bpf_jit(regs) ? "BPF JIT" : "Kernel runtime patching",
+   instruction_pointer(regs));
+
+   /* We cannot handle this */
+   return DBG_HOOK_ERROR;
+}
+
+static struct break_hook fault_break_hook = {
+   .fn = reserved_fault_handler,
+   .imm = FAULT_BRK_IMM,
+};
+
 #ifdef CONFIG_KASAN_SW_TAGS
 
 #define KASAN_ESR_RECOVER  0x20
@@ -1059,6 +1075,7 @@ int __init early_brk64(unsigned long addr, unsigned int 
esr,
 void __init trap_init(void)
 {
register_kernel_break_hook(&bug_break_hook);
+   register_kernel_break_hook(&fault_break_hook);
 #ifdef CONFIG_KASAN_SW_TAGS
register_kernel_break_hook(&kasan_break_hook);
 #endif
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index eee1732ab6cd..aa0060178343 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -14,9 +14,7 @@ int fixup_exception(struct pt_regs *regs)
if (!fixup)
return 0;
 
-   if (IS_ENABLED(CONFIG_BPF_JIT) &&
-   regs->pc >= BPF_JIT_REGION_START &&
-   regs->pc < BPF_JIT_REGION_END)
+   if (in_bpf_jit(regs))
return arm64_bpf_fixup_exception(fixup, regs);
 
regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;


Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-15 Thread Will Deacon
On Tue, Sep 15, 2020 at 06:03:44PM +0200, pet...@infradead.org wrote:
> On Tue, Sep 15, 2020 at 05:51:50PM +0200, pet...@infradead.org wrote:
> 
> > Anyway, I'll rewrite the Changelog and stuff it in locking/urgent.
> 
> How's this?
> 
> ---
> Subject: locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count
> From: Hou Tao 
> Date: Tue, 15 Sep 2020 22:07:50 +0800
> 
> From: Hou Tao 
> 
> The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
> that percpu-rwsem is a blocking primitive, should be just fine.
> 
> However, file_end_write() is used from IRQ context and will cause
> load-store issues.

... on architectures where the per-cpu accessors are not atomic.

> 
> Fixing it by using the IRQ-safe this_cpu_*() for operations on

Fixing => Fix ?

> read_count. This will generate more expensive code on a number of
> platforms, which might cause a performance regression for some of the
> other percpu-rwsem users.
> 
> If any such is reported, we can consider alternative solutions.
> 
> Fixes: 70fe2f48152e ("aio: fix freeze protection of aio writes")
> Signed-off-by: Hou Tao 
> Signed-off-by: Peter Zijlstra (Intel) 
> Link: https://lkml.kernel.org/r/20200915140750.137881-1-hout...@huawei.com
> ---
>  include/linux/percpu-rwsem.h  |8 
>  kernel/locking/percpu-rwsem.c |4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)

For the patch:

Acked-by: Will Deacon 

Will


Re: [patch 04/13] lockdep: Clenaup PREEMPT_COUNT leftovers

2020-09-15 Thread Will Deacon
On Mon, Sep 14, 2020 at 10:42:13PM +0200, Thomas Gleixner wrote:
> CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
> removed. Cleanup the leftovers before doing so.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Will Deacon 
> ---
>  include/linux/lockdep.h |6 ++
>  lib/Kconfig.debug   |1 -
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -585,16 +585,14 @@ do {
> \
>  
>  #define lockdep_assert_preemption_enabled()  \
>  do { \
> - WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)   &&  \
> -  debug_locks&&  \
> + WARN_ON_ONCE(debug_locks&&  \
>(preempt_count() != 0  ||  \
> !raw_cpu_read(hardirqs_enabled)));\
>  } while (0)
>  
>  #define lockdep_assert_preemption_disabled() \
>  do { \
> - WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)   &&  \
> -  debug_locks&&  \
> + WARN_ON_ONCE(debug_locks&&  \
>(preempt_count() == 0  &&  \
> raw_cpu_read(hardirqs_enabled))); \
>  } while (0)
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1161,7 +1161,6 @@ config PROVE_LOCKING
>   select DEBUG_RWSEMS
>   select DEBUG_WW_MUTEX_SLOWPATH
>   select DEBUG_LOCK_ALLOC
> - select PREEMPT_COUNT
>   select TRACE_IRQFLAGS
>   default n
>   help

Acked-by: Will Deacon 

Will


Re: [PATCH v6] perf: arm_dsu: Support DSU ACPI devices

2020-09-15 Thread Will Deacon
On Mon, 14 Sep 2020 11:04:16 -0700, Tuan Phan wrote:
> Add support for probing device from ACPI node.
> Each DSU ACPI node and its associated cpus are inside a cluster node.

Applied to will (for-next/perf), thanks!

[1/1] perf: arm_dsu: Support DSU ACPI devices
  https://git.kernel.org/will/c/2b694fc92a34

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [patch 06/13] locking/bitspinlock: Clenaup PREEMPT_COUNT leftovers

2020-09-15 Thread Will Deacon
On Mon, Sep 14, 2020 at 10:42:15PM +0200, Thomas Gleixner wrote:
> CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
> removed. Cleanup the leftovers before doing so.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/bit_spinlock.h |4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> --- a/include/linux/bit_spinlock.h
> +++ b/include/linux/bit_spinlock.h
> @@ -90,10 +90,8 @@ static inline int bit_spin_is_locked(int
>  {
>  #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
>   return test_bit(bitnum, addr);
> -#elif defined CONFIG_PREEMPT_COUNT
> - return preempt_count();
>  #else
> - return 1;
> + return preempt_count();
>  #endif

Acked-by: Will Deacon 

Will


Re: [PATCH v2] arm64: bpf: Fix branch offset in JIT

2020-09-15 Thread Will Deacon
Hi Ilias,

On Mon, Sep 14, 2020 at 07:03:55PM +0300, Ilias Apalodimas wrote:
> Running the eBPF test_verifier leads to random errors looking like this:
> 
> [ 6525.735488] Unexpected kernel BRK exception at EL1
> [ 6525.735502] Internal error: ptrace BRK handler: f2000100 [#1] SMP

Does this happen because we poison the BPF memory with BRK instructions?
Maybe we should look at using a special immediate so we can detect this,
rather than end up in the ptrace handler.

> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index f8912e45be7a..0974e58c 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -143,9 +143,13 @@ static inline void emit_addr_mov_i64(const int reg, 
> const u64 val,
>   }
>  }
>  
> -static inline int bpf2a64_offset(int bpf_to, int bpf_from,
> +static inline int bpf2a64_offset(int bpf_insn, int off,
>const struct jit_ctx *ctx)
>  {
> + /* arm64 offset is relative to the branch instruction */
> + int bpf_from = bpf_insn + 1;
> + /* BPF JMP offset is relative to the next instruction */
> + int bpf_to = bpf_insn + off + 1;
>   int to = ctx->offset[bpf_to];
>   /* -1 to account for the Branch instruction */
>   int from = ctx->offset[bpf_from] - 1;

I think this is a bit confusing with all the variables. How about just
doing:

/* BPF JMP offset is relative to the next BPF instruction */
bpf_insn++;

/*
 * Whereas arm64 branch instructions encode the offset from the
 * branch itself, so we must subtract 1 from the instruction offset.
 */
return ctx->offset[bpf_insn + off] - ctx->offset[bpf_insn] - 1;

> @@ -642,7 +646,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
> jit_ctx *ctx,
>  
>   /* JUMP off */
>   case BPF_JMP | BPF_JA:
> - jmp_offset = bpf2a64_offset(i + off, i, ctx);
> + jmp_offset = bpf2a64_offset(i, off, ctx);
>   check_imm26(jmp_offset);
>   emit(A64_B(jmp_offset), ctx);
>   break;
> @@ -669,7 +673,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
> jit_ctx *ctx,
>   case BPF_JMP32 | BPF_JSLE | BPF_X:
>   emit(A64_CMP(is64, dst, src), ctx);
>  emit_cond_jmp:
> - jmp_offset = bpf2a64_offset(i + off, i, ctx);
> + jmp_offset = bpf2a64_offset(i, off, ctx);
>   check_imm19(jmp_offset);
>   switch (BPF_OP(code)) {
>   case BPF_JEQ:
> @@ -912,18 +916,26 @@ static int build_body(struct jit_ctx *ctx, bool 
> extra_pass)
>   const struct bpf_insn *insn = &prog->insnsi[i];
>   int ret;
>  
> + /*
> +  * offset[0] offset of the end of prologue, start of the
> +  * first insn.
> +  * offset[x] - offset of the end of x insn.

So does offset[1] point at the last arm64 instruction for the first BPF
instruction, or does it point to the first arm64 instruction for the second
BPF instruction?

> +  */
> + if (ctx->image == NULL)
> + ctx->offset[i] = ctx->idx;
> +
>   ret = build_insn(insn, ctx, extra_pass);
>   if (ret > 0) {
>   i++;
>   if (ctx->image == NULL)
> - ctx->offset[i] = ctx->idx;
> + ctx->offset[i] = ctx->offset[i - 1];

Does it matter that we set the offset for both halves of a 16-byte BPF
instruction? I think that's a change in behaviour here.

>   continue;
>   }
> - if (ctx->image == NULL)
> - ctx->offset[i] = ctx->idx;
>   if (ret)
>   return ret;
>   }
> + if (ctx->image == NULL)
> + ctx->offset[i] = ctx->idx;

I think it would be clearer to set ctx->offset[0] before the for loop (with
a comment about what it is) and then change the for loop to iterate from 1
all the way to prog->len.

Will


Re: [PATCH v2 00/10] Independent per-CPU data section for nVHE

2020-09-14 Thread Will Deacon
Hi David,

On Thu, Sep 03, 2020 at 11:17:02AM +0200, David Brazdil wrote:
> Introduce '.hyp.data..percpu' as part of ongoing effort to make nVHE
> hyp code self-contained and independent of the rest of the kernel.
> 
> The series builds on top of the "Split off nVHE hyp code" series which
> used objcopy to rename '.text' to '.hyp.text' and prefix all ELF
> symbols with '__kvm_nvhe' for all object files under kvm/hyp/nvhe.

I've been playing around with this series this afternoon, trying to see
if we can reduce the coupling between the nVHE code and the core code. I've
ended up with the diff below on top of your series, but I think it actually
removes the need to change the core code at all. The idea is to collapse
the percpu sections during prelink, and then we can just deal with the
resulting data section a bit like we do for .hyp.text already.

Have I missed something critical?

Cheers,

Will

--->8

diff --git a/arch/arm64/include/asm/hyp_image.h 
b/arch/arm64/include/asm/hyp_image.h
new file mode 100644
index ..40bbf2ddb50f
--- /dev/null
+++ b/arch/arm64/include/asm/hyp_image.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_HYP_IMAGE_H
+#define __ASM_HYP_IMAGE_H
+
+/*
+ * KVM nVHE code has its own symbol namespace prefixed with __kvm_nvhe_, to
+ * separate it from the kernel proper.
+ */
+#define kvm_nvhe_sym(sym)  __kvm_nvhe_##sym
+
+#ifdef LINKER_SCRIPT
+/*
+ * Defines an ELF hyp section from input section @NAME and its subsections.
+ */
+#define HYP_SECTION(NAME)  .hyp ## NAME : { *(NAME NAME ## .*) }
+#define KVM_NVHE_ALIAS(sym)kvm_nvhe_sym(sym) = sym;
+#endif /* LINKER_SCRIPT */
+
+#endif /* __ASM_HYP_IMAGE_H */
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index c87111c25d9e..e0e1e404f6eb 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -7,6 +7,7 @@
 #ifndef __ARM_KVM_ASM_H__
 #define __ARM_KVM_ASM_H__
 
+#include 
 #include 
 
 #defineVCPU_WORKAROUND_2_FLAG_SHIFT0
@@ -42,13 +43,6 @@
 
 #include 
 
-/*
- * Translate name of a symbol defined in nVHE hyp to the name seen
- * by kernel proper. All nVHE symbols are prefixed by the build system
- * to avoid clashes with the VHE variants.
- */
-#define kvm_nvhe_sym(sym)  __kvm_nvhe_##sym
-
 #define DECLARE_KVM_VHE_SYM(sym)   extern char sym[]
 #define DECLARE_KVM_NVHE_SYM(sym)  extern char kvm_nvhe_sym(sym)[]
 
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 21307e2db3fc..f16205300dbc 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -54,15 +54,11 @@ __efistub__ctype= _ctype;
 #ifdef CONFIG_KVM
 
 /*
- * KVM nVHE code has its own symbol namespace prefixed with __kvm_nvhe_, to
- * separate it from the kernel proper. The following symbols are legally
- * accessed by it, therefore provide aliases to make them linkable.
- * Do not include symbols which may not be safely accessed under hypervisor
- * memory mappings.
+ * The following symbols are legally accessed by the KVM nVHE code, therefore
+ * provide aliases to make them linkable. Do not include symbols which may not
+ * be safely accessed under hypervisor memory mappings.
  */
 
-#define KVM_NVHE_ALIAS(sym) __kvm_nvhe_##sym = sym;
-
 /* Alternative callbacks for init-time patching of nVHE hyp code. */
 KVM_NVHE_ALIAS(arm64_enable_wa2_handling);
 KVM_NVHE_ALIAS(kvm_patch_vector_branch);
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 5904a4de9f40..c06e6860adfd 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -9,27 +9,37 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
 #include "image.h"
 
-#define __CONCAT3(x, y, z) x ## y ## z
-#define CONCAT3(x, y, z) __CONCAT3(x, y, z)
-
 OUTPUT_ARCH(aarch64)
 ENTRY(_text)
 
 jiffies = jiffies_64;
 
-
+#ifdef CONFIG_KVM
 #define HYPERVISOR_EXTABLE \
. = ALIGN(SZ_8);\
__start___kvm_ex_table = .; \
*(__kvm_ex_table)   \
__stop___kvm_ex_table = .;
 
+#define HYPERVISOR_PERCPU_SECTION  \
+   . = ALIGN(PAGE_SIZE);   \
+   .hyp.data..percpu : {   \
+   kvm_nvhe_sym(__per_cpu_start) = .;  \
+   *(.hyp.data..percpu)\
+   kvm_nvhe_sym(__per_cpu_end) = .;\
+   }
+#else
+#define HYPERVISOR_EXTABLE
+#define HYPERVISOR_PERCPU_SECTION
+#endif
+
 #define HYPERVISOR_TEXT\
/*  \
 * Align to 4 KB so that\
@@ -193,13 +203,7 @@ SECTIONS
}
 
PERCPU_SECTION(L1_CACHE_BYTES)
-
-   /* KVM nVHE per-cpu section 

Re: [PATCH v2] arm64/mm: Refactor {pgd, pud, pmd, pte}_ERROR()

2020-09-14 Thread Will Deacon
On Mon, 14 Sep 2020 09:47:30 +1000, Gavin Shan wrote:
> The function __{pgd, pud, pmd, pte}_error() are introduced so that
> they can be called by {pgd, pud, pmd, pte}_ERROR(). However, some
> of the functions could never be called when the corresponding page
> table level isn't enabled. For example, __{pud, pmd}_error() are
> unused when PUD and PMD are folded to PGD.
> 
> This removes __{pgd, pud, pmd, pte}_error() and call pr_err() from
> {pgd, pud, pmd, pte}_ERROR() directly, similar to what x86/powerpc
> are doing. With this, the code looks a bit simplified either.

Applied to arm64 (for-next/misc), thanks!

[1/1] arm64/mm: Refactor {pgd, pud, pmd, pte}_ERROR()
  https://git.kernel.org/arm64/c/2cf660eb81e9

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH] arm64: bpf: Fix branch offset in JIT

2020-09-14 Thread Will Deacon
Hi Ilias,

On Mon, Sep 14, 2020 at 04:23:50PM +0300, Ilias Apalodimas wrote:
> On Mon, Sep 14, 2020 at 03:35:04PM +0300, Ilias Apalodimas wrote:
> > On Mon, Sep 14, 2020 at 01:20:43PM +0100, Will Deacon wrote:
> > > On Mon, Sep 14, 2020 at 11:36:21AM +0300, Ilias Apalodimas wrote:
> > > > Running the eBPF test_verifier leads to random errors looking like this:

[...]

> > > > The reason seems to be the offset[] creation and usage ctx->offset[]
> > > 
> > > "seems to be"? Are you unsure?
> > 
> > Reading the history and other ports of the JIT implementation, I couldn't 
> > tell if the decision on skipping the 1st entry was deliberate or not on 
> > Aarch64. Reading through the mailist list didn't help either [1].
> > Skipping the 1st entry seems indeed to cause the problem.
> > I did run the patch though the BPF tests and showed no regressions + fixing 
> > the error.
> 
> I'll correct myself here.
> Looking into 7c2e988f400e ("bpf: fix x64 JIT code generation for jmp to 1st 
> insn")
> explains things a bit better.
> Jumping back to the 1st insn wasn't allowed until eBPF bounded loops were 
> introduced. That's what the 1st instruction was not saved in the original 
> code.
> 
> > > 
> > > No Fixes: tag?
> > 
> > I'll re-spin and apply one 
> > 
> Any suggestion on any Fixes I should apply? The original code was 'correct' 
> and
> broke only when bounded loops and their self-tests were introduced.

Ouch, that's pretty bad as it means nobody is regression testing BPF on
arm64 with mainline. Damn.

The Fixes: tag should identify the commit beyond which we don't need to
backport the fix, so it sounds like introduction of bounded loops, according
to your analysis.

Will


Re: [PATCH v2] drivers/perf: xgene_pmu: Fix uninitialized resource struct

2020-09-14 Thread Will Deacon
On Mon, Sep 14, 2020 at 09:13:39AM -0400, Mark Salter wrote:
> On Mon, 2020-09-14 at 12:28 +0100, Will Deacon wrote:
> > On Sun, Sep 13, 2020 at 01:45:36PM -0400, Mark Salter wrote:
> > > @@ -1483,11 +1473,23 @@ xgene_pmu_dev_ctx *acpi_get_pmu_hw_inf(struct 
> > > xgene_pmu *xgene_pmu,
> > >   return NULL;
> > >  
> > >   INIT_LIST_HEAD(_list);
> > > - rc = acpi_dev_get_resources(adev, _list,
> > > - acpi_pmu_dev_add_resource, &res);
> > > + rc = acpi_dev_get_resources(adev, _list, NULL, NULL);
> > > + if (rc <= 0) {
> > > + dev_err(dev, "PMU type %d: No resources found\n", type);
> > > + return NULL;
> > > + }
> > > +
> > > + list_for_each_entry(rentry, _list, node) {
> > > + if (resource_type(rentry->res) == IORESOURCE_MEM) {
> > > + res = *rentry->res;
> > > + rentry = NULL;
> > > + break;
> > > + }
> > > + }
> > >   acpi_dev_free_resource_list(_list);
> > > - if (rc < 0) {
> > > - dev_err(dev, "PMU type %d: No resource address found\n", type);
> > > +
> > > + if (rentry) {
> > 
> > I'm curious as to why you've had to change the failure logic here, setting
> > rentry to NULL instead of checking 'rentry->res' like the TX2 driver (which
> > I don't immediately understand at first glance).
> > 
> > Will
> > 
> 
> I don't fully understand the tx2 logic. I do see there's a memory leak if
> that (!rentry->res) is true. I was going to dig deeper and follow up with
> a patch for tx2. I suspect that rentry->res should never be true. And tx2
> won't detect if no memory resource is in the table.

Ah good, so it wasn't just me then! In which case, please send the two
patches as a series fixing both drivers, and I'll queue them both together.

Ta,

Will


Re: [PATCH 1/4] kbuild: remove cc-option test of -fno-strict-overflow

2020-09-14 Thread Will Deacon
On Mon, Sep 14, 2020 at 09:51:31PM +0900, Masahiro Yamada wrote:
> On Sat, Sep 12, 2020 at 12:22 AM Will Deacon  wrote:
> > On Thu, Sep 10, 2020 at 10:51:17PM +0900, Masahiro Yamada wrote:
> > > The minimal compiler versions, GCC 4.9 and Clang 10 support this flag.
> > >
> > > Here is the godbolt:
> > > https://godbolt.org/z/odq8h9
> > >
> > > Signed-off-by: Masahiro Yamada 
> > > ---
> > >
> > >  Makefile  | 2 +-
> > >  arch/arm64/kernel/vdso32/Makefile | 2 +-
> > >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > This, and the other patch (4/4 -- I didn't see 2 or 3), look good to me.
> > Are you taking them via the kbuild tree, or shall I queue them in the arm64
> > tree? Please just let me know what you prefer.
> 
> I will apply all to the kbuild tree.
> Your Ack is appreciated.

For both:

Acked-by: Will Deacon 

Will


Re: [PATCH v2 02/10] kvm: arm64: Partially link nVHE hyp code, simplify HYPCOPY

2020-09-14 Thread Will Deacon
On Thu, Sep 03, 2020 at 11:17:04AM +0200, David Brazdil wrote:
> Previous series introduced custom build rules for nVHE hyp code, using
> objcopy to prefix ELF section and symbol names to separate nVHE code
> into its own "namespace". This approach was limited by the expressiveness
> of objcopy's command line interface, eg. missing support for wildcards.

nit: "Previous series" isn't a lot of use here or in the git log. You can
just say something like:

  "Relying on objcopy to prefix the ELF section names of the nVHE hyp code
   is brittle and prevents us from using wildcards to match specific
   section names."

and then go on to explain what the change is doing (see
Documentation/process/submitting-patches.rst for more help here)

Also, given that this is independent of the other patches, please can you
move it right to the start of the series? I'm a bit worried about the
potential for regressions given the changes to the way in which we link,
so the sooner we can get this patch some more exposure, the better.

> Improve the build rules by partially linking all '.hyp.o' files and
> prefixing their ELF section names using a linker script. Continue using
> objcopy for prefixing ELF symbol names.
> 
> One immediate advantage of this approach is that all subsections
> matching a pattern can be merged into a single prefixed section, eg.
> .text and .text.* can be linked into a single '.hyp.text'. This removes
> the need for -fno-reorder-functions on GCC and will be useful in the
> future too: LTO builds use .text subsections, compilers routinely
> generate .rodata subsections, etc.
> 
> Partially linking all hyp code into a single object file also makes it
> easier to analyze.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/kvm/hyp/nvhe/Makefile  | 56 ---
>  arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 14 
>  2 files changed, 43 insertions(+), 27 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/nvhe/hyp.lds.S
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile 
> b/arch/arm64/kvm/hyp/nvhe/Makefile
> index aef76487edc2..1b2fbb19f3e8 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -10,40 +10,42 @@ obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o 
> hyp-init.o
>  obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
>../fpsimd.o ../hyp-entry.o
>  
> -obj-y := $(patsubst %.o,%.hyp.o,$(obj-y))
> -extra-y := $(patsubst %.hyp.o,%.hyp.tmp.o,$(obj-y))
> +##
> +## Build rules for compiling nVHE hyp code
> +## Output of this folder is `hyp.o`, a partially linked object file 
> containing
> +## all nVHE hyp code and data.
> +##
>  
> -$(obj)/%.hyp.tmp.o: $(src)/%.c FORCE
> +hyp-obj := $(patsubst %.o,%.hyp.o,$(obj-y))
> +obj-y := hyp.o
> +extra-y := $(hyp-obj) hyp.tmp.o hyp.lds
> +
> +# 1) Compile all source files to `.hyp.o` object files. The file extension
> +#avoids file name clashes for files shared with VHE.
> +$(obj)/%.hyp.o: $(src)/%.c FORCE
>   $(call if_changed_rule,cc_o_c)
> -$(obj)/%.hyp.tmp.o: $(src)/%.S FORCE
> +$(obj)/%.hyp.o: $(src)/%.S FORCE
>   $(call if_changed_rule,as_o_S)
> -$(obj)/%.hyp.o: $(obj)/%.hyp.tmp.o FORCE
> - $(call if_changed,hypcopy)
>  
> -# Disable reordering functions by GCC (enabled at -O2).
> -# This pass puts functions into '.text.*' sections to aid the linker
> -# in optimizing ELF layout. See HYPCOPY comment below for more info.
> -ccflags-y += $(call cc-option,-fno-reorder-functions)
> +# 2) Compile linker script.
> +$(obj)/hyp.lds: $(src)/hyp.lds.S FORCE
> + $(call if_changed_dep,cpp_lds_S)

Why is it not sufficient just to list the linker script as a target, like
we do for vmlinux.lds in extra-y?

> +# 3) Partially link all '.hyp.o' files and apply the linker script.
> +#Prefixes names of ELF sections with '.hyp', eg. '.hyp.text'.
> +LDFLAGS_hyp.tmp.o := -r -T $(obj)/hyp.lds
> +$(obj)/hyp.tmp.o: $(addprefix $(obj)/,$(hyp-obj)) $(obj)/hyp.lds FORCE
> + $(call if_changed,ld)
> +
> +# 4) Produce the final 'hyp.o', ready to be linked into 'vmlinux'.
> +#Prefixes names of ELF symbols with '__kvm_nvhe_'.
> +$(obj)/hyp.o: $(obj)/hyp.tmp.o FORCE
> + $(call if_changed,hypcopy)
>  
>  # The HYPCOPY command uses `objcopy` to prefix all ELF symbol names
> -# and relevant ELF section names to avoid clashes with VHE code/data.
> -#
> -# Hyp code is assumed to be in the '.text' section of the input object
> -# files (with the exception of specialized sections such as
> -# '.hyp.idmap.text'). This assumption may be broken by a compiler that
> -# divides code into sections like '.text.unlikely' so as to optimize
> -# ELF layout. HYPCOPY checks that no such sections exist in the input
> -# using `objdump`, otherwise they would be linked together with other
> -# kernel code and not memory-mapped correctly at runtime.
> +# to avoid clashes with VHE code/data.
>  quiet_cmd_hypcopy = HYPCOPY $@
> -  cmd_hypcopy =

Re: [PATCH] arm64: bpf: Fix branch offset in JIT

2020-09-14 Thread Will Deacon
On Mon, Sep 14, 2020 at 11:36:21AM +0300, Ilias Apalodimas wrote:
> Running the eBPF test_verifier leads to random errors looking like this:
> 
> [ 6525.735488] Unexpected kernel BRK exception at EL1
> [ 6525.735502] Internal error: ptrace BRK handler: f2000100 [#1] SMP
> [ 6525.741609] Modules linked in: nls_utf8 cifs libdes libarc4 dns_resolver 
> fscache binfmt_misc nls_ascii nls_cp437 vfat fat aes_ce_blk crypto_simd 
> cryptd aes_ce_cipher ghash_ce gf128mul efi_pstore sha2_ce sha256_arm64 
> sha1_ce evdev efivars efivarfs ip_tables x_tables autofs4 btrfs 
> blake2b_generic xor xor_neon zstd_compress raid6_pq libcrc32c crc32c_generic 
> ahci xhci_pci libahci xhci_hcd igb libata i2c_algo_bit nvme realtek usbcore 
> nvme_core scsi_mod t10_pi netsec mdio_devres of_mdio gpio_keys fixed_phy 
> libphy gpio_mb86s7x
> [ 6525.787760] CPU: 3 PID: 7881 Comm: test_verifier Tainted: GW   
>   5.9.0-rc1+ #47
> [ 6525.796111] Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS 
> build #1 Jun  6 2020
> [ 6525.804812] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--)
> [ 6525.810390] pc : bpf_prog_c3d01833289b6311_F+0xc8/0x9f4
> [ 6525.815613] lr : bpf_prog_d53bb52e3f4483f9_F+0x38/0xc8c
> [ 6525.820832] sp : 8000130cbb80
> [ 6525.824141] x29: 8000130cbbb0 x28: 
> [ 6525.829451] x27: 05ef6fcbf39b x26: 
> [ 6525.834759] x25: 8000130cbb80 x24: 800011dc7038
> [ 6525.840067] x23: 8000130cbd00 x22: 0008f624d080
> [ 6525.845375] x21: 0001 x20: 800011dc7000
> [ 6525.850682] x19:  x18: 
> [ 6525.855990] x17:  x16: 
> [ 6525.861298] x15:  x14: 
> [ 6525.866606] x13:  x12: 
> [ 6525.871913] x11: 0001 x10: 800a660c
> [ 6525.877220] x9 : 800010951810 x8 : 8000130cbc38
> [ 6525.882528] x7 :  x6 : 009864cfa881
> [ 6525.887836] x5 : 00ff x4 : 002880ba1a0b3e9f
> [ 6525.893144] x3 : 0018 x2 : 800a4374
> [ 6525.898452] x1 : 000a x0 : 0009
> [ 6525.903760] Call trace:
> [ 6525.906202]  bpf_prog_c3d01833289b6311_F+0xc8/0x9f4
> [ 6525.911076]  bpf_prog_d53bb52e3f4483f9_F+0x38/0xc8c
> [ 6525.915957]  bpf_dispatcher_xdp_func+0x14/0x20
> [ 6525.920398]  bpf_test_run+0x70/0x1b0
> [ 6525.923969]  bpf_prog_test_run_xdp+0xec/0x190
> [ 6525.928326]  __do_sys_bpf+0xc88/0x1b28
> [ 6525.932072]  __arm64_sys_bpf+0x24/0x30
> [ 6525.935820]  el0_svc_common.constprop.0+0x70/0x168
> [ 6525.940607]  do_el0_svc+0x28/0x88
> [ 6525.943920]  el0_sync_handler+0x88/0x190
> [ 6525.947838]  el0_sync+0x140/0x180
> [ 6525.951154] Code: d4202000 d4202000 d4202000 d4202000 (d4202000)
> [ 6525.957249] ---[ end trace cecc3f93b14927e2 ]---
> 
> The reason seems to be the offset[] creation and usage ctx->offset[]

"seems to be"? Are you unsure?

> while building the eBPF body.  The code currently omits the first 
> instruction, since build_insn() will increase our ctx->idx before saving 
> it.  When "taken loop with back jump to 1st insn" test runs it will
> eventually call bpf2a64_offset(-1, 2, ctx). Since negative indexing is
> permitted, the current outcome depends on the value stored in
> ctx->offset[-1], which has nothing to do with our array.
> If the value happens to be 0 the tests will work. If not this error
> triggers.
> 
> So let's fix it by creating the ctx->offset[] correctly in the first
> place and account for the extra instruction while calculating the arm
> instruction offsets.

No Fixes: tag?

> Signed-off-by: Ilias Apalodimas 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Yauheni Kaliuta 

Non-author signoffs here. What's going on?

Will


Re: [PATCH v2] drivers/perf: xgene_pmu: Fix uninitialized resource struct

2020-09-14 Thread Will Deacon
On Sun, Sep 13, 2020 at 01:45:36PM -0400, Mark Salter wrote:
> @@ -1483,11 +1473,23 @@ xgene_pmu_dev_ctx *acpi_get_pmu_hw_inf(struct 
> xgene_pmu *xgene_pmu,
>   return NULL;
>  
>   INIT_LIST_HEAD(_list);
> - rc = acpi_dev_get_resources(adev, _list,
> - acpi_pmu_dev_add_resource, &res);
> + rc = acpi_dev_get_resources(adev, _list, NULL, NULL);
> + if (rc <= 0) {
> + dev_err(dev, "PMU type %d: No resources found\n", type);
> + return NULL;
> + }
> +
> + list_for_each_entry(rentry, _list, node) {
> + if (resource_type(rentry->res) == IORESOURCE_MEM) {
> + res = *rentry->res;
> + rentry = NULL;
> + break;
> + }
> + }
>   acpi_dev_free_resource_list(_list);
> - if (rc < 0) {
> - dev_err(dev, "PMU type %d: No resource address found\n", type);
> +
> + if (rentry) {

I'm curious as to why you've had to change the failure logic here, setting
rentry to NULL instead of checking 'rentry->res' like the TX2 driver (which
I don't immediately understand at first glance).

Will


Re: [PATCH v2 0/4] kselftests/arm64: add PAuth tests

2020-09-11 Thread Will Deacon
On Mon, Aug 31, 2020 at 12:04:46PM +0100, Boyan Karatotev wrote:
> Pointer Authentication (PAuth) is a security feature introduced in ARMv8.3.
> It introduces instructions to sign addresses and later check for potential
> corruption using a second modifier value and one of a set of keys. The
> signature, in the form of the Pointer Authentication Code (PAC), is stored
> in some of the top unused bits of the virtual address (e.g. [54: 49] if
> TBID0 is enabled and TnSZ is set to use a 48 bit VA space). A set of
> controls are present to enable/disable groups of instructions (which use
> certain keys) for compatibility with libraries that do not utilize the
> feature. PAuth is used to verify the integrity of return addresses on the
> stack with less memory than the stack canary.

Any chance of a v3 addressing the couple of small comments from Dave on
the third patch, please? Then I can pick up the whole lot for 5.10.

Cheers,

Will


Re: [PATCH v2 1/3] arm64/mm: Remove CONT_RANGE_OFFSET

2020-09-11 Thread Will Deacon
On Thu, 10 Sep 2020 19:59:34 +1000, Gavin Shan wrote:
> The macro was introduced by commit  ("arm64: PTE/PMD
> contiguous bit definition") at the beginning. It's only used by
> commit <348a65cdcbbf> ("arm64: Mark kernel page ranges contiguous"),
> which was reverted later by commit <667c27597ca8>. This makes the
> macro unused.
> 
> This removes the unused macro (CONT_RANGE_OFFSET).

Applied to arm64 (for-next/mm), thanks!

[1/3] arm64/mm: Remove CONT_RANGE_OFFSET
  https://git.kernel.org/arm64/c/11e339d53a73
[2/3] arm64/mm: Unify CONT_PTE_SHIFT
  https://git.kernel.org/arm64/c/c0d6de327f18
[3/3] arm64/mm: Unify CONT_PMD_SHIFT
  https://git.kernel.org/arm64/c/e676594115f0

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH 1/4] kbuild: remove cc-option test of -fno-strict-overflow

2020-09-11 Thread Will Deacon
On Thu, Sep 10, 2020 at 10:51:17PM +0900, Masahiro Yamada wrote:
> The minimal compiler versions, GCC 4.9 and Clang 10 support this flag.
> 
> Here is the godbolt:
> https://godbolt.org/z/odq8h9
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
>  Makefile  | 2 +-
>  arch/arm64/kernel/vdso32/Makefile | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

This, and the other patch (4/4 -- I didn't see 2 or 3), look good to me.
Are you taking them via the kbuild tree, or shall I queue them in the arm64
tree? Please just let me know what you prefer.

Will


Re: [PATCH V2 0/2] arm64/mm: Enable THP migration

2020-09-11 Thread Will Deacon
On Wed, 9 Sep 2020 10:23:01 +0530, Anshuman Khandual wrote:
> This series enables THP migration on arm64 via ARCH_ENABLE_THP_MIGRATION.
> But first this modifies all existing THP helpers like pmd_present() and
> pmd_trans_huge() etc per expected generic memory semantics as concluded
> from a previous discussion here.
> 
> https://lkml.org/lkml/2018/10/9/220
> 
> [...]

Applied to arm64 (for-next/mm), thanks!

[1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
  https://git.kernel.org/arm64/c/b65399f6111b
[2/2] arm64/mm: Enable THP migration
  https://git.kernel.org/arm64/c/53fa117bb33c

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH V4] arm64/cpuinfo: Define HWCAP name arrays per their actual bit definitions

2020-09-11 Thread Will Deacon
On Wed, 9 Sep 2020 11:18:55 +0530, Anshuman Khandual wrote:
> HWCAP name arrays (hwcap_str, compat_hwcap_str, compat_hwcap2_str) that are
> scanned for /proc/cpuinfo are detached from their bit definitions making it
> vulnerable and difficult to correlate. It is also bit problematic because
> during /proc/cpuinfo dump these arrays get traversed sequentially assuming
> they reflect and match actual HWCAP bit sequence, to test various features
> for a given CPU. This redefines name arrays per their HWCAP bit definitions
> . It also warns after detecting any feature which is not expected on arm64.

Applied to arm64 (for-next/cpuinfo), thanks!

[1/1] arm64/cpuinfo: Define HWCAP name arrays per their actual bit definitions
  https://git.kernel.org/arm64/c/4e56de82d4ec

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH] drivers/perf: xgene_pmu: Fix uninitialized resource struct

2020-09-11 Thread Will Deacon
On Tue, Sep 08, 2020 at 02:37:22PM -0400, Mark Salter wrote:
> On Mon, 2020-09-07 at 14:50 +0100, Will Deacon wrote:
> > On Wed, Sep 02, 2020 at 02:27:29PM -0400, Mark Salter wrote:
> > > diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
> > > index edac28cd25dd..fdbbd0804b92 100644
> > > --- a/drivers/perf/xgene_pmu.c
> > > +++ b/drivers/perf/xgene_pmu.c
> > > @@ -1483,6 +1483,7 @@ xgene_pmu_dev_ctx *acpi_get_pmu_hw_inf(struct 
> > > xgene_pmu *xgene_pmu,
> > >   return NULL;
> > >  
> > >   INIT_LIST_HEAD(_list);
> > > + memset(&res, 0, sizeof(res));
> > >   rc = acpi_dev_get_resources(adev, _list,
> > >   acpi_pmu_dev_add_resource, &res);
> > >   acpi_dev_free_resource_list(_list);
> > 
> > Hmm, to be honest, I'm not sure we should be calling devm_ioremap_resource()
> > at all here. The resource is clearly bogus, even with this change: the name
> > and the resource hierarchy pointers will all be NULL. I think it would be
> > better to follow the TX2 PMU driver (drivers/perf/thunderx2_pmu.c) which
> > appears to assign the resource directly in tx2_uncore_pmu_init_dev().
> > 
> > Is there a reason we can't do that?
> > 
> > Will
> > 
> 
> There's no difference between xgene and tx2 wrt resource name/hierarchy.
> They both call devm_ioremap_resource() which ends up setting the name
> and hierarchy. The difference is that xgene calls acpi_dev_resource_memory()
> directly from the acpi_dev_get_resources() callback. TX2 doesn't use such a
> callback and in that case, acpi_dev_resource_memory() gets called with a
> zeroed struct.
> 
> That said, changing xgene to avoid the callback like TX2 does would fix the
> problem as well. If you'd rather go that route, I can send a patch for it.

Yes please

Will


Re: [PATCHv4 6/6] iommu: arm-smmu-impl: Remove unwanted extra blank lines

2020-09-11 Thread Will Deacon
On Fri, Sep 11, 2020 at 05:03:06PM +0100, Robin Murphy wrote:
> BTW am I supposed to have received 3 copies of everything? Because I did...

Yeah, this seems to be happening for all of Sai's emails :/

Will


Re: [PATCH] docs/memory-barriers.txt: Fix a typo in CPU MEMORY BARRIERS section

2020-09-11 Thread Will Deacon
On Wed, Sep 09, 2020 at 02:53:40PM +0800, Fox Chen wrote:
> Commit 39323c6 smp_mb__{before,after}_atomic(): update Documentation
> has a typo in CPU MEMORY BARRIERS section:
> "RMW functions that do not imply are memory barrier are ..." should be
> "RMW functions that do not imply a memory barrier are ...".
> 
> This patch fixes this typo.
> 
> Signed-off-by: Fox Chen 
> ---
>  Documentation/memory-barriers.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/memory-barriers.txt 
> b/Documentation/memory-barriers.txt
> index 96186332e5f4..20b8a7b30320 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1870,7 +1870,7 @@ There are some more advanced barrier functions:
>  
>   These are for use with atomic RMW functions that do not imply memory
>   barriers, but where the code needs a memory barrier. Examples for atomic
> - RMW functions that do not imply are memory barrier are e.g. add,
> + RMW functions that do not imply a memory barrier are e.g. add,
>   subtract, (failed) conditional operations, _relaxed functions,
>   but not atomic_read or atomic_set. A common example where a memory
>   barrier may be required is when atomic ops are used for reference

The document remains unreadable, but this is still worth fixing!

Acked-by: Will Deacon 

Will


Re: [PATCH V3] arm64/cpuinfo: Define HWCAP name arrays per their actual bit definitions

2020-09-08 Thread Will Deacon
On Tue, Sep 08, 2020 at 10:43:12AM +0530, Anshuman Khandual wrote:
> 
> 
> On 09/07/2020 05:46 PM, Will Deacon wrote:
> > On Mon, Aug 17, 2020 at 05:34:23PM +0530, Anshuman Khandual wrote:
> >> HWCAP name arrays (hwcap_str, compat_hwcap_str, compat_hwcap2_str) that are
> >> scanned for /proc/cpuinfo are detached from their bit definitions making it
> >> vulnerable and difficult to correlate. It is also bit problematic because
> >> during /proc/cpuinfo dump these arrays get traversed sequentially assuming
> >> they reflect and match actual HWCAP bit sequence, to test various features
> >> for a given CPU. This redefines name arrays per their HWCAP bit definitions
> >> . It also warns after detecting any feature which is not expected on arm64.
> >>
> >> Cc: Catalin Marinas 
> >> Cc: Will Deacon 
> >> Cc: Mark Brown 
> >> Cc: Dave Martin 
> >> Cc: Ard Biesheuvel 
> >> Cc: Mark Rutland 
> >> Cc: Suzuki K Poulose 
> >> Cc: linux-arm-ker...@lists.infradead.org
> >> Cc: linux-kernel@vger.kernel.org
> >> Signed-off-by: Anshuman Khandual 
> >> ---
> >> This applies on 5.9-rc1
> >>
> >> Mark, since the patch has changed I have dropped your Acked-by: tag. Are 
> >> you
> >> happy to give a new one ?
> >>
> >> Changes in V3:
> >>
> >> - Moved name arrays to (arch/arm64/kernel/cpuinfo.c) to prevent a build 
> >> warning
> >> - Replaced string values with NULL for all compat features not possible on 
> >> arm64
> >> - Changed compat_hwcap_str[] iteration on size as some NULL values are 
> >> expected
> >> - Warn once after detecting any feature on arm64 that is not expected
> >>
> >> Changes in V2: (https://patchwork.kernel.org/patch/11533755/)
> >>
> >> - Defined COMPAT_KERNEL_HWCAP[2] and updated the name arrays per Mark
> >> - Updated the commit message as required
> >>
> >> Changes in V1: (https://patchwork.kernel.org/patch/11532945/)
> >>
> >>  arch/arm64/include/asm/hwcap.h |   9 +++
> >>  arch/arm64/kernel/cpuinfo.c| 172 
> >> ++---
> >>  2 files changed, 100 insertions(+), 81 deletions(-)
> > 
> > [...]
> > 
> >> +  [KERNEL_HWCAP_FP]   = "fp",
> >> +  [KERNEL_HWCAP_ASIMD]= "asimd",
> >> +  [KERNEL_HWCAP_EVTSTRM]  = "evtstrm",
> >> +  [KERNEL_HWCAP_AES]  = "aes",
> > 
> > It would be nice if the cap and the string were generated by the same
> > macro, along the lines of:
> > 
> > #define KERNEL_HWCAP(c) [KERNEL_HWCAP_##c] = #c,
> > 
> > Does making the constants mixed case break anything, or is it just really
> > churny to do?
> 
> Currently all existing HWCAP feature strings are lower case, above change
> will make them into upper case instead. I could not find a method to force
> convert #c into lower case constant strings in the macro definition. Would
> not changing the HWCAP string case here, break user interface ?

Yes, we can't change the user-visible strings, but what's wrong with
having e.g. KERNEL_HWCAP_fp instead of KERNEL_HWCAP_FP?
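
Untested, but I mean something along these lines, assuming we renamed the
enum constants to lower case (the strings userspace sees stay exactly as
they are today):

	/* Generate the array index and the /proc/cpuinfo string together. */
	#define KERNEL_HWCAP(c)	[KERNEL_HWCAP_##c] = #c,

	static const char *const hwcap_str[] = {
		KERNEL_HWCAP(fp)
		KERNEL_HWCAP(asimd)
		KERNEL_HWCAP(evtstrm)
		KERNEL_HWCAP(aes)
		/* ... */
	};

so the string and its index can never get out of step.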

Will


Re: [PATCH] arm64: topology: Stop using MPIDR for topology information

2020-09-07 Thread Will Deacon
On Sat, 29 Aug 2020 14:00:16 +0100, Valentin Schneider wrote:
> In the absence of ACPI or DT topology data, we fallback to haphazardly
> decoding *something* out of MPIDR. Sadly, the contents of that register are
> mostly unusable due to the implementation leniency and things like Aff0
> having to be capped to 15 (despite being encoded on 8 bits).
> 
> Consider a simple system with a single package of 32 cores, all under the
> same LLC. We ought to be shoving them in the same core_sibling mask, but
> MPIDR is going to look like:
> 
> [...]

Applied to arm64 (for-next/topology), thanks!

[1/1] arm64: topology: Stop using MPIDR for topology information
  https://git.kernel.org/arm64/c/3102bc0e6ac7

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH v3 0/7] set clang minimum version to 10.0.1

2020-09-07 Thread Will Deacon
On Wed, Sep 02, 2020 at 03:59:04PM -0700, Nick Desaulniers wrote:
> Adds a compile time #error to compiler-clang.h setting the effective
> minimum supported version to clang 10.0.1. A separate patch has already
> been picked up into the Documentation/ tree also confirming the version.

Modulo my horrible grammar nit on the last patch:

Acked-by: Will Deacon 

Thanks!

Will


Re: [PATCH v3 7/7] compiler-gcc: improve version error

2020-09-07 Thread Will Deacon
On Wed, Sep 02, 2020 at 03:59:11PM -0700, Nick Desaulniers wrote:
> As Kees suggests, doing so provides developers with two useful pieces of
> information:

I struggle to parse this. "doing so" what? These things are supposed to
be in the imperative.

Will


Re: [PATCH] arm64/mm/ptdump: Add address markers for BPF regions

2020-09-07 Thread Will Deacon
On Fri, 4 Sep 2020 14:00:59 +0530, Anshuman Khandual wrote:
> Kernel virtual region [BPF_JIT_REGION_START..BPF_JIT_REGION_END] is missing
> from address_markers[], hence relevant page table entries are not displayed
> with /sys/kernel/debug/kernel_page_tables. This adds those missing markers.
> While here, also rename arch/arm64/mm/dump.c which sounds bit ambiguous, as
> arch/arm64/mm/ptdump.c instead.

Applied to arm64 (for-next/mm), thanks!

[1/1] arm64/mm/ptdump: Add address markers for BPF regions
  https://git.kernel.org/arm64/c/c048ddf86cdd

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH v3] perf: arm_dsu: Support DSU ACPI devices

2020-09-07 Thread Will Deacon
[+ Suzuki as I'd like his Ack on this]

On Fri, Aug 14, 2020 at 05:39:40PM -0700, Tuan Phan wrote:
> Add support for probing device from ACPI node.
> Each DSU ACPI node and its associated cpus are inside a cluster node.
> 
> Signed-off-by: Tuan Phan 
> ---
> Changes in v3:
> - Based on the latest ARM ACPI binding at: 
> https://developer.arm.com/documentation/den0093/c/
> 
> Changes in v2:
> - Removed IRQF_SHARED.
> - Fixed ACPI runtime detection.
> 
>  drivers/perf/arm_dsu_pmu.c | 68 
> --
>  1 file changed, 60 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
> index 96ed93c..4be355d 100644
> --- a/drivers/perf/arm_dsu_pmu.c
> +++ b/drivers/perf/arm_dsu_pmu.c
> @@ -11,6 +11,7 @@
>  #define DRVNAME  PMUNAME "_pmu"
>  #define pr_fmt(fmt)  DRVNAME ": " fmt
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -603,18 +604,21 @@ static struct dsu_pmu *dsu_pmu_alloc(struct 
> platform_device *pdev)
>  }
>  
>  /**
> - * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster.
> + * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster
> + * from device tree.
>   */
> -static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)
> +static int dsu_pmu_dt_get_cpus(struct platform_device *pdev)
>  {
>   int i = 0, n, cpu;
>   struct device_node *cpu_node;
> + struct dsu_pmu *dsu_pmu =
> + (struct dsu_pmu *) platform_get_drvdata(pdev);
>  
> - n = of_count_phandle_with_args(dev, "cpus", NULL);
> + n = of_count_phandle_with_args(pdev->dev.of_node, "cpus", NULL);
>   if (n <= 0)
>   return -ENODEV;
>   for (; i < n; i++) {
> - cpu_node = of_parse_phandle(dev, "cpus", i);
> + cpu_node = of_parse_phandle(pdev->dev.of_node, "cpus", i);
>   if (!cpu_node)
>   break;
>   cpu = of_cpu_node_to_id(cpu_node);
> @@ -626,11 +630,51 @@ static int dsu_pmu_dt_get_cpus(struct device_node *dev, 
> cpumask_t *mask)
>*/
>   if (cpu < 0)
>   continue;
> - cpumask_set_cpu(cpu, mask);
> + cpumask_set_cpu(cpu, &dsu_pmu->associated_cpus);
>   }
>   return 0;
>  }
>  
> +/**
> + * dsu_pmu_acpi_get_cpus: Get the list of CPUs in the cluster
> + * from ACPI.
> + */
> +static int dsu_pmu_acpi_get_cpus(struct platform_device *pdev)
> +{
> + int cpu;
> + struct dsu_pmu *dsu_pmu = (struct dsu_pmu *) platform_get_drvdata(pdev);
> +
> + /*
> +  * A dsu pmu node is inside a cluster parent node along with cpu nodes.
> +  * We need to find out all cpus that have the same parent with this pmu.
> +  */
> + for_each_possible_cpu(cpu) {
> + struct acpi_device *acpi_dev = 
> ACPI_COMPANION(get_cpu_device(cpu));
> +
> + if (acpi_dev &&
> + acpi_dev->parent == ACPI_COMPANION(&pdev->dev)->parent)
> + cpumask_set_cpu(cpu, &dsu_pmu->associated_cpus);
> + }
> +
> + return 0;
> +}
> +
> +static int dsu_pmu_platform_get_cpus(struct platform_device *pdev)
> +{
> + int ret = -ENOENT;
> + struct fwnode_handle *fwnode = dev_fwnode(&pdev->dev);
> +
> + if (IS_ERR_OR_NULL(fwnode))
> + return ret;
> +
> + if (is_of_node(fwnode))
> + ret = dsu_pmu_dt_get_cpus(pdev);
> + else if (is_acpi_device_node(fwnode))
> + ret = dsu_pmu_acpi_get_cpus(pdev);
> +
> + return ret;
> +}
> +
>  /*
>   * dsu_pmu_probe_pmu: Probe the PMU details on a CPU in the cluster.
>   */
> @@ -683,7 +727,9 @@ static int dsu_pmu_device_probe(struct platform_device 
> *pdev)
>   if (IS_ERR(dsu_pmu))
>   return PTR_ERR(dsu_pmu);
>  
> - rc = dsu_pmu_dt_get_cpus(pdev->dev.of_node, &dsu_pmu->associated_cpus);
> + platform_set_drvdata(pdev, dsu_pmu);

Hmm, this is a bit nasty because we haven't finished initialising the
dsu_pmu yet. I think it would actually be cleaner if you kept:

static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)

for the DT case and added:

static int dsu_pmu_acpi_get_cpus(struct device *dev, cpumask_t *mask)

for the ACPI case. I suppose the DT case could take the struct device * too
if you wanted the prototypes to be the same.
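
Roughly (untested; this just lifts the loop from your patch into that
prototype):

	static int dsu_pmu_acpi_get_cpus(struct device *dev, cpumask_t *mask)
	{
		int cpu;

		/* CPUs sharing this PMU's parent node belong to the cluster. */
		for_each_possible_cpu(cpu) {
			struct acpi_device *acpi_dev;

			acpi_dev = ACPI_COMPANION(get_cpu_device(cpu));
			if (acpi_dev &&
			    acpi_dev->parent == ACPI_COMPANION(dev)->parent)
				cpumask_set_cpu(cpu, mask);
		}

		return 0;
	}

Then the probe path can pass &dsu_pmu->associated_cpus once the dsu_pmu
has actually been allocated, and the early platform_set_drvdata() call
goes away.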

> + rc = dsu_pmu_platform_get_cpus(pdev);
>   if (rc) {
>   dev_warn(&pdev->dev, "Failed to parse the CPUs\n");
>   return rc;
> @@ -705,7 +751,6 @@ static int dsu_pmu_device_probe(struct platform_device 
> *pdev)
>   }
>  
>   dsu_pmu->irq = irq;
> - platform_set_drvdata(pdev, dsu_pmu);
>   rc = cpuhp_state_add_instance(dsu_pmu_cpuhp_state,
>   &dsu_pmu->cpuhp_node);
>   if (rc)
> @@ -752,11 +797,19 @@ static const struct of_device_id dsu_pmu_of_match[] = {
>   { .compatible = "arm,dsu-pmu", },
>   {},
>  };
> +MODULE_DEVICE_TABLE(of, 

Re: [PATCH V3] arm64/cpuinfo: Define HWCAP name arrays per their actual bit definitions

2020-09-07 Thread Will Deacon
On Mon, Aug 17, 2020 at 05:34:23PM +0530, Anshuman Khandual wrote:
> HWCAP name arrays (hwcap_str, compat_hwcap_str, compat_hwcap2_str) that are
> scanned for /proc/cpuinfo are detached from their bit definitions making it
> vulnerable and difficult to correlate. It is also bit problematic because
> during /proc/cpuinfo dump these arrays get traversed sequentially assuming
> they reflect and match actual HWCAP bit sequence, to test various features
> for a given CPU. This redefines name arrays per their HWCAP bit definitions
> . It also warns after detecting any feature which is not expected on arm64.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Mark Brown 
> Cc: Dave Martin 
> Cc: Ard Biesheuvel 
> Cc: Mark Rutland 
> Cc: Suzuki K Poulose 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual 
> ---
> This applies on 5.9-rc1
> 
> Mark, since the patch has changed I have dropped your Acked-by: tag. Are you
> happy to give a new one ?
> 
> Changes in V3:
> 
> - Moved name arrays to (arch/arm64/kernel/cpuinfo.c) to prevent a build 
> warning
> - Replaced string values with NULL for all compat features not possible on 
> arm64
> - Changed compat_hwcap_str[] iteration on size as some NULL values are 
> expected
> - Warn once after detecting any feature on arm64 that is not expected
> 
> Changes in V2: (https://patchwork.kernel.org/patch/11533755/)
> 
> - Defined COMPAT_KERNEL_HWCAP[2] and updated the name arrays per Mark
> - Updated the commit message as required
> 
> Changes in V1: (https://patchwork.kernel.org/patch/11532945/)
> 
>  arch/arm64/include/asm/hwcap.h |   9 +++
>  arch/arm64/kernel/cpuinfo.c| 172 
> ++---
>  2 files changed, 100 insertions(+), 81 deletions(-)

[...]

> + [KERNEL_HWCAP_FP]   = "fp",
> + [KERNEL_HWCAP_ASIMD]= "asimd",
> + [KERNEL_HWCAP_EVTSTRM]  = "evtstrm",
> + [KERNEL_HWCAP_AES]  = "aes",

It would be nice if the cap and the string were generated by the same
macro, along the lines of:

#define KERNEL_HWCAP(c) [KERNEL_HWCAP_##c] = #c,

Does making the constants mixed case break anything, or is it just really
churny to do?

> @@ -166,9 +167,18 @@ static int c_show(struct seq_file *m, void *v)
>   seq_puts(m, "Features\t:");
>   if (compat) {
>  #ifdef CONFIG_COMPAT
> - for (j = 0; compat_hwcap_str[j]; j++)
> - if (compat_elf_hwcap & (1 << j))
> + for (j = 0; j < ARRAY_SIZE(compat_hwcap_str); j++) {
> + if (compat_elf_hwcap & (1 << j)) {
> + /*
> +  * Warn once if any feature should not
> +  * have been present on arm64 platform.
> +  */
> + if (WARN_ON_ONCE(!compat_hwcap_str[j]))
> + continue;
> +
>   seq_printf(m, " %s", 
> compat_hwcap_str[j]);
> + }
> + }
>  
>   for (j = 0; compat_hwcap2_str[j]; j++)

Hmm, I find this pretty confusing now as compat_hwcap_str is not NULL
terminated and must be traversed with a loop bounded by ARRAY_SIZE(...),
whereas compat_hwcap2_str *is* NULL terminated and is traversed until you
hit the sentinel.

I think hwcap_str, compat_hwcap_str and compat_hwcap2_str should be
identical in this regard.

Will


Re: [PATCH 1/4] drivers/perf: Add support for ARMv8.3-SPE

2020-09-07 Thread Will Deacon
On Fri, Jul 24, 2020 at 05:16:04PM +0800, Wei Li wrote:
> Armv8.3 extends the SPE by adding:
> - Alignment field in the Events packet, and filtering on this event
>   using PMSEVFR_EL1.
> - Support for the Scalable Vector Extension (SVE).
> 
> The main additions for SVE are:
> - Recording the vector length for SVE operations in the Operation Type
>   packet. It is not possible to filter on vector length.
> - Incomplete predicate and empty predicate fields in the Events packet,
>   and filtering on these events using PMSEVFR_EL1.
> 
> Update the check of pmsevfr for empty/partial predicated SVE and
> alignment event in kernel driver.
> 
> Signed-off-by: Wei Li 
> ---
>  arch/arm64/include/asm/sysreg.h |  4 +++-
>  drivers/perf/arm_spe_pmu.c  | 18 ++
>  2 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 463175f80341..be4c44ccdb56 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -281,7 +281,6 @@
>  #define SYS_PMSFCR_EL1_ST_SHIFT  18
>  
>  #define SYS_PMSEVFR_EL1  sys_reg(3, 0, 9, 9, 5)
> -#define SYS_PMSEVFR_EL1_RES0 0x00ff0f55UL

I think we can just update this mask unconditionally to allow the new bits.
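
i.e. something like this (untested; the bits are taken straight from the
8.3 mask you add below):

	/* The usable event filter bits are the complement of this mask. */
	#define SYS_PMSEVFR_EL1_RES0					\
		(~(GENMASK_ULL(63, 48) | GENMASK_ULL(31, 24) |		\
		   GENMASK_ULL(18, 17) | GENMASK_ULL(15, 11) |		\
		   BIT_ULL(7) | BIT_ULL(5) | BIT_ULL(3) | BIT_ULL(1)))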

>  #define SYS_PMSLATFR_EL1 sys_reg(3, 0, 9, 9, 6)
>  #define SYS_PMSLATFR_EL1_MINLAT_SHIFT0
> @@ -769,6 +768,9 @@
>  #define ID_AA64DFR0_PMUVER_8_5   0x6
>  #define ID_AA64DFR0_PMUVER_IMP_DEF   0xf
>  
> +#define ID_AA64DFR0_PMSVER_8_2   0x1
> +#define ID_AA64DFR0_PMSVER_8_3   0x2
> +
>  #define ID_DFR0_PERFMON_SHIFT24
>  
>  #define ID_DFR0_PERFMON_8_1  0x4
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index e51ddb6d63ed..5ec7ee0c8fa1 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -54,7 +54,7 @@ struct arm_spe_pmu {
>   struct hlist_node   hotplug_node;
>  
>   int irq; /* PPI */
> -
> + int pmuver;

nit: please call this "pmsver" to align with the architecture (where
"pmuver" means something else).

>   u16 min_period;
>   u16 counter_sz;
>  
> @@ -80,6 +80,15 @@ struct arm_spe_pmu {
>  /* Keep track of our dynamic hotplug state */
>  static enum cpuhp_state arm_spe_pmu_online;
>  
> +static u64 sys_pmsevfr_el1_mask[] = {
> + [ID_AA64DFR0_PMSVER_8_2] = GENMASK_ULL(63, 48) | GENMASK_ULL(31, 24) |
> + GENMASK_ULL(15, 12) | BIT_ULL(7) | BIT_ULL(5) | BIT_ULL(3) |
> + BIT_ULL(1),
> + [ID_AA64DFR0_PMSVER_8_3] = GENMASK_ULL(63, 48) | GENMASK_ULL(31, 24) |
> + GENMASK_ULL(18, 17) | GENMASK_ULL(15, 11) | BIT_ULL(7) |
> + BIT_ULL(5) | BIT_ULL(3) | BIT_ULL(1),
> +};

As I said above, you can drop this and just update the #define.

> +
>  enum arm_spe_pmu_buf_fault_action {
>   SPE_PMU_BUF_FAULT_ACT_SPURIOUS,
>   SPE_PMU_BUF_FAULT_ACT_FATAL,
> @@ -670,7 +679,7 @@ static int arm_spe_pmu_event_init(struct perf_event 
> *event)
>   !cpumask_test_cpu(event->cpu, &spe_pmu->supported_cpus))
>   return -ENOENT;
>  
> - if (arm_spe_event_to_pmsevfr(event) & SYS_PMSEVFR_EL1_RES0)
> + if (arm_spe_event_to_pmsevfr(event) & 
> ~sys_pmsevfr_el1_mask[spe_pmu->pmuver])
>   return -EOPNOTSUPP;

Same here.

>  
>   if (attr->exclude_idle)
> @@ -937,6 +946,7 @@ static void __arm_spe_pmu_dev_probe(void *info)
>   fld, smp_processor_id());
>   return;
>   }
> + spe_pmu->pmuver = fld;
>  
>   /* Read PMBIDR first to determine whether or not we have access */
>   reg = read_sysreg_s(SYS_PMBIDR_EL1);
> @@ -1027,8 +1037,8 @@ static void __arm_spe_pmu_dev_probe(void *info)
>   }
>  
>   dev_info(dev,
> -  "probed for CPUs %*pbl [max_record_sz %u, align %u, features 
> 0x%llx]\n",
> -  cpumask_pr_args(&spe_pmu->supported_cpus),
> +  "v%d probed for CPUs %*pbl [max_record_sz %u, align %u, 
> features 0x%llx]\n",
> +  spe_pmu->pmuver, cpumask_pr_args(&spe_pmu->supported_cpus),

There's no need for this. If userspace finds this information useful, then
we should expose it in sysfs, like we do for other PMU parameters. If
userspace doesn't find it useful, then there's no need to expose it at all.

So I would suggest adding something like SPE_PMU_CAP_PMSVER and exposing the
field on a per-SPE-PMU basis in sysfs.
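
As a sketch (completely untested; the names here are made up for
illustration, and it assumes the spe_pmu is reachable from the device the
same way the existing cap attributes get at it, with the field renamed to
pmsver as above):

	static ssize_t pmsver_show(struct device *dev,
				   struct device_attribute *attr, char *buf)
	{
		struct arm_spe_pmu *spe_pmu = dev_get_drvdata(dev);

		/* PMSVer as probed from ID_AA64DFR0_EL1 for this PMU. */
		return snprintf(buf, PAGE_SIZE, "%d\n", spe_pmu->pmsver);
	}
	static DEVICE_ATTR_RO(pmsver);

Then tooling can read it from the caps directory instead of grepping
dmesg.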

big.LITTLE should work as before, where we expose a completely separate PMU
instance for each CPU type.

Will


Re: [PATCH] arm64: perf: Remove unnecessary event_idx check

2020-09-07 Thread Will Deacon
On Fri, 4 Sep 2020 17:57:38 +0800, Qi Liu wrote:
> event_idx is obtained from armv8pmu_get_event_idx(), and this idx must be
> between ARMV8_IDX_CYCLE_COUNTER and cpu_pmu->num_events. So it's unnecessary
> to do this check. Let's remove it.

Applied to will (for-next/perf), thanks!

[1/1] arm64: perf: Remove unnecessary event_idx check
  https://git.kernel.org/will/c/44fdf4ed2693

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH] PR_SPEC_DISABLE_NOEXEC support for arm64.

2020-09-07 Thread Will Deacon
On Fri, Jul 17, 2020 at 04:05:32AM -0700, Anthony Steinhauser wrote:
> For x64 it was already implemented in:
> https://github.com/torvalds/linux/commit/71368af
> 
> The rationale is the same as for the x64 implementation.
> 
> Signed-off-by: Anthony Steinhauser 
> ---
> 
> It's actively attempted by OpenJDK on arm64 CentOS and Fedora:
> https://git.centos.org/rpms/java-11-openjdk/blob/c8s/f/SOURCES/rh1566890-CVE_2018_3639-speculative_store_bypass.patch
> 
>  arch/arm64/include/asm/ssbd.h | 28 
>  arch/arm64/kernel/process.c   | 13 +
>  arch/arm64/kernel/ssbd.c  | 34 +-
>  3 files changed, 58 insertions(+), 17 deletions(-)
>  create mode 100644 arch/arm64/include/asm/ssbd.h

As a heads up: I'm currently reworking most of this, and hope to post
something within the next two weeks.

> diff --git a/arch/arm64/include/asm/ssbd.h b/arch/arm64/include/asm/ssbd.h
> new file mode 100644
> index ..68c716dc5811
> --- /dev/null
> +++ b/arch/arm64/include/asm/ssbd.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2020 Google LLC.
> + */
> +#ifndef __ASM_SSBD_H
> +#define __ASM_SSBD_H
> +
> +#include 
> +#include 
> +#include 
> +
> +static inline void ssbd_ssbs_enable(struct task_struct *task)
> +{
> + u64 val = is_compat_thread(task_thread_info(task)) ?
> + PSR_AA32_SSBS_BIT : PSR_SSBS_BIT;
> +
> + task_pt_regs(task)->pstate |= val;
> +}
> +
> +static inline void ssbd_ssbs_disable(struct task_struct *task)
> +{
> + u64 val = is_compat_thread(task_thread_info(task)) ?
> + PSR_AA32_SSBS_BIT : PSR_SSBS_BIT;
> +
> + task_pt_regs(task)->pstate &= ~val;
> +}

I'd prefer to keep these where they are and have an out-of-line call if
necessary. We should try to keep the SSBD stuff in one place.

> +
> +#endif   /* __ASM_SSBD_H */
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 6089638c7d43..ad3c67c86c4c 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -54,6 +54,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #if defined(CONFIG_STACKPROTECTOR) && 
> !defined(CONFIG_STACKPROTECTOR_PER_TASK)
> @@ -588,6 +589,18 @@ void arch_setup_new_exec(void)
>   current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0;
>  
>   ptrauth_thread_init_user(current);
> +
> + /*
> +  * Don't inherit TIF_SSBD across exec boundary when
> +  * PR_SPEC_DISABLE_NOEXEC is used.
> +  */
> + if (test_thread_flag(TIF_SSBD) &&
> + task_spec_ssb_noexec(current)) {
> + clear_thread_flag(TIF_SSBD);
> + task_clear_spec_ssb_disable(current);
> + task_clear_spec_ssb_noexec(current);
> + ssbd_ssbs_enable(current);
> + }

How is this supposed to work with CPUs that expose SSBS directly to
userspace? I suppose we should be using PR_SPEC_DISABLE_NOEXEC to decide
what we set the SSBS bit to on exec, but the logic here requires TIF_SSBD
to be set and so won't trigger afaict.

Will


Re: [PATCH] drivers/perf: xgene_pmu: Fix uninitialized resource struct

2020-09-07 Thread Will Deacon
On Wed, Sep 02, 2020 at 02:27:29PM -0400, Mark Salter wrote:
> diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
> index edac28cd25dd..fdbbd0804b92 100644
> --- a/drivers/perf/xgene_pmu.c
> +++ b/drivers/perf/xgene_pmu.c
> @@ -1483,6 +1483,7 @@ xgene_pmu_dev_ctx *acpi_get_pmu_hw_inf(struct xgene_pmu 
> *xgene_pmu,
>   return NULL;
>  
>   INIT_LIST_HEAD(_list);
> + memset(&res, 0, sizeof(res));
>   rc = acpi_dev_get_resources(adev, _list,
>   acpi_pmu_dev_add_resource, &res);
>   acpi_dev_free_resource_list(_list);

Hmm, to be honest, I'm not sure we should be calling devm_ioremap_resource()
at all here. The resource is clearly bogus, even with this change: the name
and the resource hierarchy pointers will all be NULL. I think it would be
better to follow the TX2 PMU driver (drivers/perf/thunderx2_pmu.c) which
appears to assign the resource directly in tx2_uncore_pmu_init_dev().

Is there a reason we can't do that?

Will


Re: [PATCH 1/2] locking/mutex: Don't hog RCU read lock while optimistically spinning

2020-09-07 Thread Will Deacon
On Fri, Aug 07, 2020 at 12:16:35PM -0700, Sultan Alsawaf wrote:
> From: Sultan Alsawaf 
> 
> There's no reason to hold an RCU read lock the entire time while
> optimistically spinning for a mutex lock. This can needlessly lengthen
> RCU grace periods and slow down synchronize_rcu() when it doesn't brute
> force the RCU grace period via rcupdate.rcu_expedited=1.

Would be good to demonstrate this with numbers if you can.

> Signed-off-by: Sultan Alsawaf 
> ---
>  kernel/locking/mutex.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 5352ce50a97e..cc5676712458 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -552,21 +552,31 @@ bool mutex_spin_on_owner(struct mutex *lock, struct 
> task_struct *owner,
>  {
>   bool ret = true;
>  
> - rcu_read_lock();
> - while (__mutex_owner(lock) == owner) {
> + for (;;) {
> + unsigned int cpu;
> + bool same_owner;
> +
>   /*
> -  * Ensure we emit the owner->on_cpu, dereference _after_
> -  * checking lock->owner still matches owner. If that fails,
> +  * Ensure lock->owner still matches owner. If that fails,
>* owner might point to freed memory. If it still matches,
>* the rcu_read_lock() ensures the memory stays valid.
>*/
> - barrier();
> + rcu_read_lock();
> + same_owner = __mutex_owner(lock) == owner;
> + if (same_owner) {
> + ret = owner->on_cpu;
> + if (ret)
> + cpu = task_cpu(owner);
> + }
> + rcu_read_unlock();

Are you sure this doesn't break the ww mutex spinning? That thing also goes
and looks at the owner, but now it's called outside of the read-side
critical section.

Will


Re: [PATCH] arm64: fix some spelling mistakes in the comments by codespell

2020-09-07 Thread Will Deacon
On Fri, 28 Aug 2020 11:18:22 +0800, Xiaoming Ni wrote:
> 


Applied to arm64 (for-next/tpyos), thanks!

[1/1] arm64: fix some spelling mistakes in the comments by codespell
  https://git.kernel.org/arm64/c/ad14c19242b5

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH] iommu/arm-smmu-v3: Fix l1 stream table size in the error message

2020-09-07 Thread Will Deacon
On Wed, 26 Aug 2020 22:17:58 +0800, Zenghui Yu wrote:
> The actual size of level-1 stream table is l1size. This looks like an
> oversight on commit d2e88e7c081ef ("iommu/arm-smmu: Fix LOG2SIZE setting
> for 2-level stream tables") which forgot to update the @size in error
> message as well.
> 
> As memory allocation failure is already bad enough, nothing worse would
> happen. But let's be careful.

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu-v3: Fix l1 stream table size in the error message
  https://git.kernel.org/will/c/dc898eb84b25

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH] arm64: traps: Add str of description to panic() in die()

2020-09-07 Thread Will Deacon
On Tue, 4 Aug 2020 16:53:47 +0800, Yue Hu wrote:
> Currently, there are different description strings in die() such as
> die("Oops",,), die("Oops - BUG",,). And panic() called by die() will
> always show "Fatal exception" or "Fatal exception in interrupt".
> 
> Note that panic() will run any panic handler via panic_notifier_list.
> And the string above will be formatted and placed in static buf[]
> which will be passed to handler.
> 
> [...]

Applied to arm64 (for-next/misc), thanks!

[1/1] arm64: traps: Add str of description to panic() in die()
  https://git.kernel.org/arm64/c/b4c971245925

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH v2 0/2] ACPI/IORT: Code cleanups

2020-09-07 Thread Will Deacon
On Tue, 18 Aug 2020 14:36:23 +0800, Zenghui Yu wrote:
> * From v1 [1]:
>   - As pointed out by Hanjun, remove two now unused inline functions.
> Compile tested with CONFIG_IOMMU_API is not selected.
> 
> [1] https://lore.kernel.org/r/20200817105946.1511-1-yuzeng...@huawei.com
> 
> Zenghui Yu (2):
>   ACPI/IORT: Drop the unused @ops of iort_add_device_replay()
>   ACPI/IORT: Remove the unused inline functions
> 
> [...]

Applied to arm64 (for-next/acpi), thanks!

[1/2] ACPI/IORT: Drop the unused @ops of iort_add_device_replay()
  https://git.kernel.org/arm64/c/1ab64cf81489
[2/2] ACPI/IORT: Remove the unused inline functions
  https://git.kernel.org/arm64/c/c2bea7a1a1c0

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH] arm64: perf: Add general hardware LLC events for PMUv3

2020-09-07 Thread Will Deacon
On Tue, 11 Aug 2020 13:35:05 +0800, Leo Yan wrote:
> This patch is to add the general hardware last level cache (LLC) events
> for PMUv3: one event is for LLC access and another is for LLC miss.
> 
> With this change, perf tool can support last level cache profiling,
> below is an example to demonstrate the usage on Arm64:
> 
>   $ perf stat -e LLC-load-misses -e LLC-loads -- \
> perf bench mem memcpy -s 1024MB -l 100 -f default
> 
> [...]

Applied to will (for-next/perf), thanks!

[1/1] arm64: perf: Add general hardware LLC events for PMUv3
  https://git.kernel.org/will/c/ffdbd3d83553

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH] arm64/numa: Fix a typo in comment of arm64_numa_init

2020-09-07 Thread Will Deacon
On Tue, 1 Sep 2020 17:11:54 +0800, yanfei...@windriver.com wrote:
> Fix a typo in comment of arm64_numa_init. 'encomapssing' should
> be 'encompassing'.

Applied to arm64 (for-next/tpyos), thanks!

[1/1] arm64/numa: Fix a typo in comment of arm64_numa_init
  https://git.kernel.org/arm64/c/9a747c91e8d6

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH] kbuild: preprocess module linker script

2020-09-07 Thread Will Deacon
On Fri, Sep 04, 2020 at 10:31:21PM +0900, Masahiro Yamada wrote:
> There was a request to preprocess the module linker script like we do
> for the vmlinux one (https://lkml.org/lkml/2020/8/21/512).
> 
> The difference between vmlinux.lds and module.lds is that the latter
> is needed for external module builds, thus must be cleaned up by
> 'make mrproper' instead of 'make clean' (also, it must be created by
> 'make modules_prepare').
> 
> You cannot put it in arch/*/kernel/ because 'make clean' descends into
> it. I moved arch/*/kernel/module.lds to arch/*/include/asm/module.lds.h,
> which is included from scripts/module.lds.S.
> 
> scripts/module.lds is fine because 'make clean' keeps all the build
> artifacts under scripts/.
> 
> You can add arch-specific sections in <asm/module.lds.h>.
> 
> Signed-off-by: Masahiro Yamada 
> Tested-by: Jessica Yu 
> ---

For the arm64 bits:

Acked-by: Will Deacon 

Thanks,

Will


Re: [PATCH 1/2] sched: Bring the PF_IO_WORKER and PF_WQ_WORKER bits closer together

2020-09-07 Thread Will Deacon
On Wed, Aug 19, 2020 at 09:55:05PM +0200, Sebastian Andrzej Siewior wrote:
> The bits PF_IO_WORKER and PF_WQ_WORKER are tested together in
> sched_submit_work() which is considered to be a hot path.
> If the two bits cross the 8 or 16 bit boundary then most architectures
> require multiple load instructions in order to create the constant
> value. Also, such a value can not be encoded within the compare opcode.
> 
> By moving the bit definition within the same block, the compiler can
> create/use one immediate value.
> 
> For some reason gcc-10 on ARM64 requires both bits to be next to each
> other in order to issue "tst reg, val; bne label". Otherwise the result
> is "mov reg1, val; tst reg, reg1; bne label".
> 
> Move PF_VCPU out of the way so that PF_IO_WORKER can be next to
> PF_WQ_WORKER.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> ---
> 
> Could someone from the ARM64 camp please verify if this a gcc "bug" or
> opcode/arch limitation? With PF_IO_WORKER as 1 (without the PF_VCPU
> swap) I get for ARM:
> 
> | tst r2, #33 @ task_flags,
> | beq .L998   @,
> 
> however ARM64 does here:
> | mov w0, 33  // tmp117,
> | tst w19, w0 // task_flags, tmp117
> | bne .L453   //,
> 
> the extra mov operation. Moving PF_IO_WORKER next to PF_WQ_WORKER as
> this patch gives me:
> | tst w19, 48 // task_flags,
> | bne .L453   //,

Moving an immediate into a register really shouldn't be a performance
issue, so I don't think this is a problem. However, the reason GCC does
this is because of the slightly weird way in which immediates are encoded,
meaning that '33' can't be packed into the 'tst' alias. You can try to
decipher the "DecodeBitMasks()" pseudocode in the Arm ARM if you're
interested.

Will
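
For illustration only (this sketch is not from the thread): an AArch64
logical immediate has to be a rotated run of contiguous ones, replicated
across a 2/4/8/16/32/64-bit element, which is what DecodeBitMasks()
formalises. The simplified helper below only handles the 64-bit element
case (replicated patterns such as 0x5555... are deliberately ignored),
but that is enough to see why 48 (0b110000) fits the 'tst' alias while
33 (0b100001) does not:

  #include <stdbool.h>
  #include <stdint.h>

  /*
   * Accept a value iff its circular bit string has exactly one 0 -> 1
   * transition, i.e. it is a single rotated run of contiguous ones.
   * Assumption: 64-bit element size only.
   */
  bool is_single_rotated_run_of_ones(uint64_t x)
  {
          uint64_t ror1 = (x >> 1) | (x << 63);   /* rotate right by one bit */

          if (x == 0 || x == ~0ULL)
                  return false;   /* neither encodes as a logical immediate */

          return __builtin_popcountll(x & ~ror1) == 1;
  }

  /*
   * is_single_rotated_run_of_ones(48) -> true   ("tst w19, 48" assembles)
   * is_single_rotated_run_of_ones(33) -> false  (needs "mov w0, 33; tst w19, w0")
   */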


Re: [PATCH v4 00/14] arm64: Optimise and update memcpy, user copy and string routines

2020-09-07 Thread Will Deacon
Hi Oli,

Thanks for this. Just a few high-level comments below.

On Wed, Jul 01, 2020 at 09:12:49AM +0100, Oli Swede wrote:
> > Version 3 addressed this but I later found some issues with the fixup
> > correctness after further testing, and have partially re-written them
> > here, and addressed some other behaviours of the copy algorithm.

[...]

> I am waiting on access to the relevant machine before posting the benchmark
> results for this optimized memcpy, but Sam reported the following with the
> similar (but now slightly older) cortex-strings version:
>   * copy_from_user: 13.17%
>   * copy_to_user: 4.8%
>   * memcpy: 27.88%
>   * copy_in_user: Didn't appear in the test results.
> This machine will also be used to check the fixups are accurate on a system
> with UAO - they appear to be exact on a non-UAO system with PAN that I've
> been working on locally.

I'm inclined to say that cortex-strings is probably not a good basis for
our uaccess routines. The code needs to be adapted in a non-straightforward
way so that we lose pretty much all of the benefits we'd usually get from
adopting an existing implementation; we can't pull in fixes or improvements
without a lot of manual effort, we can't reuse existing testing infrastructure
(see below) and we end up being a "second-class" user of the routines
because of the discrepancies in implementation.

So why don't we use cortex-strings as a basis for the in-kernel routines
only, preferably in a form where the code can be used directly and updated
with a script (e.g. similar to how we pull in arch/arm64/crypto routines
from OpenSSL). We can then roll our own uaccess routines, using a slightly
more straight-forward implementation which is more amenable to handling
user faults and doesn't do things like over copying.

> I should also mention that the correctness of these routines were tested
> using a selftest test module akin to lib/test_user_copy.c (whose usercopy
> functionality checks these patches do pass) but which is more specific to
> the fixup accuracy, in that it compares the return value with the true
> number of bytes remaining in the destination buffer at the point of a fault.

Can we put this test module into the kernel source tree, please, maybe as
part of lkdtm? Given the control flow of these optimised functions, I think
we absolutely need targeted testing to make sure we're getting complete
coverage.

Will


Re: [PATCH v2 0/2] ACPI/IORT: Code cleanups

2020-09-02 Thread Will Deacon
On Wed, Sep 02, 2020 at 05:17:43PM +0800, Hanjun Guo wrote:
> +Cc Will
> 
> On 2020/8/18 17:16, Hanjun Guo wrote:
> > On 2020/8/18 14:36, Zenghui Yu wrote:
> > > * From v1 [1]:
> > >    - As pointed out by Hanjun, remove two now unused inline functions.
> > >  Compile tested with CONFIG_IOMMU_API is not selected.
> > > 
> > > [1] https://lore.kernel.org/r/20200817105946.1511-1-yuzeng...@huawei.com
> > > 
> > > Zenghui Yu (2):
> > >    ACPI/IORT: Drop the unused @ops of iort_add_device_replay()
> > >    ACPI/IORT: Remove the unused inline functions
> > > 
> > >   drivers/acpi/arm64/iort.c | 10 ++
> > >   1 file changed, 2 insertions(+), 8 deletions(-)
> > 
> > Nice cleanup.
> > 
> > Acked-by: Hanjun Guo 
> 
> Will, would you mind taking this patch set via ARM64 tree?

Sure, no problem. I'll queue this for 5.10 later this week.

Will


Re: [PATCH] arm64: Enable PCI write-combine resources under sysfs

2020-09-02 Thread Will Deacon
On Wed, Sep 02, 2020 at 09:22:53AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2020-09-01 at 13:37 -0500, Bjorn Helgaas wrote:
> > On Mon, Aug 31, 2020 at 03:18:27PM +, Clint Sbisa wrote:
> > > Using write-combine is crucial for performance of PCI devices where
> > > significant amounts of transactions go over PCI BARs.
> > > 
> > > arm64 supports write-combine PCI mappings, so the appropriate
> > > define
> > > has been added which will expose write-combine mappings under sysfs
> > > for prefetchable PCI resources.
> > > 
> > > Signed-off-by: Clint Sbisa 
> > 
> > Fine with me, I assume Will or Catalin will apply this.
> 
> Haha ! Clint had sent it to them originally and I told him to resend
> it to linux-pci, yourself and Lorenzo :-)
> 
> So the confusion is on me.
> 
> Will, Catalin, it's all yours. You should have the original patch in
> your mbox already, otherwise:
> 
> https://patchwork.kernel.org/patch/11729875/

Yup, it's on the radar. We don't usually start queuing stuff until -rc3, so
I'm working through the backlog this week. Would like an Ack from Lorenzo,
though.

Will


Re: [PATCH v2 1/1] netfilter: nat: add a range check for l3/l4 protonum

2020-09-01 Thread Will Deacon
Hi Will, Pablo,

On Tue, Aug 04, 2020 at 01:37:11PM +0200, Pablo Neira Ayuso wrote:
> This patch is much smaller and if you confirm this addresses the
> issue, then this is awesome.

Did that ever get confirmed? AFAICT, nothing ended up landing in the stable
trees for this.

Cheers,

Will


> On Mon, Aug 03, 2020 at 06:31:56PM +, William Mcvicker wrote:
> [...]
> > diff --git a/net/netfilter/nf_conntrack_netlink.c 
> > b/net/netfilter/nf_conntrack_netlink.c
> > index 31fa94064a62..56d310f8b29a 100644
> > --- a/net/netfilter/nf_conntrack_netlink.c
> > +++ b/net/netfilter/nf_conntrack_netlink.c
> > @@ -1129,6 +1129,8 @@ ctnetlink_parse_tuple(const struct nlattr * const 
> > cda[],
> > if (!tb[CTA_TUPLE_IP])
> > return -EINVAL;
> >  
> > +   if (l3num >= NFPROTO_NUMPROTO)
> > +   return -EINVAL;
> 
> l3num can only be either NFPROTO_IPV4 or NFPROTO_IPV6.
> 
> Other than that, bail out with EOPNOTSUPP.
> 
> Thank you.


Re: [PATCH v2] module: Harden STRICT_MODULE_RWX

2020-09-01 Thread Will Deacon
On Mon, Aug 31, 2020 at 11:46:51AM +0200, Jessica Yu wrote:
> +++ Will Deacon [21/08/20 13:30 +0100]:
> [snipped]
> > > > > > So module_enforce_rwx_sections() is already called after
> > > > > > module_frob_arch_sections() - which really baffled me at first, 
> > > > > > since
> > > > > > sh_type and sh_flags should have been set already in
> > > > > > module_frob_arch_sections().
> > > > > >
> > > > > > I added some debug prints to see which section the module code was
> > > > > > tripping on, and it was .text.ftrace_trampoline. See this snippet 
> > > > > > from
> > > > > > arm64's module_frob_arch_sections():
> > > > > >
> > > > > > else if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE) &&
> > > > > >  !strcmp(secstrings + sechdrs[i].sh_name,
> > > > > >  ".text.ftrace_trampoline"))
> > > > > > tramp = sechdrs + i;
> > > > > >
> > > > > > Since Mauro's config doesn't have CONFIG_DYNAMIC_FTRACE enabled, 
> > > > > > tramp
> > > > > > is never set here and the if (tramp) check at the end of the 
> > > > > > function
> > > > > > fails, so its section flags are never set, so they remain WAX and 
> > > > > > fail
> > > > > > the rwx check.
> > > > >
> > > > > Right. Our module.lds does not go through the preprocessor, so we
> > > > > cannot add the #ifdef check there currently. So we should either drop
> > > > > the IS_ENABLED() check here, or simply rename the section, dropping
> > > > > the .text prefix (which doesn't seem to have any significance outside
> > > > > this context)
> > > > >
> > > > > I'll leave it to Will to make the final call here.
> > > >
> > > > Why don't we just preprocess the linker script, like we do for the main
> > > > kernel?
> > > >
> > > 
> > > That should work as well, I just haven't checked how straight-forward
> > > it is to change that.
> > 
> > Ok, if it's _not_ straightforward, then let's just drop the IS_ENABLED()
> > altogether.
> 
> Unfortunately I've been getting more reports about this issue, so let's just
> get the aforementioned workaround merged first. Does the following look OK?
> 
> diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
> index 0ce3a28e3347..2e224435c024 100644
> --- a/arch/arm64/kernel/module-plts.c
> +++ b/arch/arm64/kernel/module-plts.c
> @@ -305,8 +305,7 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr 
> *sechdrs,
>mod->arch.core.plt_shndx = i;
>else if (!strcmp(secstrings + sechdrs[i].sh_name, ".init.plt"))
>mod->arch.init.plt_shndx = i;
> -   else if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE) &&
> -!strcmp(secstrings + sechdrs[i].sh_name,
> +   else if (!strcmp(secstrings + sechdrs[i].sh_name,
> ".text.ftrace_trampoline"))
>tramp = sechdrs + i;
>else if (sechdrs[i].sh_type == SHT_SYMTAB)
> 
> If so I'll turn it into a formal patch and we can get that merged in the next 
> -rc.

Acked-by: Will Deacon 

Will


Re: [PATCH v2] Documentation: add minimum clang/llvm version

2020-08-28 Thread Will Deacon
On Wed, Aug 26, 2020 at 12:15:55PM -0700, Nick Desaulniers wrote:
> Based on a vote at the LLVM BoF at Plumbers 2020, we decided to start
> small, supporting just one formal upstream release of LLVM for now.
> 
> We can probably widen the support window of supported versions over
> time.  Also, note that LLVM's release process is different than GCC's.
> GCC tends to have 1 major release per year while releasing minor updates
> to the past 3 major versions.  LLVM tends to support one major release
> and one minor release every six months.
> 
> Reviewed-by: Kees Cook 
> Reviewed-by: Nathan Chancellor 
> Tested-by: Gustavo A. R. Silva 
> Tested-by: Nathan Chancellor 
> Signed-off-by: Nick Desaulniers 
> ---
> Changes V1 -> V2:
> * fix typo, as per Will.
> * add link to getting LLVM, as per Nathan.
> * collect tags.

Acked-by: Will Deacon 

Will


Re: [NAK] Re: [PATCH] fs: Optimized fget to improve performance

2020-08-28 Thread Will Deacon
On Thu, Aug 27, 2020 at 03:28:48PM +0100, Al Viro wrote:
> On Thu, Aug 27, 2020 at 06:19:44PM +0800, Shaokun Zhang wrote:
> > From: Yuqi Jin 
> > 
> > It is well known that the performance of atomic_add is better than that of
> > atomic_cmpxchg.
> > The initial value of @f_count is 1. While @f_count is increased by 1 in
> > __fget_files, it will go through three phases: > 0, < 0, and = 0. When the
> > fixed value 0 is used as the condition for terminating the increase of 1,
> > only atomic_cmpxchg can be used. When we use < 0 as the condition for
> > stopping plus 1, we can use atomic_add to obtain better performance.
> 
> Suppose another thread has just removed it from the descriptor table.
> 
> > +static inline bool get_file_unless_negative(atomic_long_t *v, long a)
> > +{
> > +   long c = atomic_long_read(v);
> > +
> > +   if (c <= 0)
> > +   return 0;
> 
> Still 1.  Now the other thread has gotten to dropping the last reference,
> decremented counter to zero and committed to freeing the struct file.
> 
> > +
> > +   return atomic_long_add_return(a, v) - 1;
> 
> ... and you increment that sucker back to 1.  Sure, you return 0, so the
> caller does nothing to that struct file.  Which includes undoing the
> changes to its refcount.
> 
> In the meanwhile, the third thread does fget on the same descriptor,
> and there we end up bumping the refcount to 2 and succeeding.  Which
> leaves the caller with reference to already doomed struct file...
> 
>   IOW, NAK - this is completely broken.  The whole point of
> atomic_long_add_unless() is that the check and conditional increment
> are atomic.  Together.  That's what your optimization takes out.

Cheers Al, yes, this is fscked.

As an aside, I've previously toyed with the idea of implementing a form
of cmpxchg() using a pair of xchg() operations and a smp_cond_load_relaxed(),
where the thing would transition through a "reserved value", which might
perform better with the current trend of building hardware that doesn't
handle CAS failure so well.

But I've never had the time/motivation to hack it up, and it relies on that
reserved value which obviously doesn't always work (so it would have to be a
separate API).

Will
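
For reference, a minimal sketch (not from this thread) of the pattern Al
is describing: the "is it already zero?" check and the increment have to
be a single atomic operation, which is exactly what
atomic_long_add_unless() provides and what splitting it into a plain read
followed by atomic_long_add_return() gives up:

  #include <linux/atomic.h>

  /*
   * Illustrative only: take a reference unless the count has already
   * dropped to zero. The comparison with zero and the increment are one
   * atomic operation, so a doomed object cannot be "resurrected" between
   * the check and the add.
   */
  static inline bool get_ref_unless_zero(atomic_long_t *count)
  {
          return atomic_long_add_unless(count, 1, 0);
  }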


Re: [PATCH] Documentation: add minimum clang/llvm version

2020-08-26 Thread Will Deacon
On Tue, Aug 25, 2020 at 03:25:51PM -0700, Nick Desaulniers wrote:
> Based on a vote at the LLVM BoF at Plumbers 2020, we decided to start
> small, supporting just one formal upstream release of LLVM for now.
> 
> We can probably widen the support window of supported versions over
> time.  Also, note that LLVM's release process is different than GCC's.
> GCC tends to have 1 major release per year while releasing minor updates
> to the past 3 major versions.  LLVM tends to support one major release
> and one minor release every six months.
> 
> Signed-off-by: Nick Desaulniers 
> ---
> Note to reviewers: working remote, I'm having trouble testing/verifying
> that I have the RST links wired up correctly; I would appreciate it if
> someone is able to `make htmldocs` and check
> Documentation/output/process/changes.html properly links to
> Documentation/output/kbuild/llvm.html.
> 
>  Documentation/kbuild/llvm.rst |  2 ++
>  Documentation/process/changes.rst | 10 ++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/Documentation/kbuild/llvm.rst b/Documentation/kbuild/llvm.rst
> index 2aac50b97921..70ec6e9a183b 100644
> --- a/Documentation/kbuild/llvm.rst
> +++ b/Documentation/kbuild/llvm.rst
> @@ -1,3 +1,5 @@
> +.. _kbuild_llvm:
> +
>  ==
>  Building Linux with Clang/LLVM
>  ==
> diff --git a/Documentation/process/changes.rst 
> b/Documentation/process/changes.rst
> index ee741763a3fc..6c580ef9f2a3 100644
> --- a/Documentation/process/changes.rst
> +++ b/Documentation/process/changes.rst
> @@ -30,6 +30,7 @@ you probably needn't concern yourself with pcmciautils.
>  ProgramMinimal version   Command to check the version
>  == ===  
> 
>  GNU C  4.9  gcc --version
> +Clang/LLVM (optional)  10.0.1   clang --version
>  GNU make   3.81 make --version
>  binutils   2.23 ld -v
>  flex   2.5.35   flex --version
> @@ -68,6 +69,15 @@ GCC
>  The gcc version requirements may vary depending on the type of CPU in your
>  computer.
>  
> +Clang/LLVM (optional)
> +-
> +
> +The latest formal release of clang and LLVM utils (according to
+`releases.llvm.org <https://releases.llvm.org>`_) are supported for building
> +kernels. Older releases aren't gauranteed to work, and we may drop 
> workarounds

typo: guaranteed

Will


Re: [PATCH stable v4.9 v2] arm64: entry: Place an SB sequence following an ERET instruction

2020-08-24 Thread Will Deacon
Hi Florian,

On Fri, Aug 21, 2020 at 10:16:23AM -0700, Florian Fainelli wrote:
> On 8/21/20 9:03 AM, Will Deacon wrote:
> > On Fri, Aug 07, 2020 at 03:14:29PM +0200, Greg KH wrote:
> >> On Thu, Aug 06, 2020 at 01:00:54PM -0700, Florian Fainelli wrote:
> >>> Greg, did you have a chance to queue those changes for 4.9, 4.14 and 4.19?
> >>>
> >>> https://lore.kernel.org/linux-arm-kernel/20200720182538.13304-1-f.faine...@gmail.com/
> >>> https://lore.kernel.org/linux-arm-kernel/20200720182937.14099-1-f.faine...@gmail.com/
> >>> https://lore.kernel.org/linux-arm-kernel/20200709195034.15185-1-f.faine...@gmail.com/
> >>
> >> Nope, I was waiting for Will's "ack" for these.
> > 
> > This patch doesn't even build for me (the 'sb' macro is not defined in 4.9),
> > and I really wonder why we bother backporting it at all. Nobody's ever shown
> > it to be a problem in practice, and it's clear that this is just being
> > submitted to tick a box rather than anything else (otherwise it would build,
> > right?).
> 
> Doh, I completely missed submitting the patch this depended on that's
> why I did not notice the build failure locally, sorry about that, what a
> shame.
> 
> Would not the same "tick a box" argument be used against your
> original submission then? Sure, I have not been able to demonstrate in
> real life this was a problem, however the same can be said about a lot
> security related fixes.

Sort of, although I wrote the original patch because it was dead easy to do
and saved having to think too much about the problem, whereas the complexity
of backporting largely diminishes that imo.

> What if it becomes exploitable in the future, would not it be nice to
> have it in a 6 year LTS kernel?

Even if people are stuck on an old LTS, they should still be taking the
regular updates for it, and we would obviously need to backport the fix if
it turned out to be exploitable (and hey, we could even test it then!).

> > So I'm not going to Ack any of them. As with a lot of this side-channel
> > stuff the cure is far worse than the disease.
> Assuming that my v3 does build correctly, which it will, would you be
> keen on changing your position?

Note that I'm not trying to block this patch from going in, I'm just saying
that I'm not supportive of it. Perhaps somebody from Arm can review it if
they think it's worth the effort.

Will


[PATCH stable-5.4.y backport 2/2] KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set

2020-08-24 Thread Will Deacon
commit b5331379bc62611d1026173a09c73573384201d9 upstream.

When an MMU notifier call results in unmapping a range that spans multiple
PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
since this avoids running into RCU stalls during VM teardown. Unfortunately,
if the VM is destroyed as a result of OOM, then blocking is not permitted
and the call to the scheduler triggers the following BUG():

 | BUG: sleeping function called from invalid context at 
arch/arm64/kvm/mmu.c:394
 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
 | INFO: lockdep is turned off.
 | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 | Call trace:
 |  dump_backtrace+0x0/0x284
 |  show_stack+0x1c/0x28
 |  dump_stack+0xf0/0x1a4
 |  ___might_sleep+0x2bc/0x2cc
 |  unmap_stage2_range+0x160/0x1ac
 |  kvm_unmap_hva_range+0x1a0/0x1c8
 |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
 |  __mmu_notifier_invalidate_range_start+0x218/0x31c
 |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
 |  __oom_reap_task_mm+0x128/0x268
 |  oom_reap_task+0xac/0x298
 |  oom_reaper+0x178/0x17c
 |  kthread+0x1e4/0x1fc
 |  ret_from_fork+0x10/0x30

Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
flags.

Cc:  # v5.4 only
Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
Message-Id: <20200811102725.7121-3-w...@kernel.org>
Signed-off-by: Paolo Bonzini 
Signed-off-by: Will Deacon 
---
 virt/kvm/arm/mmu.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index b4a6bbadd144..7501ec8a4600 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -332,7 +332,8 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
-static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+static void __unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size,
+bool may_block)
 {
pgd_t *pgd;
phys_addr_t addr = start, end = start + size;
@@ -357,11 +358,16 @@ static void unmap_stage2_range(struct kvm *kvm, 
phys_addr_t start, u64 size)
 * If the range is too large, release the kvm->mmu_lock
 * to prevent starvation and lockup detector warnings.
 */
-   if (next != end)
+   if (may_block && next != end)
cond_resched_lock(&kvm->mmu_lock);
} while (pgd++, addr = next, addr != end);
 }
 
+static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+{
+   __unmap_stage2_range(kvm, start, size, true);
+}
+
 static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
  phys_addr_t addr, phys_addr_t end)
 {
@@ -2045,7 +2051,10 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void 
*data)
 {
-   unmap_stage2_range(kvm, gpa, size);
+   unsigned flags = *(unsigned *)data;
+   bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;
+
+   __unmap_stage2_range(kvm, gpa, size, may_block);
return 0;
 }
 
@@ -2056,7 +2065,7 @@ int kvm_unmap_hva_range(struct kvm *kvm,
return 0;
 
trace_kvm_unmap_hva_range(start, end);
-   handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
+   handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, &flags);
return 0;
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH stable-5.7.y backport 1/2] KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()

2020-08-24 Thread Will Deacon
commit fdfe7cbd58806522e799e2a50a15aee7f2cbb7b6 upstream.

The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc:  # v5.7 only
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
Message-Id: <20200811102725.7121-2-w...@kernel.org>
Signed-off-by: Paolo Bonzini 
Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_host.h   | 2 +-
 arch/mips/include/asm/kvm_host.h| 2 +-
 arch/mips/kvm/mmu.c | 3 ++-
 arch/powerpc/include/asm/kvm_host.h | 3 ++-
 arch/powerpc/kvm/book3s.c   | 3 ++-
 arch/powerpc/kvm/e500_mmu_host.c| 3 ++-
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/mmu/mmu.c  | 3 ++-
 virt/kvm/arm/mmu.c  | 2 +-
 virt/kvm/kvm_main.c | 3 ++-
 10 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 26fca93cd697..397e20a35975 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -440,7 +440,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, unsigned flags);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index caa2b936125c..8861e9d4eb1f 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -939,7 +939,7 @@ enum kvm_mips_fault_result kvm_trap_emul_gva_fault(struct 
kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, unsigned flags);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 7dad7a293eae..2514e51d908b 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -518,7 +518,8 @@ static int kvm_unmap_hva_handler(struct kvm *kvm, gfn_t 
gfn, gfn_t gfn_end,
return 1;
 }
 
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end)
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end,
+   unsigned flags)
 {
handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 1dc63101ffe1..b82e46ecd7fb 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -58,7 +58,8 @@
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 
 extern int kvm_unmap_hva_range(struct kvm *kvm,
-  unsigned long start, unsigned long end);
+  unsigned long start, unsigned long end,
+  unsigned flags);
 extern int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long 
end);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 5690a1f9b976..13f107dff880 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -837,7 +837,8 @@ void kvmppc_core_commit_memory_region(struct kvm *kvm,
kvm->arch.kvm_ops->commit_memory_region(kvm, mem, old, new, change);
 }
 
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end)
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end,
+   unsigned flags)
 {
return kvm->arch.kvm_ops->unmap_hva_range(kvm, start, end);
 }
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index df9989cf7ba3..9b402c345154 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -734,7 +734,8 @@ static int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
return 0;
 }
 
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsign

[PATCH stable-5.8.y backport 1/2] KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()

2020-08-24 Thread Will Deacon
commit fdfe7cbd58806522e799e2a50a15aee7f2cbb7b6 upstream.

The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc:  # v5.8 only
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
Message-Id: <20200811102725.7121-2-w...@kernel.org>
Signed-off-by: Paolo Bonzini 
Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_host.h   | 2 +-
 arch/arm64/kvm/mmu.c| 2 +-
 arch/mips/include/asm/kvm_host.h| 2 +-
 arch/mips/kvm/mmu.c | 3 ++-
 arch/powerpc/include/asm/kvm_host.h | 3 ++-
 arch/powerpc/kvm/book3s.c   | 3 ++-
 arch/powerpc/kvm/e500_mmu_host.c| 3 ++-
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/mmu/mmu.c  | 3 ++-
 virt/kvm/kvm_main.c | 3 ++-
 10 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e21d4a01372f..759d62343e1d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -443,7 +443,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, unsigned flags);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 31058e6e7c2a..5f6b35c33618 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2203,7 +2203,7 @@ static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t 
gpa, u64 size, void *dat
 }
 
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end)
+   unsigned long start, unsigned long end, unsigned flags)
 {
if (!kvm->arch.pgd)
return 0;
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 363e7a89d173..ef1d25d49ec8 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -981,7 +981,7 @@ enum kvm_mips_fault_result kvm_trap_emul_gva_fault(struct 
kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, unsigned flags);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 49bd160f4d85..0783ac9b3240 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -518,7 +518,8 @@ static int kvm_unmap_hva_handler(struct kvm *kvm, gfn_t 
gfn, gfn_t gfn_end,
return 1;
 }
 
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end)
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end,
+   unsigned flags)
 {
handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 7e2d061d0445..bccf0ba2da2e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -58,7 +58,8 @@
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 
 extern int kvm_unmap_hva_range(struct kvm *kvm,
-  unsigned long start, unsigned long end);
+  unsigned long start, unsigned long end,
+  unsigned flags);
 extern int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long 
end);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 41fedec69ac3..49db50d1db04 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -834,7 +834,8 @@ void kvmppc_core_commit_memory_region(struct kvm *kvm,
kvm->arch.kvm_ops->commit_memory_region(kvm, mem, old, new, change);
 }
 
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end)
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, uns

[PATCH stable-5.7.y backport 2/2] KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set

2020-08-24 Thread Will Deacon
commit b5331379bc62611d1026173a09c73573384201d9 upstream.

When an MMU notifier call results in unmapping a range that spans multiple
PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
since this avoids running into RCU stalls during VM teardown. Unfortunately,
if the VM is destroyed as a result of OOM, then blocking is not permitted
and the call to the scheduler triggers the following BUG():

 | BUG: sleeping function called from invalid context at 
arch/arm64/kvm/mmu.c:394
 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
 | INFO: lockdep is turned off.
 | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 | Call trace:
 |  dump_backtrace+0x0/0x284
 |  show_stack+0x1c/0x28
 |  dump_stack+0xf0/0x1a4
 |  ___might_sleep+0x2bc/0x2cc
 |  unmap_stage2_range+0x160/0x1ac
 |  kvm_unmap_hva_range+0x1a0/0x1c8
 |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
 |  __mmu_notifier_invalidate_range_start+0x218/0x31c
 |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
 |  __oom_reap_task_mm+0x128/0x268
 |  oom_reap_task+0xac/0x298
 |  oom_reaper+0x178/0x17c
 |  kthread+0x1e4/0x1fc
 |  ret_from_fork+0x10/0x30

Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
flags.

Cc:  # v5.7 only
Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
Message-Id: <20200811102725.7121-3-w...@kernel.org>
Signed-off-by: Paolo Bonzini 
Signed-off-by: Will Deacon 
---
 virt/kvm/arm/mmu.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 9510965789e3..b005685a6de4 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -331,7 +331,8 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
-static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+static void __unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size,
+bool may_block)
 {
pgd_t *pgd;
phys_addr_t addr = start, end = start + size;
@@ -356,11 +357,16 @@ static void unmap_stage2_range(struct kvm *kvm, 
phys_addr_t start, u64 size)
 * If the range is too large, release the kvm->mmu_lock
 * to prevent starvation and lockup detector warnings.
 */
-   if (next != end)
+   if (may_block && next != end)
cond_resched_lock(&kvm->mmu_lock);
} while (pgd++, addr = next, addr != end);
 }
 
+static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+{
+   __unmap_stage2_range(kvm, start, size, true);
+}
+
 static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
  phys_addr_t addr, phys_addr_t end)
 {
@@ -2041,7 +2047,10 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void 
*data)
 {
-   unmap_stage2_range(kvm, gpa, size);
+   unsigned flags = *(unsigned *)data;
+   bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;
+
+   __unmap_stage2_range(kvm, gpa, size, may_block);
return 0;
 }
 
@@ -2052,7 +2061,7 @@ int kvm_unmap_hva_range(struct kvm *kvm,
return 0;
 
trace_kvm_unmap_hva_range(start, end);
-   handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
+   handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, &flags);
return 0;
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH stable-5.8.y backport 2/2] KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set

2020-08-24 Thread Will Deacon
commit b5331379bc62611d1026173a09c73573384201d9 upstream.

When an MMU notifier call results in unmapping a range that spans multiple
PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
since this avoids running into RCU stalls during VM teardown. Unfortunately,
if the VM is destroyed as a result of OOM, then blocking is not permitted
and the call to the scheduler triggers the following BUG():

 | BUG: sleeping function called from invalid context at 
arch/arm64/kvm/mmu.c:394
 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
 | INFO: lockdep is turned off.
 | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 | Call trace:
 |  dump_backtrace+0x0/0x284
 |  show_stack+0x1c/0x28
 |  dump_stack+0xf0/0x1a4
 |  ___might_sleep+0x2bc/0x2cc
 |  unmap_stage2_range+0x160/0x1ac
 |  kvm_unmap_hva_range+0x1a0/0x1c8
 |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
 |  __mmu_notifier_invalidate_range_start+0x218/0x31c
 |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
 |  __oom_reap_task_mm+0x128/0x268
 |  oom_reap_task+0xac/0x298
 |  oom_reaper+0x178/0x17c
 |  kthread+0x1e4/0x1fc
 |  ret_from_fork+0x10/0x30

Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
flags.

Cc:  # v5.8 only
Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
Message-Id: <20200811102725.7121-3-w...@kernel.org>
Signed-off-by: Paolo Bonzini 
Signed-off-by: Will Deacon 
---
 arch/arm64/kvm/mmu.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 5f6b35c33618..bd47f06739d6 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -365,7 +365,8 @@ static void unmap_stage2_p4ds(struct kvm *kvm, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
-static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+static void __unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size,
+bool may_block)
 {
pgd_t *pgd;
phys_addr_t addr = start, end = start + size;
@@ -390,11 +391,16 @@ static void unmap_stage2_range(struct kvm *kvm, 
phys_addr_t start, u64 size)
 * If the range is too large, release the kvm->mmu_lock
 * to prevent starvation and lockup detector warnings.
 */
-   if (next != end)
+   if (may_block && next != end)
cond_resched_lock(&kvm->mmu_lock);
} while (pgd++, addr = next, addr != end);
 }
 
+static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+{
+   __unmap_stage2_range(kvm, start, size, true);
+}
+
 static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
  phys_addr_t addr, phys_addr_t end)
 {
@@ -2198,7 +2204,10 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void 
*data)
 {
-   unmap_stage2_range(kvm, gpa, size);
+   unsigned flags = *(unsigned *)data;
+   bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;
+
+   __unmap_stage2_range(kvm, gpa, size, may_block);
return 0;
 }
 
@@ -2209,7 +2218,7 @@ int kvm_unmap_hva_range(struct kvm *kvm,
return 0;
 
trace_kvm_unmap_hva_range(start, end);
-   handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
+   handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, &flags);
return 0;
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH stable-5.4.y backport 1/2] KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()

2020-08-24 Thread Will Deacon
commit fdfe7cbd58806522e799e2a50a15aee7f2cbb7b6 upstream.

The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc:  # v5.4 only
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
Message-Id: <20200811102725.7121-2-w...@kernel.org>
Signed-off-by: Paolo Bonzini 
Signed-off-by: Will Deacon 
---
 arch/arm/include/asm/kvm_host.h | 2 +-
 arch/arm64/include/asm/kvm_host.h   | 2 +-
 arch/mips/include/asm/kvm_host.h| 2 +-
 arch/mips/kvm/mmu.c | 3 ++-
 arch/powerpc/include/asm/kvm_host.h | 3 ++-
 arch/powerpc/kvm/book3s.c   | 3 ++-
 arch/powerpc/kvm/e500_mmu_host.c| 3 ++-
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/mmu.c  | 3 ++-
 virt/kvm/arm/mmu.c  | 2 +-
 virt/kvm/kvm_main.c | 3 ++-
 11 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 1b179b1f46bc..dd03d5e01a94 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -266,7 +266,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, unsigned flags);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 0c3bd6aff6e9..d719c6b4dd81 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -427,7 +427,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, unsigned flags);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 7b47a323dc23..356c61074d13 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -939,7 +939,7 @@ enum kvm_mips_fault_result kvm_trap_emul_gva_fault(struct 
kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, unsigned flags);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 97e538a8c1be..97f63a84aa51 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -512,7 +512,8 @@ static int kvm_unmap_hva_handler(struct kvm *kvm, gfn_t 
gfn, gfn_t gfn_end,
return 1;
 }
 
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end)
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end,
+   unsigned flags)
 {
handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 6fe6ad64cba5..740b52ec3509 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -58,7 +58,8 @@
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 
 extern int kvm_unmap_hva_range(struct kvm *kvm,
-  unsigned long start, unsigned long end);
+  unsigned long start, unsigned long end,
+  unsigned flags);
 extern int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long 
end);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index ec2547cc5ecb..1ff971f3b06f 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -867,7 +867,8 @@ void kvmppc_core_commit_memory_region(struct kvm *kvm,
kvm->arch.kvm_ops->commit_memory_region(kvm, mem, old, new, change

[PATCH stable-4.19.y backport 1/2] KVM: Pass MMU notifier 'blockable' flag to kvm_unmap_hva_range()

2020-08-24 Thread Will Deacon
commit fdfe7cbd58806522e799e2a50a15aee7f2cbb7b6 upstream.

The 'blockable' flag passed to the 'invalidate_range_start()' call back
of 'struct mmu_notifier' is used to indicate whether the function is
permitted to block. In the case of
kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded
on to the architecture-specific implementation of kvm_unmap_hva_range()
and therefore the backend cannot sensibly decide whether or not to
block.

Add an extra 'blockable' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc:  # v4.19 only
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
Message-Id: <20200811102725.7121-2-w...@kernel.org>
Signed-off-by: Paolo Bonzini 
[will: Backport to 4.19; use 'blockable' instead of non-existent range flags]
Signed-off-by: Will Deacon 
---
 arch/arm/include/asm/kvm_host.h | 2 +-
 arch/arm64/include/asm/kvm_host.h   | 2 +-
 arch/mips/include/asm/kvm_host.h| 2 +-
 arch/mips/kvm/mmu.c | 3 ++-
 arch/powerpc/include/asm/kvm_host.h | 3 ++-
 arch/powerpc/kvm/book3s.c   | 3 ++-
 arch/powerpc/kvm/e500_mmu_host.c| 3 ++-
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/mmu.c  | 3 ++-
 virt/kvm/arm/mmu.c  | 2 +-
 virt/kvm/kvm_main.c | 2 +-
 11 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index c9128bb187f9..471859cbfe0b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -234,7 +234,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, bool blockable);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e9afdfcb8403..5e720742d647 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -370,7 +370,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, bool blockable);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 2b3fdfc9e0e7..c254761cb8ad 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -936,7 +936,7 @@ enum kvm_mips_fault_result kvm_trap_emul_gva_fault(struct 
kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, bool blockable);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index d8dcdb350405..098a7afd4d38 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -512,7 +512,8 @@ static int kvm_unmap_hva_handler(struct kvm *kvm, gfn_t 
gfn, gfn_t gfn_end,
return 1;
 }
 
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end)
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long 
end,
+   bool blockable)
 {
handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 2f95e38f0549..7b54d8412367 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -68,7 +68,8 @@
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 
 extern int kvm_unmap_hva_range(struct kvm *kvm,
-  unsigned long start, unsigned long end);
+  unsigned long start, unsigned long end,
+  bool blockable);
 extern int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long 
end);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index cc05f346e042..bc9d1321dc73 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -812,7 +812,8 @

[PATCH stable-4.19.y backport 2/2] KVM: arm64: Only reschedule if 'blockable'

2020-08-24 Thread Will Deacon
commit b5331379bc62611d1026173a09c73573384201d9 upstream.

When an MMU notifier call results in unmapping a range that spans multiple
PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
since this avoids running into RCU stalls during VM teardown. Unfortunately,
if the VM is destroyed as a result of OOM, then blocking is not permitted
and the call to the scheduler triggers the following BUG():

 | BUG: sleeping function called from invalid context at 
arch/arm64/kvm/mmu.c:394
 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
 | INFO: lockdep is turned off.
 | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 | Call trace:
 |  dump_backtrace+0x0/0x284
 |  show_stack+0x1c/0x28
 |  dump_stack+0xf0/0x1a4
 |  ___might_sleep+0x2bc/0x2cc
 |  unmap_stage2_range+0x160/0x1ac
 |  kvm_unmap_hva_range+0x1a0/0x1c8
 |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
 |  __mmu_notifier_invalidate_range_start+0x218/0x31c
 |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
 |  __oom_reap_task_mm+0x128/0x268
 |  oom_reap_task+0xac/0x298
 |  oom_reaper+0x178/0x17c
 |  kthread+0x1e4/0x1fc
 |  ret_from_fork+0x10/0x30

Use the new 'blockable' argument to kvm_unmap_hva_range() to ensure that
we only reschedule if blocking is permitted by the MMU notifier call.

Cc:  # v4.19 only
Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
Message-Id: <20200811102725.7121-3-w...@kernel.org>
Signed-off-by: Paolo Bonzini 
[will: Backport to 4.19; use 'blockable' instead of non-existent 
MMU_NOTIFIER_RANGE_BLOCKABLE flag]
Signed-off-by: Will Deacon 
---
 virt/kvm/arm/mmu.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 3957ff0ecda5..41d6285c3da9 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -323,7 +323,8 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
-static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+static void __unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size,
+bool may_block)
 {
pgd_t *pgd;
phys_addr_t addr = start, end = start + size;
@@ -348,11 +349,16 @@ static void unmap_stage2_range(struct kvm *kvm, 
phys_addr_t start, u64 size)
 * If the range is too large, release the kvm->mmu_lock
 * to prevent starvation and lockup detector warnings.
 */
-   if (next != end)
+   if (may_block && next != end)
cond_resched_lock(&kvm->mmu_lock);
} while (pgd++, addr = next, addr != end);
 }
 
+static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+{
+   __unmap_stage2_range(kvm, start, size, true);
+}
+
 static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
  phys_addr_t addr, phys_addr_t end)
 {
@@ -1820,7 +1826,9 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void 
*data)
 {
-   unmap_stage2_range(kvm, gpa, size);
+   bool may_block = *(bool *)data;
+
+   __unmap_stage2_range(kvm, gpa, size, may_block);
return 0;
 }
 
@@ -1831,7 +1839,7 @@ int kvm_unmap_hva_range(struct kvm *kvm,
return 0;
 
trace_kvm_unmap_hva_range(start, end);
-   handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
+   handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, &blockable);
return 0;
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH stable-4.9.y backport] KVM: arm/arm64: Don't reschedule in unmap_stage2_range()

2020-08-24 Thread Will Deacon
Upstream commits fdfe7cbd5880 ("KVM: Pass MMU notifier range flags to
kvm_unmap_hva_range()") and b5331379bc62 ("KVM: arm64: Only reschedule
if MMU_NOTIFIER_RANGE_BLOCKABLE is not set") fix a "sleeping from invalid
context" BUG caused by unmap_stage2_range() attempting to reschedule when
called on the OOM path.

Unfortunately, these patches rely on the MMU notifier callback being
passed knowledge about whether or not blocking is permitted, which was
introduced in 4.19. Rather than backport this considerable amount of
infrastructure just for KVM on arm, instead just remove the conditional
reschedule.

Cc:  # v4.9 only
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
---
 arch/arm/kvm/mmu.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index bb0d5e21d60b..b5ce1e81f945 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -298,12 +298,6 @@ static void unmap_stage2_range(struct kvm *kvm, 
phys_addr_t start, u64 size)
next = stage2_pgd_addr_end(addr, end);
if (!stage2_pgd_none(*pgd))
unmap_stage2_puds(kvm, pgd, addr, next);
-   /*
-* If the range is too large, release the kvm->mmu_lock
-* to prevent starvation and lockup detector warnings.
-*/
-   if (next != end)
-   cond_resched_lock(&kvm->mmu_lock);
} while (pgd++, addr = next, addr != end);
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH stable-4.14.y backport] KVM: arm/arm64: Don't reschedule in unmap_stage2_range()

2020-08-24 Thread Will Deacon
Upstream commits fdfe7cbd5880 ("KVM: Pass MMU notifier range flags to
kvm_unmap_hva_range()") and b5331379bc62 ("KVM: arm64: Only reschedule
if MMU_NOTIFIER_RANGE_BLOCKABLE is not set") fix a "sleeping from invalid
context" BUG caused by unmap_stage2_range() attempting to reschedule when
called on the OOM path.

Unfortunately, these patches rely on the MMU notifier callback being
passed knowledge about whether or not blocking is permitted, which was
introduced in 4.19. Rather than backport this considerable amount of
infrastructure just for KVM on arm, instead just remove the conditional
reschedule.

Cc:  # v4.14 only
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
---
 virt/kvm/arm/mmu.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 3814cdad643a..7fe673248e98 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -307,12 +307,6 @@ static void unmap_stage2_range(struct kvm *kvm, 
phys_addr_t start, u64 size)
next = stage2_pgd_addr_end(addr, end);
if (!stage2_pgd_none(*pgd))
unmap_stage2_puds(kvm, pgd, addr, next);
-   /*
-* If the range is too large, release the kvm->mmu_lock
-* to prevent starvation and lockup detector warnings.
-*/
-   if (next != end)
-   cond_resched_lock(&kvm->mmu_lock);
} while (pgd++, addr = next, addr != end);
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH stable-4.4.y backport] KVM: arm/arm64: Don't reschedule in unmap_stage2_range()

2020-08-24 Thread Will Deacon
Upstream commits fdfe7cbd5880 ("KVM: Pass MMU notifier range flags to
kvm_unmap_hva_range()") and b5331379bc62 ("KVM: arm64: Only reschedule
if MMU_NOTIFIER_RANGE_BLOCKABLE is not set") fix a "sleeping from invalid
context" BUG caused by unmap_stage2_range() attempting to reschedule when
called on the OOM path.

Unfortunately, these patches rely on the MMU notifier callback being
passed knowledge about whether or not blocking is permitted, which was
introduced in 4.19. Rather than backport this considerable amount of
infrastructure just for KVM on arm, instead just remove the conditional
reschedule.

Cc:  # v4.4 only
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
---
 arch/arm/kvm/mmu.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index e0267532bd4e..edd392fdc14b 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -300,14 +300,6 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
next = kvm_pgd_addr_end(addr, end);
if (!pgd_none(*pgd))
unmap_puds(kvm, pgd, addr, next);
-   /*
-* If we are dealing with a large range in
-* stage2 table, release the kvm->mmu_lock
-* to prevent starvation and lockup detector
-* warnings.
-*/
-   if (kvm && (next != end))
-   cond_resched_lock(&kvm->mmu_lock);
} while (pgd++, addr = next, addr != end);
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog



Re: [PATCH] MAINTAINERS: update QUALCOMM IOMMU after Arm SSMU drivers move

2020-08-21 Thread Will Deacon
On Sun, Aug 02, 2020 at 08:53:20AM +0200, Lukas Bulwahn wrote:
> Commit e86d1aa8b60f ("iommu/arm-smmu: Move Arm SMMU drivers into their own
> subdirectory") moved drivers/iommu/qcom_iommu.c to
> drivers/iommu/arm/arm-smmu/qcom_iommu.c amongst other moves, adjusted some
> sections in MAINTAINERS, but missed adjusting the QUALCOMM IOMMU section.
> 
> Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:
> 
>   warning: no file matchesF:drivers/iommu/qcom_iommu.c
> 
> Update the file entry in MAINTAINERS to the new location.
> 
> Signed-off-by: Lukas Bulwahn 
> ---
> Will, please ack.

Typo in subject: s/SSMU/SMMU/

With that:

Acked-by: Will Deacon 

> Joerg, please pick this minor non-urgent patch for your -next branch.

Joerg -- can you queue this as a fix for 5.9-rc, please?

Thanks,

Will


Re: [PATCH stable v4.9 v2] arm64: entry: Place an SB sequence following an ERET instruction

2020-08-21 Thread Will Deacon
On Fri, Aug 07, 2020 at 03:14:29PM +0200, Greg KH wrote:
> On Thu, Aug 06, 2020 at 01:00:54PM -0700, Florian Fainelli wrote:
> > 
> > 
> > On 7/20/2020 11:26 AM, Florian Fainelli wrote:
> > > On 7/20/20 6:04 AM, Greg KH wrote:
> > >> On Thu, Jul 09, 2020 at 12:50:23PM -0700, Florian Fainelli wrote:
> > >>> From: Will Deacon 
> > >>>
> > >>> commit 679db70801da9fda91d26caf13bf5b5ccc74e8e8 upstream
> > >>>
> > >>> Some CPUs can speculate past an ERET instruction and potentially perform
> > >>> speculative accesses to memory before processing the exception return.
> > >>> Since the register state is often controlled by a lower privilege level
> > >>> at the point of an ERET, this could potentially be used as part of a
> > >>> side-channel attack.
> > >>>
> > >>> This patch emits an SB sequence after each ERET so that speculation is
> > >>> held up on exception return.
> > >>>
> > >>> Signed-off-by: Will Deacon 
> > >>> [florian: Adjust hyp-entry.S to account for the label
> > >>>  added change to hyp/entry.S]
> > >>> Signed-off-by: Florian Fainelli 
> > >>> ---
> > >>> Changes in v2:
> > >>>
> > >>> - added missing hunk in hyp/entry.S per Will's feedback
> > >>
> > >> What about 4.19.y and 4.14.y trees?  I can't take something for 4.9.y
> > >> and then have a regression if someone moves to a newer release, right?
> > > 
> > > Sure, send you candidates for 4.14 and 4.19.
> > 
> > Greg, did you have a chance to queue those changes for 4.9, 4.14 and 4.19?
> > 
> > https://lore.kernel.org/linux-arm-kernel/20200720182538.13304-1-f.faine...@gmail.com/
> > https://lore.kernel.org/linux-arm-kernel/20200720182937.14099-1-f.faine...@gmail.com/
> > https://lore.kernel.org/linux-arm-kernel/20200709195034.15185-1-f.faine...@gmail.com/
> 
> Nope, I was waiting for Will's "ack" for these.

This patch doesn't even build for me (the 'sb' macro is not defined in 4.9),
and I really wonder why we bother backporting it at all. Nobody's ever shown
it to be a problem in practice, and it's clear that this is just being
submitted to tick a box rather than anything else (otherwise it would build,
right?).

So I'm not going to Ack any of them. As with a lot of this side-channel
stuff the cure is far worse than the disease.

Will


Re: [PATCH RESEND] fs: Move @f_count to different cacheline with @f_mode

2020-08-21 Thread Will Deacon
On Wed, Jun 24, 2020 at 04:32:28PM +0800, Shaokun Zhang wrote:
> get_file_rcu_many(), which is called by __fget_files(), now uses
> atomic_try_cmpxchg(), which reduces the number of accesses to the shared
> variable and so performs better than plain atomic_cmpxchg().
> 
> __fget_files() checks @f_mode against a mask and performs atomic operations
> on @f_count, but both fields sit on the same cacheline. With many CPU cores
> doing file accesses, this causes heavy contention on @f_count. Moving the
> two members to different cachelines relaxes the situation.
> 
> We have tested this on ARM64 and X86, the result is as follows:
> Syscall of unixbench has been run on Huawei Kunpeng920 with this patch:
> 24 x System Call Overhead  1
> 
> System Call Overhead3160841.4 lps   (10.0 s, 1 samples)
> 
> System Benchmarks Partial Index  BASELINE   RESULTINDEX
> System Call Overhead  15000.03160841.4   2107.2
>
> System Benchmarks Index Score (Partial Only) 2107.2
> 
> Without this patch:
> 24 x System Call Overhead  1
> 
> System Call Overhead456.0 lps   (10.0 s, 1 samples)
> 
> System Benchmarks Partial Index  BASELINE   RESULTINDEX
> System Call Overhead  15000.0456.0   1481.6
>
> System Benchmarks Index Score (Partial Only) 1481.6
> 
> And on Intel 6248 platform with this patch:
> 40 CPUs in system; running 24 parallel copies of tests
> 
> System Call Overhead4288509.1 lps   (10.0 s, 1 
> samples)
> 
> System Benchmarks Partial Index  BASELINE   RESULTINDEX
> System Call Overhead  15000.04288509.1   2859.0
>
> System Benchmarks Index Score (Partial Only) 2859.0
> 
> Without this patch:
> 40 CPUs in system; running 24 parallel copies of tests
> 
> System Call Overhead3666313.0 lps   (10.0 s, 1 
> samples)
> 
> System Benchmarks Partial Index  BASELINE   RESULTINDEX
> System Call Overhead  15000.03666313.0   2444.2
>    
> System Benchmarks Index Score (Partial Only) 2444.2
> 
> Cc: Will Deacon 
> Cc: Mark Rutland 
> Cc: Peter Zijlstra 
> Cc: Alexander Viro 
> Cc: Boqun Feng 
> Signed-off-by: Yuqi Jin 
> Signed-off-by: Shaokun Zhang 
> ---
>  include/linux/fs.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3f881a892ea7..0faeab5622fb 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -955,7 +955,6 @@ struct file {
>*/
>   spinlock_t  f_lock;
>   enum rw_hintf_write_hint;
> - atomic_long_t   f_count;
>   unsigned intf_flags;
>   fmode_t f_mode;
>   struct mutexf_pos_lock;
> @@ -979,6 +978,7 @@ struct file {
>   struct address_space*f_mapping;
>   errseq_tf_wb_err;
>   errseq_tf_sb_err; /* for syncfs */
> + atomic_long_t   f_count;
>  } __randomize_layout
>__attribute__((aligned(4)));   /* lest something weird decides that 2 
> is OK */

Hmm. So the microbenchmark numbers look lovely, but:

  - What impact does it actually have for real workloads?
  - How do we avoid regressing performance by innocently changing the struct
again later on?
  - This thing is tagged with __randomize_layout, so it doesn't help anybody
using that crazy plugin
  - What about all the other atomics and locks that share cachelines?

Will
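
Note (illustration only, not part of the posted patch): the second point
above could in principle be backed by a build-time check that the two fields
really do end up on different cachelines. A minimal sketch, assuming it lives
somewhere that can see the struct file definition; with the layout
randomisation plugin enabled it only validates the layout chosen for the
current build:

#include <linux/build_bug.h>
#include <linux/cache.h>
#include <linux/fs.h>
#include <linux/stddef.h>

/* Fail the build if f_count and f_mode ever share a cacheline again. */
static_assert(offsetof(struct file, f_count) / L1_CACHE_BYTES !=
	      offsetof(struct file, f_mode) / L1_CACHE_BYTES,
	      "f_count and f_mode share a cacheline");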


Re: [PATCH v4 00/17] Warn on orphan section placement

2020-08-21 Thread Will Deacon
Hi Kees,

On Sun, Jun 28, 2020 at 11:18:23PM -0700, Kees Cook wrote:
> v4:
> - explicitly add .ARM.attributes
> - split up arm64 changes into separate patches
> - split up arm changes into separate patches
> - work around Clang section generation bug in -mbranch-protection
> - work around Clang section generation bug in KASAN and KCSAN
> - split "common" ELF sections out of STABS_DEBUG
> - changed relative position of .comment
> - add reviews/acks

What's the plan with this series? I thought it might have landed during the
merge window, but I can't even seem to find it in next. Anything else you
need on the arm64 side?

Will


Re: [PATCH v2] module: Harden STRICT_MODULE_RWX

2020-08-21 Thread Will Deacon
On Fri, Aug 21, 2020 at 02:27:05PM +0200, Ard Biesheuvel wrote:
> On Fri, 21 Aug 2020 at 14:20, Will Deacon  wrote:
> >
> > On Thu, Aug 13, 2020 at 03:07:13PM +0200, Ard Biesheuvel wrote:
> > > On Thu, 13 Aug 2020 at 15:04, Jessica Yu  wrote:
> > > >
> > > > +++ Ard Biesheuvel [13/08/20 10:36 +0200]:
> > > > >On Wed, 12 Aug 2020 at 22:00, Peter Zijlstra  
> > > > >wrote:
> > > > >>
> > > > >> On Wed, Aug 12, 2020 at 06:37:57PM +0200, Ard Biesheuvel wrote:
> > > > >> > I know there is little we can do at this point, apart from ignoring
> > > > >> > the permissions - perhaps we should just defer the w^x check until
> > > > >> > after calling module_frob_arch_sections()?
> > > > >>
> > > > >> My earlier suggestion was to ignore it for 0-sized sections.
> > > > >
> > > > >Only they are 1 byte sections in this case.
> > > > >
> > > > >We override the sh_type and sh_flags explicitly for these sections at
> > > > >module load time, so deferring the check seems like a reasonable
> > > > >alternative to me.
> > > >
> > > > So module_enforce_rwx_sections() is already called after
> > > > module_frob_arch_sections() - which really baffled me at first, since
> > > > sh_type and sh_flags should have been set already in
> > > > module_frob_arch_sections().
> > > >
> > > > I added some debug prints to see which section the module code was
> > > > tripping on, and it was .text.ftrace_trampoline. See this snippet from
> > > > arm64's module_frob_arch_sections():
> > > >
> > > > else if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE) &&
> > > >  !strcmp(secstrings + sechdrs[i].sh_name,
> > > >  ".text.ftrace_trampoline"))
> > > > tramp = sechdrs + i;
> > > >
> > > > Since Mauro's config doesn't have CONFIG_DYNAMIC_FTRACE enabled, tramp
> > > > is never set here and the if (tramp) check at the end of the function
> > > > fails, so its section flags are never set, so they remain WAX and fail
> > > > the rwx check.
> > >
> > > Right. Our module.lds does not go through the preprocessor, so we
> > > cannot add the #ifdef check there currently. So we should either drop
> > > the IS_ENABLED() check here, or simply rename the section, dropping
> > > the .text prefix (which doesn't seem to have any significance outside
> > > this context)
> > >
> > > I'll leave it to Will to make the final call here.
> >
> > Why don't we just preprocess the linker script, like we do for the main
> > kernel?
> >
> 
> That should work as well, I just haven't checked how straight-forward
> it is to change that.

Ok, if it's _not_ straightforward, then let's just drop the IS_ENABLED()
altogether.

Will


Re: [PATCH v2] module: Harden STRICT_MODULE_RWX

2020-08-21 Thread Will Deacon
On Thu, Aug 13, 2020 at 03:07:13PM +0200, Ard Biesheuvel wrote:
> On Thu, 13 Aug 2020 at 15:04, Jessica Yu  wrote:
> >
> > +++ Ard Biesheuvel [13/08/20 10:36 +0200]:
> > >On Wed, 12 Aug 2020 at 22:00, Peter Zijlstra  wrote:
> > >>
> > >> On Wed, Aug 12, 2020 at 06:37:57PM +0200, Ard Biesheuvel wrote:
> > >> > I know there is little we can do at this point, apart from ignoring
> > >> > the permissions - perhaps we should just defer the w^x check until
> > >> > after calling module_frob_arch_sections()?
> > >>
> > >> My earlier suggestion was to ignore it for 0-sized sections.
> > >
> > >Only they are 1 byte sections in this case.
> > >
> > >We override the sh_type and sh_flags explicitly for these sections at
> > >module load time, so deferring the check seems like a reasonable
> > >alternative to me.
> >
> > So module_enforce_rwx_sections() is already called after
> > module_frob_arch_sections() - which really baffled me at first, since
> > sh_type and sh_flags should have been set already in
> > module_frob_arch_sections().
> >
> > I added some debug prints to see which section the module code was
> > tripping on, and it was .text.ftrace_trampoline. See this snippet from
> > arm64's module_frob_arch_sections():
> >
> > else if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE) &&
> >  !strcmp(secstrings + sechdrs[i].sh_name,
> >  ".text.ftrace_trampoline"))
> > tramp = sechdrs + i;
> >
> > Since Mauro's config doesn't have CONFIG_DYNAMIC_FTRACE enabled, tramp
> > is never set here and the if (tramp) check at the end of the function
> > fails, so its section flags are never set, so they remain WAX and fail
> > the rwx check.
> 
> Right. Our module.lds does not go through the preprocessor, so we
> cannot add the #ifdef check there currently. So we should either drop
> the IS_ENABLED() check here, or simply rename the section, dropping
> the .text prefix (which doesn't seem to have any significance outside
> this context)
> 
> I'll leave it to Will to make the final call here.

Why don't we just preprocess the linker script, like we do for the main
kernel?

Will
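
Note (hypothetical sketch of the approach suggested above; the real
implementation may differ): if arch/arm64/kernel/module.lds were generated
from a preprocessed module.lds.S, as the main kernel linker script already
is, the trampoline placeholder could simply be made conditional. The
one-byte placeholder sections below mirror the existing module.lds; the
#ifdef is the new part:

/* module.lds.S, run through the C preprocessor at build time */
SECTIONS {
	.plt 0 (NOLOAD) : { BYTE(0) }
	.init.plt 0 (NOLOAD) : { BYTE(0) }
#ifdef CONFIG_DYNAMIC_FTRACE
	.text.ftrace_trampoline 0 (NOLOAD) : { BYTE(0) }
#endif
}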


Re: [PATCH] Documentation/locking/locktypes: fix local_locks documentation

2020-08-21 Thread Will Deacon
On Sun, Jul 26, 2020 at 08:54:40PM +0200, Marta Rybczynska wrote:
> Fix issues with local_locks documentation:
> - fix function names, local_lock.h has local_unlock_irqrestore(),
> not local_lock_irqrestore()
> - fix mapping table, local_unlock_irqrestore() maps to local_irq_restore(),
> not _save()
> 
> Signed-off-by: Marta Rybczynska 
> ---
>  Documentation/locking/locktypes.rst | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)

Looks correct to me, thanks:

Acked-by: Will Deacon 

Will


Re: [PATCH] arm64/module-plts: Consider the special case where plt_max_entries is 0

2020-08-21 Thread Will Deacon
On Tue, Jul 14, 2020 at 08:48:11AM +, 彭浩(Richard) wrote:
> On Thu, Jul 09, 2020 at 07:18:01AM +,Peng Hao(Richard) wrote:
> > On Thu, 9 Jul 2020 at 09:50, Peng Hao(Richard)  
> > wrote:
> > >> >Apparently, you are hitting a R_AARCH64_JUMP26 or R_AARCH64_CALL26
> > >> >relocation that operates on a b or bl instruction that is more than
> > >> >128 megabytes away from its target.
> > >> >
> > >> My understanding is that a module that calls functions that are not part 
> > >> of the module will use PLT.
> > >> plt_max_entries = 0 may occur if a module does not depend on other module
> > >> functions.
> > >>
> > >
> > >A PLT slot is allocated for each b or bl instruction that refers to a
> > >symbol that lives in a different section, either of the same module
> > > (e.g., bl in .init calling into .text), of another module, or of the
> > >core kernel.
> > >
> > >I don't see how you end up with plt_max_entries in this case, though.
> > If a module does not depend on other module functions, the number of PLT
> > entries in the module is 0.
> 
> >This brings me back to my earlier question: if there are no PLT entries in
> >the module, then count_plts() will not find any R_AARCH64_JUMP26 or
> >R_AARCH64_CALL26 relocations that require PLTs and will therefore return 0.
> >The absence of these relocations means that module_emit_plt_entry() will not
> >be called by apply_relocate_add(), and so your patch should have no effect.
> 1. The module in question is calling functions from the core kernel
> (ib_core.ko triggered the warning multiple times).
> 2. There are multiple threads loading IB_core.ko
> [   73.388931]  ###cpu=33, name=ib_core, core_plts=0, init_plts=0  
> [   73.402102]   cpu=33,pid=2297,name=ib_core, 
> module_emit_plt_entry:plt_num_entries=1, plt_max_entries=0 (warning)
> [   73.439391]  ###cpu=24, name=ib_core, core_plts=0, init_plts=0  
> [   73.448617]  ###cpu=4, name=ib_core, core_plts=0, init_plts=0  
> [   73.547535]  ###cpu=221, name=ib_core, core_plts=0, init_plts=0  
> [   75.198075]   cpu=24,pid=2336,name=ib_core, 
> module_emit_plt_entry:plt_num_entries=1, plt_max_entries=0 (warning)
> [   75.489496]   cpu=4,pid=2344,name=ib_core, 
> module_emit_plt_entry:plt_num_entries=1, plt_max_entries=0(warning)
> 
> I don't understand why count_plts returns 0 when CONFIG_RANDOMIZE_BASE=n for 
> R_AARCH64_JUMP26 and R_AARCH64_CALL26.
> 
> 3. Set CONFIG_ARM64_MODULE_PLTS=y and restart the server several times 
> without triggering this warning.

Can you provide a means for us to reproduce this failure with an upstream
kernel, please? I really can't tell what's going on from the report. If I
can reproduce the problem locally, then I'm happy to take a look.

Thanks,

Will


Re: [PATCH] locking/percpu-rwsem: Remove WQ_FLAG_EXCLUSIVE flags

2020-08-21 Thread Will Deacon
On Wed, Jul 01, 2020 at 01:57:20PM +0800, qiang.zh...@windriver.com wrote:
> From: Zqiang 
> 
> Remove the explicit WQ_FLAG_EXCLUSIVE from "wq_entry.flags" and use
> __add_wait_queue_entry_tail_exclusive() instead.
> 
> Signed-off-by: Zqiang 
> ---
>  kernel/locking/percpu-rwsem.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
> index 8bbafe3e5203..48e1c55c2e59 100644
> --- a/kernel/locking/percpu-rwsem.c
> +++ b/kernel/locking/percpu-rwsem.c
> @@ -148,8 +148,8 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore 
> *sem, bool reader)
>*/
>   wait = !__percpu_rwsem_trylock(sem, reader);
>   if (wait) {
> - wq_entry.flags |= WQ_FLAG_EXCLUSIVE | reader * WQ_FLAG_CUSTOM;
> - __add_wait_queue_entry_tail(&sem->waiters, &wq_entry);
> + wq_entry.flags |= reader * WQ_FLAG_CUSTOM;
> + __add_wait_queue_entry_tail_exclusive(&sem->waiters, &wq_entry);
>   }
>   spin_unlock_irq(&sem->waiters.lock);

Seems straightforward enough:

Acked-by: Will Deacon 

Will
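
Note (for readers checking the equivalence): __add_wait_queue_entry_tail_exclusive()
is a small helper in include/linux/wait.h that folds the WQ_FLAG_EXCLUSIVE
handling into the tail insertion, so the substitution above should not change
behaviour. It looks roughly like this:

static inline void
__add_wait_queue_entry_tail_exclusive(struct wait_queue_head *wq_head,
				      struct wait_queue_entry *wq_entry)
{
	wq_entry->flags |= WQ_FLAG_EXCLUSIVE;
	__add_wait_queue_entry_tail(wq_head, wq_entry);
}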


Re: [PATCH] ARM64: vdso32: Install vdso32 from vdso_install

2020-08-21 Thread Will Deacon
On Mon, Aug 17, 2020 at 06:49:50PM -0700, Stephen Boyd wrote:
> Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
> vdso_install' installs the 32-bit compat vdso when it is compiled.
> 
> Cc: Vincenzo Frascino 
> Fixes: a7f71a2c8903 ("arm64: compat: Add vDSO")
> Signed-off-by: Stephen Boyd 
> ---
>  arch/arm64/Makefile   | 1 +
>  arch/arm64/kernel/vdso32/Makefile | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)

Acked-by: Will Deacon 

Will


Re: [PATCH v6 1/2] dma-contiguous: provide the ability to reserve per-numa CMA

2020-08-21 Thread Will Deacon
On Fri, Aug 21, 2020 at 09:13:39AM +, Song Bao Hua (Barry Song) wrote:
> 
> 
> > -Original Message-
> > From: Will Deacon [mailto:w...@kernel.org]
> > Sent: Friday, August 21, 2020 8:47 PM
> > To: Song Bao Hua (Barry Song) 
> > Cc: h...@lst.de; m.szyprow...@samsung.com; robin.mur...@arm.com;
> > ganapatrao.kulka...@cavium.com; catalin.mari...@arm.com;
> > io...@lists.linux-foundation.org; Linuxarm ;
> > linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org;
> > huangdaode ; Jonathan Cameron
> > ; Nicolas Saenz Julienne
> > ; Steve Capper ; Andrew
> > Morton ; Mike Rapoport 
> > Subject: Re: [PATCH v6 1/2] dma-contiguous: provide the ability to reserve
> > per-numa CMA
> > 
> > On Fri, Aug 21, 2020 at 02:26:14PM +1200, Barry Song wrote:
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt
> > b/Documentation/admin-guide/kernel-parameters.txt
> > > index bdc1f33fd3d1..3f33b89aeab5 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -599,6 +599,15 @@
> > >   altogether. For more information, see
> > >   include/linux/dma-contiguous.h
> > >
> > > + pernuma_cma=nn[MG]
> > > + [ARM64,KNL]
> > > + Sets the size of kernel per-numa memory area for
> > > + contiguous memory allocations. A value of 0 disables
> > > + per-numa CMA altogether. DMA users on node nid will
> > > + first try to allocate buffer from the pernuma area
> > > + which is located in node nid, if the allocation fails,
> > > + they will fallback to the global default memory area.
> > 
> > What is the default behaviour if this option is not specified? Seems like
> > that should be mentioned here.

Just wanted to make sure you didn't miss this ^^

> > 
> > > diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> > > index 847a9d1fa634..db7a37ed35eb 100644
> > > --- a/kernel/dma/Kconfig
> > > +++ b/kernel/dma/Kconfig
> > > @@ -118,6 +118,16 @@ config DMA_CMA
> > > If unsure, say "n".
> > >
> > >  if  DMA_CMA
> > > +
> > > +config DMA_PERNUMA_CMA
> > > + bool "Enable separate DMA Contiguous Memory Area for each NUMA
> > Node"
> > 
> > I don't understand the need for this config option. If you have DMA_CMA and
> > you have NUMA, why wouldn't you want this enabled?
> 
> Christoph preferred this in a previous patchset so that all of the code
> can be removed from the kernel text if users don't use pernuma CMA.

Ok, I defer to Christoph here, but maybe a "default NUMA" might work?

> > > + help
> > > +   Enable this option to get pernuma CMA areas so that devices like
> > > +   ARM64 SMMU can get local memory by DMA coherent APIs.
> > > +
> > > +   You can set the size of pernuma CMA by specifying
> > "pernuma_cma=size"
> > > +   on the kernel's command line.
> > > +
> > >  comment "Default contiguous memory area size:"
> > >
> > >  config CMA_SIZE_MBYTES
> > > diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> > > index cff7e60968b9..89b95f10e56d 100644
> > > --- a/kernel/dma/contiguous.c
> > > +++ b/kernel/dma/contiguous.c
> > > @@ -69,6 +69,19 @@ static int __init early_cma(char *p)
> > >  }
> > >  early_param("cma", early_cma);
> > >
> > > +#ifdef CONFIG_DMA_PERNUMA_CMA
> > > +
> > > +static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
> > > +static phys_addr_t pernuma_size_bytes __initdata;
> > > +
> > > +static int __init early_pernuma_cma(char *p)
> > > +{
> > > + pernuma_size_bytes = memparse(p, &p);
> > > + return 0;
> > > +}
> > > +early_param("pernuma_cma", early_pernuma_cma);
> > > +#endif
> > > +
> > >  #ifdef CONFIG_CMA_SIZE_PERCENTAGE
> > >
> > >  static phys_addr_t __init __maybe_unused
> > cma_early_percent_memory(void)
> > > @@ -96,6 +109,34 @@ static inline __maybe_unused phys_addr_t
> > cma_early_percent_memory(void)
> > >
> > >  #endif
> > >
> > > +#ifdef CONFIG_DMA_PERNUMA_CMA
> > > +void __init dma_pernuma_cma_reserve(void)
> > > +{
> > > + int nid;
> >

Re: [PATCH v2] MAINTAINERS: Add entries for CoreSight and Arm SPE

2020-08-21 Thread Will Deacon
On Thu, Aug 20, 2020 at 11:55:10AM -0600, Mathieu Poirier wrote:
> Add entries for perf tools elements related to the support of Arm CoreSight
> and Arm SPE.  Also lump in Arm and Arm64 architecture files to provide
> coverage.
> 
> Signed-off-by: Mathieu Poirier 
> ---
> V2:
> - Completed fileset for SPE.
> - Added Arm and Arm64 architecture files.
> 
>  MAINTAINERS | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index deaafb617361..e76f7bb014ce 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13569,12 +13569,18 @@ F:  kernel/events/*
>  F:   tools/lib/perf/
>  F:   tools/perf/
>  
> -PERFORMANCE EVENTS SUBSYSTEM ARM64 PMU EVENTS
> +PERFORMANCE EVENTS SUBSYSTEM ARM64

I'd probably prefer to go with TOOLING instead of SUBSYSTEM, since the
kernel parts are covered by the "ARM PMU PROFILING AND DEBUGGING" entry.

>  R:   John Garry 
>  R:   Will Deacon 
> +R:   Mathieu Poirier 
> +R:   Leo Yan 
>  L:   linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
>  S:   Supported
> +F:   tools/build/feature/test-libopencsd.c
> +F:   tools/perf/arch/arm*/
>  F:   tools/perf/pmu-events/arch/arm64/
> +F:   tools/perf/util/arm-spe*
> +F:   tools/perf/util/cs-etm*

Either way,

Acked-by: Will Deacon 

Thanks,

Will


Re: [PATCH v6 2/2] arm64: mm: reserve per-numa CMA to localize coherent dma buffers

2020-08-21 Thread Will Deacon
On Fri, Aug 21, 2020 at 02:26:15PM +1200, Barry Song wrote:
> Right now, the SMMU uses dma_alloc_coherent() to get memory for its queues
> and tables. Typically, on an ARM64 server, there is a default CMA located at
> node0, which could be far away from node2, node3 etc.
> With this patch, the SMMU will get memory from the local NUMA node for its
> command queues and page tables, which shrinks dma_unmap latency considerably.
> Meanwhile, when iommu.passthrough is on, device drivers which call
> dma_alloc_coherent() will also get local memory and avoid traffic between
> NUMA nodes.
> 
> Cc: Christoph Hellwig 
> Cc: Marek Szyprowski 
> Cc: Will Deacon 
> Cc: Robin Murphy 
> Cc: Ganapatrao Kulkarni 
> Cc: Catalin Marinas 
> Cc: Nicolas Saenz Julienne 
> Cc: Steve Capper 
> Cc: Andrew Morton 
> Cc: Mike Rapoport 
> Signed-off-by: Barry Song 
> ---
>  -v6: rebase on top of 5.9-rc1
> 
>  arch/arm64/mm/init.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 481d22c32a2e..f1c75957ff3c 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -429,6 +429,8 @@ void __init bootmem_init(void)
>   arm64_hugetlb_cma_reserve();
>  #endif
>  
> + dma_pernuma_cma_reserve();

I think this will have to do for now, but I still wish that more of this was
driven from the core code so that we don't have to worry about
initialisation order and whether things are early/late enough on a per-arch
basis.

Acked-by: Will Deacon 

Will
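
Note (illustration only, not taken from the patch): no driver changes are
needed for this; a coherent allocation for a device sitting on, say, NUMA
node 2 is simply served from that node's per-NUMA CMA area when one was
reserved at boot, with the usual fallback to the global area otherwise.
Hypothetical example:

#include <linux/dma-mapping.h>

/*
 * With pernuma_cma= reserved at boot, this allocation prefers CMA memory
 * local to dev_to_node(dev); the call itself is completely ordinary.
 */
static void *example_alloc_hw_queue(struct device *dev, size_t size,
				    dma_addr_t *dma)
{
	return dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
}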

