Re: [PATCH 0/4] arm64: wire up VM_FLUSH_RESET_PERMS

2019-06-24 Thread Catalin Marinas
On Mon, Jun 24, 2019 at 04:29:39PM +0200, Ard Biesheuvel wrote:
> On Mon, 24 Jun 2019 at 13:23, Ard Biesheuvel  wrote:
> > On 6/24/19 1:16 PM, Will Deacon wrote:
> > > On Tue, May 28, 2019 at 11:04:20AM +0100, Will Deacon wrote:
> > >> On Thu, May 23, 2019 at 11:22:52AM +0100, Ard Biesheuvel wrote:
> > >>> Ard Biesheuvel (4):
> > >>>arm64: module: create module allocations without exec permissions
> > >>>arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
> > >>>arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
> > >>>arm64: bpf: do not allocate executable memory
> > >>>
> > >>>   arch/arm64/Kconfig  |  1 +
> > >>>   arch/arm64/include/asm/cacheflush.h |  3 ++
> > >>>   arch/arm64/kernel/module.c  |  4 +-
> > >>>   arch/arm64/kernel/probes/kprobes.c  |  4 +-
> > >>>   arch/arm64/mm/pageattr.c| 48 
> > >>>   arch/arm64/net/bpf_jit_comp.c   |  2 +-
> > >>>   mm/vmalloc.c| 11 -
> > >>>   7 files changed, 50 insertions(+), 23 deletions(-)
> > >>
> > >> Thanks, this all looks good to me. I can pick this up for 5.2 if
> > >> Rick's fixes [1] land soon enough.
> > >
> > > Bah, I missed these landing in -rc5 and I think it's a bit too late for
> > > us to take this for 5.2 now, particularly with our limited ability to
> > > fix any late regressions that might arise.
> > >
> > > In which case, Catalin, please can you take these for 5.3? You might run
> > > into some testing failures with for-next/core due to the lack of Rick's
> > > fixes, but linux-next should be alright and I don't think you'll get any
> > > conflicts.
> > >
> > > Acked-by: Will Deacon 
> > >
> > > Ard: are you ok with that?
> >
> > That is fine, although I won't be around to pick up the pieces by the
> > time the merge window opens. Also, I'd like to follow up on the lazy
> > vunmap thing for non-x86, but perhaps we can talk about this at plumbers?
> 
> Actually, you will run into a couple of conflicts. Let me know if you
> want me to respin (although they still won't apply cleanly to both
> for-next/core and -next)

I queued them in for-next/core (and fixed a minor conflict). Thanks.

-- 
Catalin


Re: [PATCH v7 00/25] Unify vDSOs across more architectures

2019-06-24 Thread Catalin Marinas
On Mon, Jun 24, 2019 at 03:23:46PM +0100, Russell King wrote:
> On Mon, Jun 24, 2019 at 04:18:28PM +0200, Thomas Gleixner wrote:
> > Vincenzo,
> > 
> > On Mon, 24 Jun 2019, Thomas Gleixner wrote:
> > 
> > > I did not merge the ARM and MIPS parts as they lack any form of
> > > acknowledgement from their maintainers. Please talk to those folks. If they
> > > ack/review the changes then I can pick them up and they go into 5.3, or they
> > > have to go in a later cycle. Nevertheless it was well worth the trouble to
> > > have those conversions done to confirm that the new common library fits a
> > > bunch of different architectures.
> > 
> > I talked to Russell King and he suggested to file the ARM parts into his
> > patch system and he'll pick them up after 5.3-rc1.
> > 
> >https://www.arm.linux.org.uk/developer/patches/
> > 
> > I paged out how to deal with it, but you'll surely manage :)
> 
> Easy way: ask git to add the "KernelVersion" tag as a header to the
> email using --add-header to e.g. git format-patch, and just mail them
> to patc...@armlinux.org.uk

Although I haven't sent patches to Russell in a while, I still have a
git alias in my .gitconfig (it only works with one patch at a time IIRC;
multiple patches sent together may arrive in a different order):

[alias]
    send-rmk-email = !git send-email --add-header=\"KernelVersion: $(git describe --abbrev=0)\" --no-thread --suppress-cc=all --to="patc...@arm.linux.org.uk"

-- 
Catalin


Re: [PATCH v7 10/25] arm64: compat: Add vDSO

2019-06-24 Thread Catalin Marinas
On Fri, Jun 21, 2019 at 10:52:37AM +0100, Vincenzo Frascino wrote:
> --- /dev/null
> +++ b/arch/arm64/include/asm/vdso/compat_barrier.h
> @@ -0,0 +1,51 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2018 ARM Limited
> + */
> +#ifndef __COMPAT_BARRIER_H
> +#define __COMPAT_BARRIER_H
> +
> +#ifndef __ASSEMBLY__
> +/*
> + * Warning: This code is meant to be used with
> + * ENABLE_COMPAT_VDSO only.
> + */
> +#ifndef ENABLE_COMPAT_VDSO
> +#error This header is meant to be used with ENABLE_COMPAT_VDSO only
> +#endif
> +
> +#ifdef dmb
> +#undef dmb
> +#endif
> +
> +#if __LINUX_ARM_ARCH__ >= 7
> +#define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")
> +#elif __LINUX_ARM_ARCH__ == 6
> +#define dmb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
> + : : "r" (0) : "memory")
> +#else
> +#define dmb(x) __asm__ __volatile__ ("" : : : "memory")
> +#endif

We don't need pre-ARMv7 barriers (they've been deprecated and the arm64
kernel actually traps and emulates them by default). Also your Makefile
changes never define a __LINUX_ARM_ARCH__ lower than 7. Fix-up below:

--8<---
>From 5655a0313ce7bb731bfed6a19bcfe6b1100b542a Mon Sep 17 00:00:00 2001
From: Catalin Marinas 
Date: Mon, 24 Jun 2019 12:16:06 +0100
Subject: [PATCH] arm64: compat: No need for pre-ARMv7 barriers on an ARMv8
 system

This patch removes the deprecated (pre-ARMv7) compat barriers as they
would not be used on an ARMv8 system.

Fixes: a7f71a2c8903 ("arm64: compat: Add vDSO")
Signed-off-by: Catalin Marinas 
---
 arch/arm64/include/asm/vdso/compat_barrier.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/arch/arm64/include/asm/vdso/compat_barrier.h b/arch/arm64/include/asm/vdso/compat_barrier.h
index ea24ea856b07..fb60a88b5ed4 100644
--- a/arch/arm64/include/asm/vdso/compat_barrier.h
+++ b/arch/arm64/include/asm/vdso/compat_barrier.h
@@ -18,14 +18,7 @@
 #undef dmb
 #endif
 
-#if __LINUX_ARM_ARCH__ >= 7
 #define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")
-#elif __LINUX_ARM_ARCH__ == 6
-#define dmb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \
-   : : "r" (0) : "memory")
-#else
-#define dmb(x) __asm__ __volatile__ ("" : : : "memory")
-#endif
 
 #if __LINUX_ARM_ARCH__ >= 8
 #define aarch32_smp_mb()   dmb(ish)


Re: [PATCH v7 04/25] arm64: Substitute gettimeofday with C implementation

2019-06-24 Thread Catalin Marinas
On Fri, Jun 21, 2019 at 10:52:31AM +0100, Vincenzo Frascino wrote:
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 947e39896e28..9e4b7ccbab2f 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -25,13 +25,13 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  
> @@ -100,17 +100,28 @@ int main(void)
>DEFINE(CLOCK_COARSE_RES,   LOW_RES_NSEC);
>DEFINE(NSEC_PER_SEC,   NSEC_PER_SEC);
>BLANK();
> -  DEFINE(VDSO_CS_CYCLE_LAST, offsetof(struct vdso_data, cs_cycle_last));
> -  DEFINE(VDSO_RAW_TIME_SEC,  offsetof(struct vdso_data, raw_time_sec));
> -  DEFINE(VDSO_XTIME_CLK_SEC, offsetof(struct vdso_data, xtime_clock_sec));
> -  DEFINE(VDSO_XTIME_CRS_SEC, offsetof(struct vdso_data, xtime_coarse_sec));
> -  DEFINE(VDSO_XTIME_CRS_NSEC,offsetof(struct vdso_data, xtime_coarse_nsec));
> -  DEFINE(VDSO_WTM_CLK_SEC,   offsetof(struct vdso_data, wtm_clock_sec));
> -  DEFINE(VDSO_TB_SEQ_COUNT,  offsetof(struct vdso_data, tb_seq_count));
> -  DEFINE(VDSO_CS_MONO_MULT,  offsetof(struct vdso_data, cs_mono_mult));
> -  DEFINE(VDSO_CS_SHIFT,  offsetof(struct vdso_data, cs_shift));
> +  DEFINE(VDSO_SEQ,   offsetof(struct vdso_data, seq));
> +  DEFINE(VDSO_CLK_MODE,  offsetof(struct vdso_data, clock_mode));
> +  DEFINE(VDSO_CYCLE_LAST,offsetof(struct vdso_data, cycle_last));
> +  DEFINE(VDSO_MASK,  offsetof(struct vdso_data, mask));
> +  DEFINE(VDSO_MULT,  offsetof(struct vdso_data, mult));
> +  DEFINE(VDSO_SHIFT, offsetof(struct vdso_data, shift));
> +  DEFINE(VDSO_REALTIME_SEC,  offsetof(struct vdso_data, basetime[CLOCK_REALTIME].sec));
> +  DEFINE(VDSO_REALTIME_NSEC, offsetof(struct vdso_data, basetime[CLOCK_REALTIME].nsec));
> +  DEFINE(VDSO_MONO_SEC,  offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC].sec));
> +  DEFINE(VDSO_MONO_NSEC, offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC].nsec));
> +  DEFINE(VDSO_MONO_RAW_SEC,  offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC_RAW].sec));
> +  DEFINE(VDSO_MONO_RAW_NSEC, offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC_RAW].nsec));
> +  DEFINE(VDSO_BOOTTIME_SEC,  offsetof(struct vdso_data, basetime[CLOCK_BOOTTIME].sec));
> +  DEFINE(VDSO_BOOTTIME_NSEC, offsetof(struct vdso_data, basetime[CLOCK_BOOTTIME].nsec));
> +  DEFINE(VDSO_TAI_SEC,   offsetof(struct vdso_data, basetime[CLOCK_TAI].sec));
> +  DEFINE(VDSO_TAI_NSEC,  offsetof(struct vdso_data, basetime[CLOCK_TAI].nsec));
> +  DEFINE(VDSO_RT_COARSE_SEC, offsetof(struct vdso_data, basetime[CLOCK_REALTIME_COARSE].sec));
> +  DEFINE(VDSO_RT_COARSE_NSEC,offsetof(struct vdso_data, basetime[CLOCK_REALTIME_COARSE].nsec));
> +  DEFINE(VDSO_MONO_COARSE_SEC,   offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC_COARSE].sec));
> +  DEFINE(VDSO_MONO_COARSE_NSEC,  offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC_COARSE].nsec));
>DEFINE(VDSO_TZ_MINWEST,offsetof(struct vdso_data, tz_minuteswest));
> -  DEFINE(VDSO_USE_SYSCALL,   offsetof(struct vdso_data, use_syscall));
> +  DEFINE(VDSO_TZ_DSTTIME,offsetof(struct vdso_data, tz_dsttime));
>BLANK();
>DEFINE(TVAL_TV_SEC,offsetof(struct timeval, tv_sec));
>DEFINE(TSPEC_TV_SEC,   offsetof(struct timespec, tv_sec));

Now that we are moving this to C, do we actually need the asm-offsets?
If not, here's a clean-up patch:

---8<--
>From 7e818178a8b225b522fe547cf00ba8508d4cdcf0 Mon Sep 17 00:00:00 2001
From: Catalin Marinas 
Date: Mon, 24 Jun 2019 14:12:48 +0100
Subject: [PATCH] arm64: vdso: Remove unnecessary asm-offsets.c definitions

Since the VDSO code is moving to C from assembly, there is no need to
define and maintain the corresponding asm offsets.

Fixes: 28b1a824a4f4 ("arm64: vdso: Substitute gettimeofday() with C implementation")
Signed-off-by: Catalin Marinas 
---
 arch/arm64/kernel/asm-offsets.c | 39 -
 1 file changed, 39 deletions(-)

diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index e6f7409a78a4..214685760e1c 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -86,44 +85,6 @@ int main(void)
   BLANK();
   DEFINE(PREEMPT_DISABLE_OFFSET, PREEMPT_DISABLE_OFFSET);
   BLANK();
-  DEFINE(CLOCK_REALTIME,   CLOCK_REALTIME);
-  DEFINE(CLOCK_MONOTONIC,  CLOCK_M

Re: [PATCH v7 01/25] kernel: Standardize vdso_datapage

2019-06-24 Thread Catalin Marinas
On Fri, Jun 21, 2019 at 10:52:28AM +0100, Vincenzo Frascino wrote:
> In an effort to unify the common code for managing the vdso library
> between all the architectures that support it, this patch tries to
> provide a common format for the vdso datapage.
> 
> As a result of this, this patch generalizes the data structures in vgtod.h
> from x86 private includes to general includes (include/vdso).
> 
> Cc: Arnd Bergmann 
> Signed-off-by: Vincenzo Frascino 
> Tested-by: Shijith Thotton 
> Tested-by: Andre Przywara 

Minor clean-up patch (on top of the tip timers/vdso branch):

8<--
>From 2e09fa6fca341b3ec7ecaf0b67a313a167bb4ff2 Mon Sep 17 00:00:00 2001
From: Catalin Marinas 
Date: Mon, 24 Jun 2019 12:19:23 +0100
Subject: [PATCH] vdso: Remove superfluous #ifdef __KERNEL__ in
 vdso/datapage.h

With the move to UAPI headers, such #ifdefs are no longer necessary.

Fixes: 361f8aee9b09 ("vdso: Define standardized vdso_datapage")
Signed-off-by: Catalin Marinas 
---
 include/vdso/datapage.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
index e6eb36c3d54f..2e302c0f41f7 100644
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -2,8 +2,6 @@
 #ifndef __VDSO_DATAPAGE_H
 #define __VDSO_DATAPAGE_H
 
-#ifdef __KERNEL__
-
 #ifndef __ASSEMBLY__
 
 #include 
@@ -88,6 +86,4 @@ extern struct vdso_data _vdso_data[CS_BASES] __attribute__((visibility("hidden"))
 
 #endif /* !__ASSEMBLY__ */
 
-#endif /* __KERNEL__ */
-
 #endif /* __VDSO_DATAPAGE_H */


Re: [PATCH] arm64: defconfig: update and enable CONFIG_RANDOMIZE_BASE

2019-06-24 Thread Catalin Marinas
On Thu, Jun 20, 2019 at 08:46:58AM +0100, Will Deacon wrote:
> On Wed, Jun 19, 2019 at 05:32:42PM -0700, Nick Desaulniers wrote:
> > Generated via:
> > $ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make defconfig
> > $ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make menuconfig
> > 
> > $ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make savedefconfig
> > $ mv defconfig arch/arm64/configs/defconfig
[...]
> Irrespective of the above, I know Catalin was running into issues with his
> automated tests where the kernel would die silently during early boot with
> some seeds.  That's a bit rubbish if it's still the case -- Catalin?

I stopped seeing these failures about 1-2 months ago, not sure exactly
what changed. I'm ok to enable KASLR in defconfig.

-- 
Catalin


Re: linux-next: Fixes tags need some work in the arm64 tree

2019-06-21 Thread Catalin Marinas
On Fri, Jun 21, 2019 at 09:36:15PM +1000, Stephen Rothwell wrote:
> In commit
> 
>   86fc32fee888 ("arm64: Fix incorrect irqflag restore for priority masking")
> 
> Fixes tag
> 
>   Fixes: commit 4a503217ce37 ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt 
> masking")
> 
> has these problem(s):
> 
>   - leading word 'commit' unexpected

Thanks Stephen. Updated (and I should probably update my patch-applying
script to rewrite the Fixes: lines with something like
git show -s --pretty=format:"Fixes: %h (\"%s\")").
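
For the commit above that would produce the expected format, e.g.
(assuming %h gives the usual 12-character abbreviated hash):

  Fixes: 4a503217ce37 ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")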

-- 
Catalin


Re: [PATCH v4 0/8] arm64: IRQ priority masking and Pseudo-NMI fixes

2019-06-21 Thread Catalin Marinas
On Tue, Jun 11, 2019 at 10:38:05AM +0100, Julien Thierry wrote:
> Julien Thierry (7):
>   arm64: Do not enable IRQs for ct_user_exit
>   arm64: irqflags: Pass flags as readonly operand to restore instruction
>   arm64: irqflags: Add condition flags to inline asm clobber list
>   arm64: Fix interrupt tracing in the presence of NMIs
>   arm64: Fix incorrect irqflag restore for priority masking
>   arm64: irqflags: Introduce explicit debugging for IRQ priorities
>   arm64: Allow selecting Pseudo-NMI again
> 
> Wei Li (1):
>   arm64: fix kernel stack overflow in kdump capture kernel

I queued these patches for 5.3 (they should appear in -next soon). Thanks.

-- 
Catalin


Re: [PATCH v2] arm64/mm: Correct the cache line size warning with non coherent device

2019-06-17 Thread Catalin Marinas
On Mon, Jun 17, 2019 at 07:00:34PM +0800, Zhangshaokun wrote:
> On 2019/6/17 18:45, Catalin Marinas wrote:
> > On Sat, Jun 15, 2019 at 10:44:33AM +0800, Zhangshaokun wrote:
> >> On 2019/6/14 21:11, Masayoshi Mizuma wrote:
> >>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> >>> index 1669618db08a..379589dc7113 100644
> >>> --- a/arch/arm64/mm/dma-mapping.c
> >>> +++ b/arch/arm64/mm/dma-mapping.c
> >>> @@ -38,10 +38,6 @@ void arch_dma_prep_coherent(struct page *page, size_t 
> >>> size)
> >>>  
> >>>  static int __init arm64_dma_init(void)
> >>>  {
> >>> - WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
> >>> -TAINT_CPU_OUT_OF_SPEC,
> >>> -"ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)",
> >>> -ARCH_DMA_MINALIGN, cache_line_size());
> >>>   return dma_atomic_pool_init(GFP_DMA32, __pgprot(PROT_NORMAL_NC));
> >>>  }
> >>>  arch_initcall(arm64_dma_init);
> >>> @@ -56,7 +52,17 @@ void arch_teardown_dma_ops(struct device *dev)
> >>>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> >>>   const struct iommu_ops *iommu, bool coherent)
> >>>  {
> >>> + int cls = cache_line_size_of_cpu();
> >>
> >> Do we need this local variable? How about using cache_line_size_of_cpu()
> >> directly in WARN_TAINT, just like before.
> > 
> > The reason being?
> 
> Since it is an inline function, maybe it is unnecessary; it is trivial.

OTOH, you end up with two reads from the CTR_EL0 register.

-- 
Catalin


Re: [PATCH v2] arm64/mm: Correct the cache line size warning with non coherent device

2019-06-17 Thread Catalin Marinas
On Sat, Jun 15, 2019 at 10:44:33AM +0800, Zhangshaokun wrote:
> On 2019/6/14 21:11, Masayoshi Mizuma wrote:
> > diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
> > index 6eaf1c07aa4e..7fa6828bb488 100644
> > --- a/arch/arm64/kernel/cacheinfo.c
> > +++ b/arch/arm64/kernel/cacheinfo.c
> > @@ -19,12 +19,10 @@
> >  
> >  int cache_line_size(void)
> >  {
> > -   u32 cwg = cache_type_cwg();
> > -
> > if (coherency_max_size != 0)
> > return coherency_max_size;
> >  
> > -   return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
> > +   return cache_line_size_of_cpu();
> >  }
> 
> How about simplifying it like this?
> 
> int cache_line_size(void)
> {
> return coherency_max_size ? coherency_max_size :
> cache_line_size_of_cpu();
> }

I don't see this as a simplification; it's easier to read with an explicit 'if'.

> >  EXPORT_SYMBOL_GPL(cache_line_size);
> >  
> > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> > index 1669618db08a..379589dc7113 100644
> > --- a/arch/arm64/mm/dma-mapping.c
> > +++ b/arch/arm64/mm/dma-mapping.c
> > @@ -38,10 +38,6 @@ void arch_dma_prep_coherent(struct page *page, size_t 
> > size)
> >  
> >  static int __init arm64_dma_init(void)
> >  {
> > -   WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
> > -  TAINT_CPU_OUT_OF_SPEC,
> > -  "ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)",
> > -  ARCH_DMA_MINALIGN, cache_line_size());
> > return dma_atomic_pool_init(GFP_DMA32, __pgprot(PROT_NORMAL_NC));
> >  }
> >  arch_initcall(arm64_dma_init);
> > @@ -56,7 +52,17 @@ void arch_teardown_dma_ops(struct device *dev)
> >  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > const struct iommu_ops *iommu, bool coherent)
> >  {
> > +   int cls = cache_line_size_of_cpu();
> 
> Do we need this local variable? How about using cache_line_size_of_cpu()
> directly in WARN_TAINT, just like before.

The reason being?

Anyway, I'll queue v2 of this patch as is for 5.3. Thanks.

-- 
Catalin


Re: [PATCH v17 03/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-06-13 Thread Catalin Marinas
On Thu, Jun 13, 2019 at 04:45:54PM +0100, Vincenzo Frascino wrote:
> On 13/06/2019 16:35, Catalin Marinas wrote:
> > On Thu, Jun 13, 2019 at 12:16:59PM +0100, Dave P Martin wrote:
> >> On Wed, Jun 12, 2019 at 01:43:20PM +0200, Andrey Konovalov wrote:
> >>> +
> >>> +/*
> >>> + * Control the relaxed ABI allowing tagged user addresses into the 
> >>> kernel.
> >>> + */
> >>> +static unsigned int tagged_addr_prctl_allowed = 1;
> >>> +
> >>> +long set_tagged_addr_ctrl(unsigned long arg)
> >>> +{
> >>> + if (!tagged_addr_prctl_allowed)
> >>> + return -EINVAL;
> >>
> >> So, tagging can actually be locked on by having a process enable it and
> >> then some possibly unrelated process clearing tagged_addr_prctl_allowed.
> >> That feels a bit weird.
> > 
> > The problem is that if you disable the ABI globally, lots of
> > applications would crash. This sysctl is meant as a way to disable the
> > opt-in to the TBI ABI. Another option would be a kernel command line
> > option (I'm not keen on a Kconfig option).
> 
> Why are you not keen on a Kconfig option?

Because I don't want to rebuild the kernel/reboot just to be able to
test how user space handles the ABI opt-in. I'm ok with a Kconfig option
to disable this globally in addition to a run-time option (if actually
needed, I'm not sure).

-- 
Catalin


Re: [PATCH 1/2] arm64/mm: check cpu cache line size with non-coherent device

2019-06-13 Thread Catalin Marinas
On Tue, Jun 11, 2019 at 06:02:47PM -0400, Masayoshi Mizuma wrote:
> On Tue, Jun 11, 2019 at 07:00:07PM +0100, Catalin Marinas wrote:
> > On Tue, Jun 11, 2019 at 11:17:30AM -0400, Masayoshi Mizuma wrote:
> > > --- a/arch/arm64/mm/dma-mapping.c
> > > +++ b/arch/arm64/mm/dma-mapping.c
> > > @@ -91,10 +91,6 @@ static int __swiotlb_mmap_pfn(struct vm_area_struct 
> > > *vma,
> > >  
> > >  static int __init arm64_dma_init(void)
> > >  {
> > > - WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
> > > -TAINT_CPU_OUT_OF_SPEC,
> > > -"ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)",
> > > -ARCH_DMA_MINALIGN, cache_line_size());
> > >   return dma_atomic_pool_init(GFP_DMA32, __pgprot(PROT_NORMAL_NC));
> > >  }
> > >  arch_initcall(arm64_dma_init);
> > > @@ -473,6 +469,11 @@ void arch_setup_dma_ops(struct device *dev, u64 
> > > dma_base, u64 size,
> > >   const struct iommu_ops *iommu, bool coherent)
> > >  {
> > >   dev->dma_coherent = coherent;
> > > +
> > > + if (!coherent && (cache_line_size() > ARCH_DMA_MINALIGN))
> > > + dev_WARN(dev, "ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < 
> > > %d)",
> > > + ARCH_DMA_MINALIGN, cache_line_size());
> > 
> > I'm ok in principle with this patch, with the minor issue that since
> > commit 7b8c87b297a7 ("arm64: cacheinfo: Update cache_line_size detected
> > from DT or PPTT") queued for 5.3 cache_line_size() gets the information
> > from DT or ACPI. The reason for this change is that the information is
> > used for performance tuning rather than DMA coherency.
> > 
> > You can go for a direct cache_type_cwg() check in here, unless Robin
> > (cc'ed) has a better idea.
> 
> Got it, thanks.
> I believe coherency_max_size is zero when coherent is false,
> so I'll modify the patch as follows. Does that make sense?

The coherency_max_size gives you the largest cache line in the system,
independent of whether a device is coherent or not. You may have a
device that does not snoop L1/L2 but there is a transparent L3 (system
cache) with a larger cache line that the device may be able to snoop.
The coherency_max_size and therefore cache_line_size() would give you
this L3 value but the device would work fine since CWG <=
ARCH_DMA_MINALIGN.

> 
> @@ -57,6 +53,11 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
> u64 size,
> const struct iommu_ops *iommu, bool coherent)
>  {
> dev->dma_coherent = coherent;
> +
> +   if (!coherent && (cache_line_size() > ARCH_DMA_MINALIGN))
> +   dev_WARN(dev, "ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d 
> < %d)",
> +   ARCH_DMA_MINALIGN, (4 << cache_type_cwg()));
> +
> if (iommu)
> iommu_setup_dma_ops(dev, dma_base, size);

I think the easiest here is to add a local variable:

int cls = 4 << cache_type_cwg();

and check it against ARCH_DMA_MINALIGN.
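
That is, something along these lines in arch_setup_dma_ops() (only a sketch
of the suggestion against the hunk above, not a tested patch):

	int cls = 4 << cache_type_cwg();

	if (!coherent && cls > ARCH_DMA_MINALIGN)
		dev_WARN(dev, "ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)",
			 ARCH_DMA_MINALIGN, cls);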

-- 
Catalin


Re: [PATCH v4 1/2] arm64: Define Documentation/arm64/tagged-address-abi.txt

2019-06-13 Thread Catalin Marinas
Hi Szabolcs,

On Wed, Jun 12, 2019 at 05:30:34PM +0100, Szabolcs Nagy wrote:
> On 12/06/2019 15:21, Vincenzo Frascino wrote:
> > +2. ARM64 Tagged Address ABI
> > +---
> > +
> > +From the kernel syscall interface prospective, we define, for the purposes
>  ^^^
> perspective
> 
> > +of this document, a "valid tagged pointer" as a pointer that either it has
> > +a zero value set in the top byte or it has a non-zero value, it is in 
> > memory
> > +ranges privately owned by a userspace process and it is obtained in one of
> > +the following ways:
> > +  - mmap() done by the process itself, where either:
> > +* flags = MAP_PRIVATE | MAP_ANONYMOUS
> > +* flags = MAP_PRIVATE and the file descriptor refers to a regular
> > +  file or "/dev/zero"
> 
> this does not make it clear if MAP_FIXED or other flags are valid
> (there are many map flags i don't know, but at least fixed should work
> and stack/growsdown. i'd expect anything that's not incompatible with
> private|anon to work).

Just to clarify, this document tries to define the memory ranges from
where tagged addresses can be passed into the kernel in the context
of TBI only (not MTE); that is for hwasan support. FIXED or GROWSDOWN
should not affect this.

> > +  - a mapping below sbrk(0) done by the process itself
> 
> doesn't the mmap rule cover this?

IIUC it doesn't cover it as that's memory mapped by the kernel
automatically on access vs a pointer returned by mmap(). The statement
above talks about how the address is obtained by the user.

> > +  - any memory mapped by the kernel in the process's address space during
> > +creation and following the restrictions presented above (i.e. data, 
> > bss,
> > +stack).
> 
> OK.
> 
> Can a null pointer have a tag?
> (in case NULL is valid to pass to a syscall)

Good point. I don't think it can. We may change this for MTE where we
give a hint tag but no hint address; however, this document only covers
TBI for now.

> > +The ARM64 Tagged Address ABI is an opt-in feature, and an application can
> > +control it using the following prctl()s:
> > +  - PR_SET_TAGGED_ADDR_CTRL: can be used to enable the Tagged Address ABI.
> > +  - PR_GET_TAGGED_ADDR_CTRL: can be used to check the status of the Tagged
> > + Address ABI.
> > +
> > +As a consequence of invoking PR_SET_TAGGED_ADDR_CTRL prctl() by an 
> > applications,
> > +the ABI guarantees the following behaviours:
> > +
> > +  - Every current or newly introduced syscall can accept any valid tagged
> > +pointers.
> > +
> > +  - If a non valid tagged pointer is passed to a syscall then the behaviour
> > +is undefined.
> > +
> > +  - Every valid tagged pointer is expected to work as an untagged one.
> > +
> > +  - The kernel preserves any valid tagged pointers and returns them to the
> > +userspace unchanged in all the cases except the ones documented in the
> > +"Preserving tags" paragraph of tagged-pointers.txt.
> 
> OK.
> 
> i guess pointers of another process are not "valid tagged pointers"
> for the current one, so e.g. in ptrace the ptracer has to clear the
> tags before PEEK etc.

Another good point. Are there any pros/cons here or use-cases? When we
add MTE support, should we handle this differently?

> > +A definition of the meaning of tagged pointers on arm64 can be found in:
> > +Documentation/arm64/tagged-pointers.txt.
> > +
> > +3. ARM64 Tagged Address ABI Exceptions
> > +--
> > +
> > +The behaviours described in paragraph 2, with particular reference to the
> > +acceptance by the syscalls of any valid tagged pointer are not applicable
> > +to the following cases:
> > +  - mmap() addr parameter.
> > +  - mremap() new_address parameter.
> > +  - prctl_set_mm() struct prctl_map fields.
> > +  - prctl_set_mm_map() struct prctl_map fields.
> 
> i don't understand the exception: does it mean that passing a tagged
> address to these syscalls is undefined?

I'd say it's as undefined as it is right now without these patches. We
may be able to explain this better in the document.

-- 
Catalin


Re: [PATCH v4 1/2] arm64: Define Documentation/arm64/tagged-address-abi.txt

2019-06-13 Thread Catalin Marinas
On Thu, Jun 13, 2019 at 02:23:43PM +0100, Dave P Martin wrote:
> On Thu, Jun 13, 2019 at 01:28:21PM +0100, Catalin Marinas wrote:
> > On Thu, Jun 13, 2019 at 12:37:32PM +0100, Dave P Martin wrote:
> > > On Thu, Jun 13, 2019 at 11:15:34AM +0100, Vincenzo Frascino wrote:
> > > > On 12/06/2019 16:35, Catalin Marinas wrote:
> > > > > On Wed, Jun 12, 2019 at 03:21:10PM +0100, Vincenzo Frascino wrote:
> > > > >> +  - PR_GET_TAGGED_ADDR_CTRL: can be used to check the status of the 
> > > > >> Tagged
> > > > >> + Address ABI.
> > [...]
> > > Is there a canonical way to detect whether this whole API/ABI is
> > > available?  (i.e., try to call this prctl / check for an HWCAP bit,
> > > etc.)
> > 
> > The canonical way is a prctl() call. HWCAP doesn't make sense since it's
> > not a hardware feature. If you really want a different way of detecting
> > this (which I don't think it's worth), we can reinstate the AT_FLAGS
> > bit.
> 
> Sure, I think this probably makes sense -- I'm still getting my head around
> which parts of the design are directly related to MTE and which aren't.
> 
> I was a bit concerned about the interaction between
> PR_SET_TAGGED_ADDR_CTRL and the sysctl: the caller might conclude that
> this API is unavailable when actually tagged addresses are stuck on.
> 
> I'm not sure whether this matters, but it's a bit weird.
> 
> One option would be to change the semantics, so that the sysctl just
> forbids turning tagging from off to on.  Alternatively, we could return
> a different error code to distinguish this case.

This is the intention, just to forbid turning tagging on. We could
return -EPERM instead, though my original intent was to simply pretend
that the prctl does not exist like in an older kernel version.

-- 
Catalin


Re: [PATCH v17 03/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-06-13 Thread Catalin Marinas
On Thu, Jun 13, 2019 at 12:16:59PM +0100, Dave P Martin wrote:
> On Wed, Jun 12, 2019 at 01:43:20PM +0200, Andrey Konovalov wrote:
> > From: Catalin Marinas 
> > 
> > It is not desirable to relax the ABI to allow tagged user addresses into
> > the kernel indiscriminately. This patch introduces a prctl() interface
> > for enabling or disabling the tagged ABI with a global sysctl control
> > for preventing applications from enabling the relaxed ABI (meant for
> > testing user-space prctl() return error checking without reconfiguring
> > the kernel). The ABI properties are inherited by threads of the same
> > application and fork()'ed children but cleared on execve().
> > 
> > The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
> > MTE-specific settings like imprecise vs precise exceptions.
> > 
> > Signed-off-by: Catalin Marinas 
> > ---
> >  arch/arm64/include/asm/processor.h   |  6 +++
> >  arch/arm64/include/asm/thread_info.h |  1 +
> >  arch/arm64/include/asm/uaccess.h |  3 +-
> >  arch/arm64/kernel/process.c  | 67 
> >  include/uapi/linux/prctl.h   |  5 +++
> >  kernel/sys.c | 16 +++
> >  6 files changed, 97 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/include/asm/processor.h 
> > b/arch/arm64/include/asm/processor.h
> > index fcd0e691b1ea..fee457456aa8 100644
> > --- a/arch/arm64/include/asm/processor.h
> > +++ b/arch/arm64/include/asm/processor.h
> > @@ -307,6 +307,12 @@ extern void __init minsigstksz_setup(void);
> >  /* PR_PAC_RESET_KEYS prctl */
> >  #define PAC_RESET_KEYS(tsk, arg)   ptrauth_prctl_reset_keys(tsk, arg)
> >  
> > +/* PR_TAGGED_ADDR prctl */
> 
> (A couple of comments I missed in my last reply:)
> 
> Name mismatch?

Yeah, it went through several names but it seems that I didn't update
all places.

> > +long set_tagged_addr_ctrl(unsigned long arg);
> > +long get_tagged_addr_ctrl(void);
> > +#define SET_TAGGED_ADDR_CTRL(arg)  set_tagged_addr_ctrl(arg)
> > +#define GET_TAGGED_ADDR_CTRL() get_tagged_addr_ctrl()
> > +
> 
> [...]
> 
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index 3767fb21a5b8..69d0be1fc708 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -30,6 +30,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -323,6 +324,7 @@ void flush_thread(void)
> > fpsimd_flush_thread();
> > tls_thread_flush();
> > flush_ptrace_hw_breakpoint(current);
> > +   clear_thread_flag(TIF_TAGGED_ADDR);
> >  }
> >  
> >  void release_thread(struct task_struct *dead_task)
> > @@ -552,3 +554,68 @@ void arch_setup_new_exec(void)
> >  
> > ptrauth_thread_init_user(current);
> >  }
> > +
> > +/*
> > + * Control the relaxed ABI allowing tagged user addresses into the kernel.
> > + */
> > +static unsigned int tagged_addr_prctl_allowed = 1;
> > +
> > +long set_tagged_addr_ctrl(unsigned long arg)
> > +{
> > +   if (!tagged_addr_prctl_allowed)
> > +   return -EINVAL;
> 
> So, tagging can actually be locked on by having a process enable it and
> then some possibly unrelated process clearing tagged_addr_prctl_allowed.
> That feels a bit weird.

The problem is that if you disable the ABI globally, lots of
applications would crash. This sysctl is meant as a way to disable the
opt-in to the TBI ABI. Another option would be a kernel command line
option (I'm not keen on a Kconfig option).

> Do we want to allow a process that has tagging on to be able to turn
> it off at all?  Possibly things like CRIU might want to do that.

I left it in for symmetry but I don't expect it to be used. A potential
use-case is doing it for subsequent threads in an application.

> > +   if (is_compat_task())
> > +   return -EINVAL;
> > +   if (arg & ~PR_TAGGED_ADDR_ENABLE)
> > +   return -EINVAL;
> 
> How do we expect this argument to be extended in the future?

Yes, for MTE. That's why I wouldn't allow random bits here.

> I'm wondering whether this is really a bitmask or an enum, or a mixture
> of the two.  Maybe it doesn't matter.

A user may want to set PR_TAGGED_ADDR_ENABLE | PR_MTE_PRECISE in a single
call.
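
In other words, something like the following from userspace (an illustrative
sketch only; PR_MTE_PRECISE is the hypothetical future MTE flag mentioned
above and is not defined by this series):

	/* opt in to tagged addresses and, later, precise MTE exceptions */
	prctl(PR_SET_TAGGED_ADDR_CTRL,
	      PR_TAGGED_ADDR_ENABLE | PR_MTE_PRECISE, 0, 0, 0);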

> > +   if (arg & PR_TAGGED_ADDR_ENABLE)
> > +   set_thread_flag(TIF_TAGGED_ADDR);
> > +   else
> > +   clear_thread_flag(TIF_TAGGED_ADDR);
> 
> I think update_thread_flag() could be used here.

Yes. I forgot you added this.

-- 
Catalin


Re: [PATCH v17 03/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-06-13 Thread Catalin Marinas
Hi Dave,

On Thu, Jun 13, 2019 at 12:02:35PM +0100, Dave P Martin wrote:
> On Wed, Jun 12, 2019 at 01:43:20PM +0200, Andrey Konovalov wrote:
> > +/*
> > + * Global sysctl to disable the tagged user addresses support. This control
> > + * only prevents the tagged address ABI enabling via prctl() and does not
> > + * disable it for tasks that already opted in to the relaxed ABI.
> > + */
> > +static int zero;
> > +static int one = 1;
> 
> !!!
> 
> And these can't even be const without a cast.  Yuk.
> 
> (Not your fault though, but it would be nice to have a proc_dobool() to
> avoid this.)

I had the same reaction. Maybe for another patch sanitising this pattern
across the kernel.

> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -229,4 +229,9 @@ struct prctl_mm_map {
> >  # define PR_PAC_APDBKEY(1UL << 3)
> >  # define PR_PAC_APGAKEY(1UL << 4)
> >  
> > +/* Tagged user address controls for arm64 */
> > +#define PR_SET_TAGGED_ADDR_CTRL55
> > +#define PR_GET_TAGGED_ADDR_CTRL56
> > +# define PR_TAGGED_ADDR_ENABLE (1UL << 0)
> > +
> 
> Do we expect this prctl to be applicable to other arches, or is it
> strictly arm64-specific?

I kept it generic, at least the tagged address part. The MTE bits later
on would be arm64-specific.

> > @@ -2492,6 +2498,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, 
> > arg2, unsigned long, arg3,
> > return -EINVAL;
> > error = PAC_RESET_KEYS(me, arg2);
> > break;
> > +   case PR_SET_TAGGED_ADDR_CTRL:
> > +   if (arg3 || arg4 || arg5)
> 
> 
> 
> How do you anticipate these arguments being used in the future?

I don't expect them to be used at all. But since I'm not sure, I'd force
them to zero for now rather than ignore them. The GET is supposed to return
the SET arg2, hence I'd rather not use the other arguments.

> For the SVE prctls I took the view that "get" could only ever mean one
> thing, and "put" already had a flags argument with spare bits for future
> expansion anyway, so forcing the extra arguments to zero would be
> unnecessary.
> 
> Opinions seem to differ on whether requiring surplus arguments to be 0
> is beneficial for hygiene, but the glibc prototype for prctl() is
> 
>   int prctl (int __option, ...);
> 
> so it seemed annoying to have to pass extra arguments to it just for the
> sake of it.  IMHO this also makes the code at the call site less
> readable, since it's not immediately apparent that all those 0s are
> meaningless.

It's fine by me to ignore the other arguments. I just followed the
pattern of some existing prctl options. I don't have a strong opinion
either way.

> > +   return -EINVAL;
> > +   error = SET_TAGGED_ADDR_CTRL(arg2);
> > +   break;
> > +   case PR_GET_TAGGED_ADDR_CTRL:
> > +   if (arg2 || arg3 || arg4 || arg5)
> > +   return -EINVAL;
> > +   error = GET_TAGGED_ADDR_CTRL();
> 
> Having a "get" prctl is probably a good idea, but is there a clear
> usecase for it?

Not sure, maybe some other library (e.g. a JIT compiler) would like to
check whether tagged addresses have been enabled during application
start and decide to generate tagged pointers for itself. It seemed
pretty harmless, unless we add more complex things to the prctl() that
cannot be returned in one request.
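
As an illustration, such a library could probe the status at start-up along
these lines (a sketch only; tagged_addr_abi_enabled() is a made-up helper
name and the PR_* constants are the uapi ones introduced by this series):

	#include <stdbool.h>
	#include <sys/prctl.h>
	#include <linux/prctl.h>

	static bool tagged_addr_abi_enabled(void)
	{
		long ctrl = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);

		/* prctl() fails with EINVAL on kernels without this option */
		return ctrl != -1 && (ctrl & PR_TAGGED_ADDR_ENABLE);
	}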

-- 
Catalin


Re: [PATCH v2 1/2] mm: kmemleak: change error at _write when kmemleak is disabled

2019-06-13 Thread Catalin Marinas
On Wed, Jun 12, 2019 at 12:52:30PM -0300, André Almeida wrote:
> According to POSIX, EBUSY means that the "device or resource is busy",
> and this can lead to people thinking that the file
> `/sys/kernel/debug/kmemleak/` is somehow locked or being used by other
> process. Change this error code to a more appropriate one.
> 
> Signed-off-by: André Almeida 

Acked-by: Catalin Marinas 


Re: [PATCH v2 2/2] docs: kmemleak: add more documentation details

2019-06-13 Thread Catalin Marinas
On Wed, Jun 12, 2019 at 12:52:31PM -0300, André Almeida wrote:
> Wikipedia now has a main article to "tracing garbage collector" topic.
> Change the URL and use the reStructuredText syntax for hyperlinks and add
> more details about the use of the tool. Add a section about how to use
> the kmemleak-test module to test the memory leak scanning.
> 
> Signed-off-by: André Almeida 

Acked-by: Catalin Marinas 


Re: [PATCH 1/2] arm64/mm: check cpu cache line size with non-coherent device

2019-06-11 Thread Catalin Marinas
On Tue, Jun 11, 2019 at 11:17:30AM -0400, Masayoshi Mizuma wrote:
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -91,10 +91,6 @@ static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
>  
>  static int __init arm64_dma_init(void)
>  {
> - WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
> -TAINT_CPU_OUT_OF_SPEC,
> -"ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)",
> -ARCH_DMA_MINALIGN, cache_line_size());
>   return dma_atomic_pool_init(GFP_DMA32, __pgprot(PROT_NORMAL_NC));
>  }
>  arch_initcall(arm64_dma_init);
> @@ -473,6 +469,11 @@ void arch_setup_dma_ops(struct device *dev, u64 
> dma_base, u64 size,
>   const struct iommu_ops *iommu, bool coherent)
>  {
>   dev->dma_coherent = coherent;
> +
> + if (!coherent && (cache_line_size() > ARCH_DMA_MINALIGN))
> + dev_WARN(dev, "ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < 
> %d)",
> + ARCH_DMA_MINALIGN, cache_line_size());

I'm ok in principle with this patch, with the minor issue that since
commit 7b8c87b297a7 ("arm64: cacheinfo: Update cache_line_size detected
from DT or PPTT") queued for 5.3 cache_line_size() gets the information
from DT or ACPI. The reason for this change is that the information is
used for performance tuning rather than DMA coherency.

You can go for a direct cache_type_cwg() check in here, unless Robin
(cc'ed) has a better idea.

-- 
Catalin


Re: [PATCH 2/2] arm64/mm: show TAINT_CPU_OUT_OF_SPEC warning if the cache size is over the spec.

2019-06-11 Thread Catalin Marinas
On Tue, Jun 11, 2019 at 11:17:31AM -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Show the warning and taints as TAINT_CPU_OUT_OF_SPEC if the cache line
> size is greater than the maximum.

In general the "out of spec" part is a misnomer, we tend to apply it to
CPU features that are not supported by the kernel rather than some CPU
feature not compliant with the architecture (we call the latter errata).

I suggest you drop this patch.

-- 
Catalin


Re: [PATCH v16 02/16] arm64: untag user pointers in access_ok and __uaccess_mask_ptr

2019-06-11 Thread Catalin Marinas
On Mon, Jun 10, 2019 at 06:53:27PM +0100, Catalin Marinas wrote:
> On Mon, Jun 03, 2019 at 06:55:04PM +0200, Andrey Konovalov wrote:
> > diff --git a/arch/arm64/include/asm/uaccess.h 
> > b/arch/arm64/include/asm/uaccess.h
> > index e5d5f31c6d36..9164ecb5feca 100644
> > --- a/arch/arm64/include/asm/uaccess.h
> > +++ b/arch/arm64/include/asm/uaccess.h
> > @@ -94,7 +94,7 @@ static inline unsigned long __range_ok(const void __user 
> > *addr, unsigned long si
> > return ret;
> >  }
> >  
> > -#define access_ok(addr, size)  __range_ok(addr, size)
> > +#define access_ok(addr, size)  __range_ok(untagged_addr(addr), size)
> 
> I'm going to propose an opt-in method here (RFC for now). We can't have
> a check in untagged_addr() since this is already used throughout the
> kernel for both user and kernel addresses (khwasan) but we can add one
> in __range_ok(). The same prctl() option will be used for controlling
> the precise/imprecise mode of MTE later on. We can use a TIF_ flag here
> assuming that this will be called early on and any cloned thread will
> inherit this.

Updated patch, inlined below. Once we agree on the approach, I
think Andrey can insert it in this series, probably after patch 2. The
differences from the one I posted yesterday:

- renamed PR_* macros together with get/set variants and the possibility
  to disable the relaxed ABI

- sysctl option - /proc/sys/abi/tagged_addr to disable the ABI globally
  (just the prctl() opt-in, tasks already using it won't be affected)

And, of course, it needs more testing.

-----8<----
>From 7c624777a4e545522dec1b34e60f0229cb2bd59f Mon Sep 17 00:00:00 2001
From: Catalin Marinas 
Date: Tue, 11 Jun 2019 13:03:38 +0100
Subject: [PATCH] arm64: Introduce prctl() options to control the tagged user
 addresses ABI

It is not desirable to relax the ABI to allow tagged user addresses into
the kernel indiscriminately. This patch introduces a prctl() interface
for enabling or disabling the tagged ABI with a global sysctl control
for preventing applications from enabling the relaxed ABI (meant for
testing user-space prctl() return error checking without reconfiguring
the kernel). The ABI properties are inherited by threads of the same
application and fork()'ed children but cleared on execve().

The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
MTE-specific settings like imprecise vs precise exceptions.

Signed-off-by: Catalin Marinas 
---
 arch/arm64/include/asm/processor.h   |  6 +++
 arch/arm64/include/asm/thread_info.h |  1 +
 arch/arm64/include/asm/uaccess.h |  5 ++-
 arch/arm64/kernel/process.c  | 67 
 include/uapi/linux/prctl.h   |  5 +++
 kernel/sys.c | 16 +++
 6 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index fcd0e691b1ea..fee457456aa8 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -307,6 +307,12 @@ extern void __init minsigstksz_setup(void);
 /* PR_PAC_RESET_KEYS prctl */
 #define PAC_RESET_KEYS(tsk, arg)   ptrauth_prctl_reset_keys(tsk, arg)
 
+/* PR_TAGGED_ADDR prctl */
+long set_tagged_addr_ctrl(unsigned long arg);
+long get_tagged_addr_ctrl(void);
+#define SET_TAGGED_ADDR_CTRL(arg)  set_tagged_addr_ctrl(arg)
+#define GET_TAGGED_ADDR_CTRL() get_tagged_addr_ctrl()
+
 /*
  * For CONFIG_GCC_PLUGIN_STACKLEAK
  *
diff --git a/arch/arm64/include/asm/thread_info.h 
b/arch/arm64/include/asm/thread_info.h
index c285d1ce7186..7263d4c973ce 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -101,6 +101,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_SVE 23  /* Scalable Vector Extension in use */
 #define TIF_SVE_VL_INHERIT 24  /* Inherit sve_vl_onexec across exec */
 #define TIF_SSBD   25  /* Wants SSB mitigation */
+#define TIF_TAGGED_ADDR 26
 
 #define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED  (1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 9164ecb5feca..995b9ea11a89 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -73,6 +73,9 @@ static inline unsigned long __range_ok(const void __user 
*addr, unsigned long si
 {
unsigned long ret, limit = current_thread_info()->addr_limit;
 
+   if (test_thread_flag(TIF_TAGGED_ADDR))
+   addr = untagged_addr(addr);
+
__chk_user_ptr(addr);
asm volatile(
// A + B <= C + 1 for all A,B,C, in four easy steps:
@@ -94,7 +97,7 @@ static inline unsigned long __range_ok(const void __user 
*addr, unsigned long si
return

Re: [PATCH] clocksource/arm_arch_timer: extract elf_hwcap use to arch-helper

2019-06-10 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 02:14:13PM +0100, Andrew Murray wrote:
> diff --git a/arch/arm/include/asm/arch_timer.h 
> b/arch/arm/include/asm/arch_timer.h
> index 0a8d7bba2cb0..f21e038dc9f3 100644
> --- a/arch/arm/include/asm/arch_timer.h
> +++ b/arch/arm/include/asm/arch_timer.h
> @@ -4,6 +4,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -110,6 +111,18 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
>   isb();
>  }
>  
> +static inline bool arch_timer_set_evtstrm_feature(void)
> +{
> + elf_hwcap |= HWCAP_EVTSTRM;
> +#ifdef CONFIG_COMPAT
> + compat_elf_hwcap |= COMPAT_HWCAP_EVTSTRM;
> +#endif
> +}

There is no COMPAT support on arm32.

-- 
Catalin


Re: [PATCH V3 1/2] arm64/mm: Consolidate page fault information capture

2019-06-07 Thread Catalin Marinas
On Fri, Jun 07, 2019 at 11:30:46AM +0100, Mark Rutland wrote:
> On Fri, Jun 07, 2019 at 02:43:05PM +0530, Anshuman Khandual wrote:
> > This consolidates page fault information capture and moves it a bit earlier.
> > While here it also adds a wrapper, is_write_abort(). It also saves some
> > cycles by replacing multiple user_mode() calls with a single one earlier
> > during the fault.
> > 
> > Signed-off-by: Anshuman Khandual 
> > Cc: Catalin Marinas 
> > Cc: Will Deacon 
> > Cc: Mark Rutland 
> > Cc: James Morse 
> > Cc: Andrey Konovalov 
> > ---
> >  arch/arm64/mm/fault.c | 26 +++---
> >  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> As I mentioned previously, I doubt that this has any measurable impact
> on performance, and other than commenting the caveats w.r.t. cache
> maintenance, I'm not sure this makes things any clearer.
> 
> However, AFAICT it is correct, so I'll leave that to Catalin:

I only merged the is_write_abort() wrapper from this patch (changed the
commit log as well).

-- 
Catalin


Re: [PATCH] arm64: Fix comment after #endif

2019-06-07 Thread Catalin Marinas
On Fri, Jun 07, 2019 at 01:49:10AM +0200, Odin Ugedal wrote:
> The config value used in the if was changed in
> b433dce056d3814dc4b33e5a8a533d6401ffcfb0, but the comment on the
> corresponding end was not changed.
> 
> Signed-off-by: Odin Ugedal 

Queued for 5.3. Thanks.

-- 
Catalin


Re: [PATCH 4/8] arm64: Basic Branch Target Identification support

2019-06-06 Thread Catalin Marinas
On Fri, May 24, 2019 at 03:53:06PM +0100, Dave P Martin wrote:
> On Fri, May 24, 2019 at 02:02:17PM +0100, Mark Rutland wrote:
> > On Fri, May 24, 2019 at 11:25:29AM +0100, Dave Martin wrote:
> > >  #endif /* _UAPI__ASM_HWCAP_H */
> > > diff --git a/arch/arm64/include/uapi/asm/mman.h 
> > > b/arch/arm64/include/uapi/asm/mman.h
> > > new file mode 100644
> > > index 000..4776b43
> > > --- /dev/null
> > > +++ b/arch/arm64/include/uapi/asm/mman.h
> > > @@ -0,0 +1,9 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > > +#ifndef _UAPI__ASM_MMAN_H
> > > +#define _UAPI__ASM_MMAN_H
> > > +
> > > +#include 
> > > +
> > > +#define PROT_BTI_GUARDED 0x10/* BTI guarded page */
> > 
> > From prior discussions, I thought this would be PROT_BTI, without the
> > _GUARDED suffix. Do we really need that?
> > 
> > AFAICT, all other PROT_* definitions only have a single underscore, and
> > the existing arch-specific flags are PROT_ADI on sparc, and PROT_SAO on
> > powerpc.
> 
> No strong opinion. I was trying to make the name less obscure, but I'm
> equally happy with PROT_BTI if people prefer that.

I prefer PROT_BTI as well. We are going to add a PROT_MTE at some point
(and a VM_ARM64_MTE in the high VMA flag bits).

-- 
Catalin


Re: [PATCH V2 4/4] arm64/mm: Drop local variable vm_fault_t from __do_page_fault()

2019-06-06 Thread Catalin Marinas
On Thu, Jun 06, 2019 at 10:24:01AM +0530, Anshuman Khandual wrote:
> On 06/04/2019 08:26 PM, Catalin Marinas wrote:
> > On Mon, Jun 03, 2019 at 12:11:25PM +0530, Anshuman Khandual wrote:
> >> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> >> index 4bb65f3..41fa905 100644
> >> --- a/arch/arm64/mm/fault.c
> >> +++ b/arch/arm64/mm/fault.c
> >> @@ -397,37 +397,29 @@ static void do_bad_area(unsigned long addr, unsigned 
> >> int esr, struct pt_regs *re
> >>  static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long 
> >> addr,
> >>   unsigned int mm_flags, unsigned long vm_flags)
> >>  {
> >> -  struct vm_area_struct *vma;
> >> -  vm_fault_t fault;
> >> +  struct vm_area_struct *vma = find_vma(mm, addr);
> >>  
> >> -  vma = find_vma(mm, addr);
> >> -  fault = VM_FAULT_BADMAP;
> >>if (unlikely(!vma))
> >> -  goto out;
> >> -  if (unlikely(vma->vm_start > addr))
> >> -  goto check_stack;
> >> +  return VM_FAULT_BADMAP;
> >>  
> >>/*
> >> * Ok, we have a good vm_area for this memory access, so we can handle
> >> * it.
> >> */
> >> -good_area:
> >> +  if (unlikely(vma->vm_start > addr)) {
> >> +  if (!(vma->vm_flags & VM_GROWSDOWN))
> >> +  return VM_FAULT_BADMAP;
> >> +  if (expand_stack(vma, addr))
> >> +  return VM_FAULT_BADMAP;
> >> +  }
> > 
> > You could have a single return here:
> > 
> > if (unlikely(vma->vm_start > addr) &&
> > (!(vma->vm_flags & VM_GROWSDOWN) || expand_stack(vma, addr)))
> > return VM_FAULT_BADMAP;
> > 
> > Not sure it's any clearer though.
> 
> TBH the proposed one seems clearer as it separates effect (vma->vm_start > 
> addr)
> from required permission check (vma->vm_flags & VM_GROWSDOWN) and required 
> action
> (expand_stack(vma, addr)). But I am happy to change as you have mentioned if 
> that
> is preferred.

Not bothered really. You can leave them as in your proposal (I was just
seeing the VM_GROWSDOWN check tightly coupled with the expand_stack(),
it's fine either way).

-- 
Catalin


Re: [PATCH V2 3/4] arm64/mm: Consolidate page fault information capture

2019-06-06 Thread Catalin Marinas
On Thu, Jun 06, 2019 at 10:38:11AM +0100, Mark Rutland wrote:
> On Tue, Jun 04, 2019 at 03:42:09PM +0100, Catalin Marinas wrote:
> > On Mon, Jun 03, 2019 at 12:11:24PM +0530, Anshuman Khandual wrote:
> > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > > index da02678..4bb65f3 100644
> > > --- a/arch/arm64/mm/fault.c
> > > +++ b/arch/arm64/mm/fault.c
> > > @@ -435,6 +435,14 @@ static bool is_el0_instruction_abort(unsigned int 
> > > esr)
> > >   return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW;
> > >  }
> > >  
> > > +/*
> > > + * This is applicable only for EL0 write aborts.
> > > + */
> > > +static bool is_el0_write_abort(unsigned int esr)
> > > +{
> > > + return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
> > > +}
> > 
> > What makes this EL0 only?
> 
> It returns false for EL1 faults caused by DC IVAC, where write
> permission is required. EL0 can only issue maintenance that requires
> read permission.
> 
> For whatever reason, the architecture says that WnR is always 1b1, even
> if read permission was sufficient.
> 
> How about:
> 
> /*
>  * Note: not valid for EL1 DC IVAC, but we never use that such that it
>  * should fault.
>  */

For completeness, I'd add "... should fault. EL0 cannot issue DC IVAC
(undef)." or something like that.

Looks fine otherwise.

-- 
Catalin


Re: [PATCH v4 0/4] ptrace: cleanup PTRACE_SYSEMU handling and add support for arm64

2019-06-05 Thread Catalin Marinas
On Thu, May 23, 2019 at 10:06:14AM +0100, Sudeep Holla wrote:
> Sudeep Holla (4):
>   ptrace: move clearing of TIF_SYSCALL_EMU flag to core
>   x86: simplify _TIF_SYSCALL_EMU handling
>   arm64: add PTRACE_SYSEMU{,SINGLESTEP} definations to uapi headers
>   arm64: ptrace: add support for syscall emulation

I queued patches 1, 3 and 4 through the arm64 tree. There is no
dependency on patch 2 (just general clean-up); happy to take it as well,
or it can go in via the x86 tree.

Thanks.

-- 
Catalin


Re: [PATCH 0/2] Allow assembly code to use BIT(), GENMASK(), etc. and clean-up arm64 header

2019-06-05 Thread Catalin Marinas
On Mon, May 27, 2019 at 05:34:10PM +0900, Masahiro Yamada wrote:
> Some in-kernel headers use _BITUL() instead of BIT().
> 
>  arch/arm64/include/asm/sysreg.h
>  arch/s390/include/asm/*.h
> 
> I think the reason is because BIT() is currently not available
> in assembly. It hard-codes 1UL, which is not available in assembly.
[...]
> Masahiro Yamada (2):
>   linux/bits.h: make BIT(), GENMASK(), and friends available in assembly
>   arm64: replace _BITUL() with BIT()
> 
>  arch/arm64/include/asm/sysreg.h | 82 -
>  include/linux/bits.h| 17 ---

I'm not sure it's worth the hassle. It's nice to have the same BIT macro
but a quick grep shows arc, arm64, s390 and x86 using _BITUL. Maybe a
tree-wide clean-up would be more appropriate.

-- 
Catalin


Re: [PATCH V2 4/4] arm64/mm: Drop local variable vm_fault_t from __do_page_fault()

2019-06-04 Thread Catalin Marinas
On Mon, Jun 03, 2019 at 12:11:25PM +0530, Anshuman Khandual wrote:
> __do_page_fault() is over complicated with multiple goto statements. This
> cleans up the code flow and while there drops local variable vm_fault_t.

I'd change the subject as well here to something like refactor or
simplify __do_page_fault().

> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 4bb65f3..41fa905 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -397,37 +397,29 @@ static void do_bad_area(unsigned long addr, unsigned 
> int esr, struct pt_regs *re
>  static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr,
>  unsigned int mm_flags, unsigned long vm_flags)
>  {
> - struct vm_area_struct *vma;
> - vm_fault_t fault;
> + struct vm_area_struct *vma = find_vma(mm, addr);
>  
> - vma = find_vma(mm, addr);
> - fault = VM_FAULT_BADMAP;
>   if (unlikely(!vma))
> - goto out;
> - if (unlikely(vma->vm_start > addr))
> - goto check_stack;
> + return VM_FAULT_BADMAP;
>  
>   /*
>* Ok, we have a good vm_area for this memory access, so we can handle
>* it.
>*/
> -good_area:
> + if (unlikely(vma->vm_start > addr)) {
> + if (!(vma->vm_flags & VM_GROWSDOWN))
> + return VM_FAULT_BADMAP;
> + if (expand_stack(vma, addr))
> + return VM_FAULT_BADMAP;
> + }

You could have a single return here:

if (unlikely(vma->vm_start > addr) &&
(!(vma->vm_flags & VM_GROWSDOWN) || expand_stack(vma, addr)))
return VM_FAULT_BADMAP;

Not sure it's any clearer though.

-- 
Catalin


Re: [PATCH V2 2/4] arm64/mm: Drop task_struct argument from __do_page_fault()

2019-06-04 Thread Catalin Marinas
On Mon, Jun 03, 2019 at 12:11:23PM +0530, Anshuman Khandual wrote:
> The task_struct argument is not getting used in __do_page_fault(). Hence
> just drop it and use current or current->mm instead wherever required.
> This does not change any functionality.
> 
> Signed-off-by: Anshuman Khandual 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Mark Rutland 
> Cc: James Morse 
> Cc: Andrey Konovalov 

Queued for 5.3. Thanks.

-- 
Catalin


Re: [PATCH V2 1/4] arm64/mm: Drop mmap_sem before calling __do_kernel_fault()

2019-06-04 Thread Catalin Marinas
On Mon, Jun 03, 2019 at 12:11:22PM +0530, Anshuman Khandual wrote:
> There is an inconsistency between down_read_trylock() success and failure
> paths while dealing with kernel access for non exception table areas where
> it calls __do_kernel_fault(). In case of failure it just bails out without
> holding mmap_sem but when it succeeds it does so while holding mmap_sem.
> Fix this inconsistency by just dropping mmap_sem in success path as well.
> 
> __do_kernel_fault() calls die_kernel_fault() which then calls show_pte().
> show_pte() in this path might become bit more unreliable without holding
> mmap_sem. But there are already instances [1] in do_page_fault() where
> die_kernel_fault() gets called without holding mmap_sem. show_pte() can
> be made more robust independently but in a later patch.
> 
> [1] Conditional block for (is_ttbr0_addr && is_el1_permission_fault)
> 
> Signed-off-by: Anshuman Khandual 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Mark Rutland 
> Cc: James Morse 
> Cc: Andrey Konovalov 

Queued for 5.3. Thanks.

-- 
Catalin


Re: [PATCH V2 3/4] arm64/mm: Consolidate page fault information capture

2019-06-04 Thread Catalin Marinas
On Mon, Jun 03, 2019 at 12:11:24PM +0530, Anshuman Khandual wrote:
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index da02678..4bb65f3 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -435,6 +435,14 @@ static bool is_el0_instruction_abort(unsigned int esr)
>   return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW;
>  }
>  
> +/*
> + * This is applicable only for EL0 write aborts.
> + */
> +static bool is_el0_write_abort(unsigned int esr)
> +{
> + return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
> +}

What makes this EL0 only?

> +
>  static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  struct pt_regs *regs)
>  {
> @@ -443,6 +451,9 @@ static int __kprobes do_page_fault(unsigned long addr, 
> unsigned int esr,
>   vm_fault_t fault, major = 0;
>   unsigned long vm_flags = VM_READ | VM_WRITE;
>   unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + bool is_user = user_mode(regs);
> + bool is_el0_exec = is_el0_instruction_abort(esr);
> + bool is_el0_write = is_el0_write_abort(esr);
>  
>   if (notify_page_fault(regs, esr))
>   return 0;
> @@ -454,12 +465,12 @@ static int __kprobes do_page_fault(unsigned long addr, 
> unsigned int esr,
>   if (faulthandler_disabled() || !mm)
>   goto no_context;
>  
> - if (user_mode(regs))
> + if (is_user)
>   mm_flags |= FAULT_FLAG_USER;
>  
> - if (is_el0_instruction_abort(esr)) {
> + if (is_el0_exec) {
>   vm_flags = VM_EXEC;
> - } else if ((esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM)) {
> + } else if (is_el0_write) {
>   vm_flags = VM_WRITE;
>   mm_flags |= FAULT_FLAG_WRITE;
>   }

This can be triggered by an EL1 write to a user mapping, so is_el0_write
is misleading.
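
Something like the below would avoid the implication (untested; the name
is only a suggestion, not taken from your series):

	/*
	 * Data abort caused by a write, excluding cache maintenance
	 * operations. Note that this can also be triggered by an EL1
	 * access to a user mapping.
	 */
	static bool is_write_abort(unsigned int esr)
	{
		return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
	}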

-- 
Catalin


Re: [PATCH V2] arm64/mm: Change BUG_ON() to VM_BUG_ON() in [pmd|pud]_set_huge()

2019-06-04 Thread Catalin Marinas
On Mon, May 27, 2019 at 12:33:29PM +0530, Anshuman Khandual wrote:
> There are no callers for the functions which will pass unaligned physical
> addresses. Hence just change these BUG_ON() checks into VM_BUG_ON() which
> gets compiled out unless CONFIG_VM_DEBUG is enabled.
> 
> Signed-off-by: Anshuman Khandual 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Mark Rutland 
> Cc: Ard Biesheuvel 

Queued for 5.3. Thanks.

-- 
Catalin


Re: [PATCH] arm64: kernel: use aff3 instead of aff2 in comment

2019-06-04 Thread Catalin Marinas
On Sat, Jun 01, 2019 at 10:08:08AM +0800, Liu Song wrote:
> From: Liu Song 
> 
> Should use aff3 instead of aff2 in comment.
> 
> Signed-off-by: Liu Song 

Queued for 5.3. Thanks.

-- 
Catalin


Re: [PATCH] arm64/cpufeature: Convert hook_lock to raw_spin_lock_t in cpu_enable_ssbs()

2019-06-04 Thread Catalin Marinas
On Thu, May 30, 2019 at 12:30:58PM +0100, Julien Grall wrote:
> cpu_enable_ssbs() is called via stop_machine() as part of the cpu_enable
> callback. A spin lock is used to ensure the hook is registered before
> the rest of the callback is executed.
> 
> On -RT spin_lock() may sleep. However, all the callees in stop_machine()
> are expected to not sleep. Therefore a raw_spin_lock() is required here.
> 
> Given this is already done under stop_machine() and the work done under
> the lock is quite small, the latency should not increase too much.
> 
> Signed-off-by: Julien Grall 

Queued for 5.3. Thanks.

-- 
Catalin


Re: [PATCH v2] arm64: mm: make CONFIG_ZONE_DMA32 configurable

2019-06-04 Thread Catalin Marinas
On Wed, May 29, 2019 at 12:08:20AM +0800, Miles Chen wrote:
> This change makes CONFIG_ZONE_DMA32 default y and allows users
> to overwrite it only when CONFIG_EXPERT=y.
> 
> For the SoCs that do not need CONFIG_ZONE_DMA32, this is the
> first step to manage all available memory by a single
> zone(normal zone) to reduce the overhead of multiple zones.
> 
> The change also fixes a build error when CONFIG_NUMA=y and
> CONFIG_ZONE_DMA32=n.
> 
> arch/arm64/mm/init.c:195:17: error: use of undeclared identifier 'ZONE_DMA32'
> max_zone_pfns[ZONE_DMA32] = PFN_DOWN(max_zone_dma_phys());
> 
> Change since v1:
> 1. only expose CONFIG_ZONE_DMA32 when CONFIG_EXPERT=y
> 2. remove redundant IS_ENABLED(CONFIG_ZONE_DMA32)
> 
> Cc: Robin Murphy 
> Signed-off-by: Miles Chen 

Queued for 5.3. Thanks.

-- 
Catalin


Re: [PATCH] arm64/mm: Simplify protection flag creation for kernel huge mappings

2019-06-04 Thread Catalin Marinas
On Mon, May 27, 2019 at 09:28:15AM +0530, Anshuman Khandual wrote:
> Even though they have got the same value, PMD_TYPE_SECT and PUD_TYPE_SECT
> get used for kernel huge mappings. But before that first the table bit gets
> cleared using leaf level PTE_TABLE_BIT. Though functionally they are the same,
> we should use page table level specific macros to be consistent as per the
> MMU specifications. Create page table level specific wrappers for kernel
> huge mapping entries and just drop mk_sect_prot() which does not have any
> other user.
> 
> Signed-off-by: Anshuman Khandual 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Mark Rutland 

Queued for 5.3. Thanks.

-- 
Catalin


Re: [PATCH v4 1/2] drivers: base: cacheinfo: Add variable to record max cache line size

2019-06-04 Thread Catalin Marinas
On Tue, May 28, 2019 at 10:16:53AM +0800, Shaokun Zhang wrote:
> Add coherency_max_size variable to record the maximum cache line size
> for different cache levels. If it is available, we will synchronize
> it as cache line size, otherwise we will use CTR_EL0.CWG reporting
> in cache_line_size() for arm64.
> 
> Cc: Greg Kroah-Hartman 
> Cc: "Rafael J. Wysocki" 
> Cc: Sudeep Holla 
> Cc: Catalin Marinas 
> Cc: Jeremy Linton 
> Cc: Will Deacon 
> Signed-off-by: Shaokun Zhang 

Both patches queued for 5.3. Thanks.

-- 
Catalin


Re: [PATCH v2] uaccess: add noop untagged_addr definition

2019-06-04 Thread Catalin Marinas
On Tue, Jun 04, 2019 at 09:28:41AM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 04, 2019 at 02:04:47PM +0200, Andrey Konovalov wrote:
> > Architectures that support memory tagging have a need to perform untagging
> > (stripping the tag) in various parts of the kernel. This patch adds an
> > untagged_addr() macro, which is defined as noop for architectures that do
> > not support memory tagging. The oncoming patch series will define it at
> > least for sparc64 and arm64.
> > 
> > Acked-by: Catalin Marinas 
> > Reviewed-by: Khalid Aziz 
> > Signed-off-by: Andrey Konovalov 
> >  include/linux/mm.h | 11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 0e8834ac32b7..dd0b5f4e1e45 100644
> > +++ b/include/linux/mm.h
> > @@ -99,6 +99,17 @@ extern int mmap_rnd_compat_bits __read_mostly;
> >  #include 
> >  #include 
> >  
> > +/*
> > + * Architectures that support memory tagging (assigning tags to memory 
> > regions,
> > + * embedding these tags into addresses that point to these memory regions, 
> > and
> > + * checking that the memory and the pointer tags match on memory accesses)
> > + * redefine this macro to strip tags from pointers.
> > > + * It's defined as noop for architectures that don't support memory tagging.
> > + */
> > +#ifndef untagged_addr
> > +#define untagged_addr(addr) (addr)
> 
> Can you please make this a static inline instead of this macro? Then
> we can actually know what the input/output types are supposed to be.
> 
> Is it
> 
> static inline unsigned long untagged_addr(void __user *ptr) {return ptr;}
> 
> ?
> 
> Which would sort of make sense to me.

This macro is used mostly on unsigned long since for __user ptr we can
deference them in the kernel even if tagged. So if we are to use types
here, I'd rather have:

static inline unsigned long untagged_addr(unsigned long addr);

In addition I'd like to avoid the explicit casting to (unsigned long)
and use some userptr_to_ulong() or something. We are investigating in
parallel on how to leverage static checking (sparse, smatch) for better
tracking these conversions.
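
To make it concrete, a rough sketch of the typed version (the generic
default as a proper inline):

	/* generic no-op, include/linux/mm.h */
	static inline unsigned long untagged_addr(unsigned long addr)
	{
		return addr;
	}

with arm64 overriding it along these lines (from memory, so double-check
the bit position; the tag lives in bits 63:56):

	static inline unsigned long untagged_addr(unsigned long addr)
	{
		return sign_extend64(addr, 55);
	}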

-- 
Catalin


Re: [PATCH v4 05/14] arm64, mm: Make randomization selected by generic topdown mmap layout

2019-06-03 Thread Catalin Marinas
On Sun, May 26, 2019 at 09:47:37AM -0400, Alexandre Ghiti wrote:
> This commit selects ARCH_HAS_ELF_RANDOMIZE when an arch uses the generic
> topdown mmap layout functions so that this security feature is on by
> default.
> Note that this commit also removes the possibility for arm64 to have elf
> randomization and no MMU: without MMU, the security added by randomization
> is worth nothing.

Not planning on this anytime soon ;).

Acked-by: Catalin Marinas 


Re: [PATCH v4 04/14] arm64, mm: Move generic mmap layout functions to mm

2019-06-03 Thread Catalin Marinas
On Sun, May 26, 2019 at 09:47:36AM -0400, Alexandre Ghiti wrote:
> arm64 handles top-down mmap layout in a way that can be easily reused
> by other architectures, so make it available in mm.
> It then introduces a new config ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> that can be set by other architectures to benefit from those functions.
> Note that this new config depends on MMU being enabled, if selected
> without MMU support, a warning will be thrown.
> 
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Alexandre Ghiti 
> Reviewed-by: Christoph Hellwig 

Acked-by: Catalin Marinas 


Re: [PATCH v4 03/14] arm64: Consider stack randomization for mmap base only when necessary

2019-06-03 Thread Catalin Marinas
On Sun, May 26, 2019 at 09:47:35AM -0400, Alexandre Ghiti wrote:
> Do not offset mmap base address because of stack randomization if
> current task does not want randomization.
> Note that x86 already implements this behaviour.
> 
> Signed-off-by: Alexandre Ghiti 
> Acked-by: Kees Cook 
> Reviewed-by: Christoph Hellwig 

Acked-by: Catalin Marinas 


Re: [PATCH v4 02/14] arm64: Make use of is_compat_task instead of hardcoding this test

2019-06-03 Thread Catalin Marinas
On Sun, May 26, 2019 at 09:47:34AM -0400, Alexandre Ghiti wrote:
> Each architecture has its own way to determine if a task is a compat task,
> by using is_compat_task in arch_mmap_rnd, it allows more genericity and
> then it prepares its moving to mm/.
> 
> Signed-off-by: Alexandre Ghiti 
> Acked-by: Kees Cook 
> Reviewed-by: Christoph Hellwig 

Acked-by: Catalin Marinas 


Re: [PATCH REBASE v2 1/2] x86, arm64: Move ARCH_WANT_HUGE_PMD_SHARE config in arch/Kconfig

2019-06-03 Thread Catalin Marinas
On Sun, May 26, 2019 at 08:50:37AM -0400, Alexandre Ghiti wrote:
> ARCH_WANT_HUGE_PMD_SHARE config was declared in both architectures:
> move this declaration in arch/Kconfig and make those architectures
> select it.
> 
> Signed-off-by: Alexandre Ghiti 
> Reviewed-by: Palmer Dabbelt 
> ---
>  arch/Kconfig   | 3 +++
>  arch/arm64/Kconfig | 2 +-
>  arch/x86/Kconfig   | 4 +---
>  3 files changed, 5 insertions(+), 4 deletions(-)

For arm64:

Acked-by: Catalin Marinas 


Re: [PATCH v4 2/4] x86: simplify _TIF_SYSCALL_EMU handling

2019-06-03 Thread Catalin Marinas
On Thu, May 23, 2019 at 10:06:16AM +0100, Sudeep Holla wrote:
> The usage of emulated/_TIF_SYSCALL_EMU flags in syscall_trace_enter
> seems to be bit overcomplicated than required. Let's simplify it.
> 
> Cc: Andy Lutomirski 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Acked-by: Oleg Nesterov 
> Signed-off-by: Sudeep Holla 
> ---
>  arch/x86/entry/common.c | 17 ++---
>  1 file changed, 6 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index a986b3c8294c..0a61705d62ec 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -72,23 +72,18 @@ static long syscall_trace_enter(struct pt_regs *regs)
>  
>   struct thread_info *ti = current_thread_info();
>   unsigned long ret = 0;
> - bool emulated = false;
>   u32 work;
>  
>   if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
>   BUG_ON(regs != task_pt_regs(current));
>  
> - work = READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY;
> + work = READ_ONCE(ti->flags);
>  
> - if (unlikely(work & _TIF_SYSCALL_EMU))
> - emulated = true;
> -
> - if ((emulated || (work & _TIF_SYSCALL_TRACE)) &&
> - tracehook_report_syscall_entry(regs))
> - return -1L;
> -
> - if (emulated)
> - return -1L;
> + if (work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
> + ret = tracehook_report_syscall_entry(regs);
> + if (ret || (work & _TIF_SYSCALL_EMU))
> + return -1L;
> + }

Andy (or the other x86 folk), could I please get an ack on this patch? I
plan to queue this series through the arm64 tree (though if you want to
merge it separately, it looks like an independent clean-up with no
dependencies on the other patches).

Thanks.

-- 
Catalin


Re: [PATCH] arm64/mm: Move PTE_VALID from SW defined to HW page table entry definitions

2019-06-03 Thread Catalin Marinas
On Tue, May 21, 2019 at 09:36:27AM +0530, Anshuman Khandual wrote:
> PTE_VALID signifies that the last level page table entry is valid and it is
> MMU recognized while walking the page table. This is not a software defined
> PTE bit and should not be listed like one. Just move it to appropriate
> header file.
> 
> Signed-off-by: Anshuman Khandual 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Steve Capper 
> Cc: Suzuki Poulose 
> Cc: James Morse 

Queued for 5.3. Thanks.

-- 
Catalin


Re: [PATCH] arm64/hugetlb: Use macros for contiguous huge page sizes

2019-06-03 Thread Catalin Marinas
On Tue, May 21, 2019 at 09:05:03AM +0530, Anshuman Khandual wrote:
> Replace all open encoded contiguous huge page size computations with
> available macro encodings CONT_PTE_SIZE and CONT_PMD_SIZE. There are other
> instances where these macros are used in the file and this change makes it
> consistently use the same mnemonic.
> 
> Signed-off-by: Anshuman Khandual 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Steve Capper 
> Cc: Mark Rutland 

Queued for 5.3. Thanks.

-- 
Catalin


Re: [RFC/PATCH 4/5] arm64: Add support for arch_memremap_ro()

2019-06-03 Thread Catalin Marinas
On Fri, May 17, 2019 at 09:47:45AM -0700, Stephen Boyd wrote:
> Pass in PAGE_KERNEL_RO to the underlying IO mapping mechanism to get a
> read-only mapping for the MEMREMAP_RO type of memory mappings that
> memremap() supports.
> 
> Cc: Evan Green 
> Cc: Rob Herring 
> Cc: Bjorn Andersson 
> Cc: Andy Gross 
> Cc: Will Deacon 
> Cc: Catalin Marinas 
> Cc: Dan Williams 
> Signed-off-by: Stephen Boyd 

Not sure what the plans are with this series but if you need an ack for
arm64:

Acked-by: Catalin Marinas 


Re: [PATCH v6 15/19] arm64: Add vDSO compat support

2019-06-01 Thread Catalin Marinas
On Thu, May 30, 2019 at 03:15:27PM +0100, Vincenzo Frascino wrote:
> Add vDSO compat support to the arm64 building system.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Signed-off-by: Vincenzo Frascino 
> ---
>  arch/arm64/Kconfig |  1 +
>  arch/arm64/Makefile| 23 +--
>  arch/arm64/kernel/Makefile |  6 +-
>  3 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 952c9f8cf3b8..3e1d4f8347f4 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -108,6 +108,7 @@ config ARM64
>   select GENERIC_STRNLEN_USER
>   select GENERIC_TIME_VSYSCALL
>   select GENERIC_GETTIMEOFDAY
> + select GENERIC_COMPAT_VDSO if !CPU_BIG_ENDIAN

This select needs to also depend on COMPAT (or rather be selected from
the COMPAT menuconfig), otherwise, trying to build this series with 64K
pages where COMPAT is disabled, I get:

  VDSOC_GTD   arch/arm64/kernel/vdso32/vgettimeofday.o
  VDSOA   arch/arm64/kernel/vdso32/sigreturn.o
arch/arm64/kernel/vdso32/sigreturn.S: Assembler messages:
arch/arm64/kernel/vdso32/sigreturn.S:25: Error: expected #constant
arch/arm64/kernel/vdso32/sigreturn.S:35: Error: expected #constant
arch/arm64/kernel/vdso32/sigreturn.S:46: Error: expected #constant
arch/arm64/kernel/vdso32/sigreturn.S:56: Error: expected #constant
arch/arm64/kernel/vdso32/sigreturn.S:28: Error: undefined symbol 
__NR_compat_sigreturn used as an immediate value
arch/arm64/kernel/vdso32/sigreturn.S:38: Error: undefined symbol 
__NR_compat_rt_sigreturn used as an immediate value
make[2]: *** [arch/arm64/kernel/vdso32/Makefile:154: 
arch/arm64/kernel/vdso32/sigreturn.o] Error 1
make[2]: *** Waiting for unfinished jobs
In file included from lib/vdso/gettimeofday.c:25:0,
 from :0:
arch/arm64/include/asm/vdso/compat_gettimeofday.h: In function 
'gettimeofday_fallback':
arch/arm64/include/asm/vdso/compat_gettimeofday.h:22:31: error: 
'__NR_compat_gettimeofday' undeclared (first use in this function); did you 
mean '__NR_gettimeofday'?
  register long nr asm("r7") = __NR_compat_gettimeofday;
   ^~~~
   __NR_gettimeofday
arch/arm64/include/asm/vdso/compat_gettimeofday.h:22:31: note: each undeclared 
identifier is reported only once for each function it appears in
arch/arm64/include/asm/vdso/compat_gettimeofday.h: In function 
'clock_gettime_fallback':
arch/arm64/include/asm/vdso/compat_gettimeofday.h:40:31: error: 
'__NR_compat_clock_gettime64' undeclared (first use in this function); did you 
mean '__NR_clock_gettime'?
  register long nr asm("r7") = __NR_compat_clock_gettime64;
   ^~~
   __NR_clock_gettime
arch/arm64/include/asm/vdso/compat_gettimeofday.h: In function 
'clock_getres_fallback':
arch/arm64/include/asm/vdso/compat_gettimeofday.h:58:31: error: 
'__NR_compat_clock_getres_time64' undeclared (first use in this function); did 
you mean '__NR_clock_gettime'?
  register long nr asm("r7") = __NR_compat_clock_getres_time64;
   ^~~
   __NR_clock_gettime
arch/arm64/kernel/vdso32/vgettimeofday.c: In function '__vdso_clock_gettime':
arch/arm64/kernel/vdso32/vgettimeofday.c:15:17: error: 'TASK_SIZE_32' 
undeclared (first use in this function); did you mean 'TASK_SIZE_64'?
  if ((u32)ts >= TASK_SIZE_32)
 ^~~~
 TASK_SIZE_64
arch/arm64/kernel/vdso32/vgettimeofday.c: In function '__vdso_clock_gettime64':
arch/arm64/kernel/vdso32/vgettimeofday.c:25:17: error: 'TASK_SIZE_32' 
undeclared (first use in this function); did you mean 'TASK_SIZE_64'?
  if ((u32)ts >= TASK_SIZE_32)
 ^~~~
 TASK_SIZE_64
arch/arm64/kernel/vdso32/vgettimeofday.c: In function '__vdso_clock_getres':
arch/arm64/kernel/vdso32/vgettimeofday.c:41:18: error: 'TASK_SIZE_32' 
undeclared (first use in this function); did you mean 'TASK_SIZE_64'?
  if ((u32)res >= TASK_SIZE_32)
  ^~~~
  TASK_SIZE_64
make[2]: *** [arch/arm64/kernel/vdso32/Makefile:152: 
arch/arm64/kernel/vdso32/vgettimeofday.o] Error 1
make[1]: *** [arch/arm64/Makefile:182: vdso_prepare] Error 2
make: *** [Makefile:179: sub-make] Error 2
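
The simplest fix is probably to make the select conditional on COMPAT as
well (or to move it under the COMPAT entry), i.e. something like this
(untested, just to show the dependency I'm after):

	select GENERIC_COMPAT_VDSO if COMPAT && !CPU_BIG_ENDIAN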

-- 
Catalin


Re: [PATCH v15 05/17] arms64: untag user pointers passed to memory syscalls

2019-05-29 Thread Catalin Marinas
On Wed, May 29, 2019 at 01:42:25PM +0100, Dave P Martin wrote:
> On Tue, May 28, 2019 at 05:34:00PM +0100, Catalin Marinas wrote:
> > On Tue, May 28, 2019 at 04:56:45PM +0100, Dave P Martin wrote:
> > > On Tue, May 28, 2019 at 04:40:58PM +0100, Catalin Marinas wrote:
> > > 
> > > [...]
> > > 
> > > > My thoughts on allowing tags (quick look):
> > > >
> > > > brk - no
> > > 
> > > [...]
> > > 
> > > > mlock, mlock2, munlock - yes
> > > > mmap - no (we may change this with MTE but not for TBI)
> > > 
> > > [...]
> > > 
> > > > mprotect - yes
> > > 
> > > I haven't following this discussion closely... what's the rationale for
> > > the inconsistencies here (feel free to refer me back to the discussion
> > > if it's elsewhere).
> > 
> > _My_ rationale (feel free to disagree) is that mmap() by default would
> > not return a tagged address (ignoring MTE for now). If it gets passed a
> > tagged address or a "tagged NULL" (for lack of a better name) we don't
> > have clear semantics of whether the returned address should be tagged in
> > this ABI relaxation. I'd rather reserve this specific behaviour if we
> > overload the non-zero tag meaning of mmap() for MTE. Similar reasoning
> > for mremap(), at least on the new_address argument (not entirely sure
> > about old_address).
> > 
> > munmap() should probably follow the mmap() rules.
> > 
> > As for brk(), I don't see why the user would need to pass a tagged
> > address, we can't associate any meaning to this tag.
> > 
> > For the rest, since it's likely such addresses would have been tagged by
> > malloc() in user space, we should allow tagged pointers.
> 
> Those arguments seem reasonable.  We should try to capture this
> somewhere when documenting the ABI.
> 
> To be clear, I'm not sure that we should guarantee anywhere that a
> tagged pointer is rejected: rather the behaviour should probably be
> left unspecified.  Then we can tidy it up incrementally.
> 
> (The behaviour is unspecified today, in any case.)

What is specified (or rather de-facto ABI) today is that passing a user
address above TASK_SIZE (e.g. non-zero top byte) would fail in most
cases. If we relax this with the TBI we may end up with some de-facto
ABI before we actually get MTE hardware. Tightening it afterwards may be
slightly more problematic, although MTE needs to be an explicit opt-in.

IOW, I wouldn't want to unnecessarily relax the ABI if we don't need to.

-- 
Catalin


Re: [PATCH 4/6] mm: add a gup_fixup_start_addr hook

2019-05-29 Thread Catalin Marinas
Hi Christoph,

On Sat, 25 May 2019 at 14:33, Christoph Hellwig  wrote:
> diff --git a/mm/gup.c b/mm/gup.c
> index f173fcbaf1b2..1c21ecfbf38b 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2117,6 +2117,10 @@ static void gup_pgd_range(unsigned long addr, unsigned 
> long end,
> } while (pgdp++, addr = next, addr != end);
>  }
>
> +#ifndef gup_fixup_start_addr
> +#define gup_fixup_start_addr(start)(start)
> +#endif

As you pointed out in a subsequent reply, we could use the
untagged_addr() macro from Andrey (or a shorter "untag_addr" if you
want it to look like a verb).

>  #ifndef gup_fast_permitted
>  /*
>   * Check if it's allowed to use __get_user_pages_fast() for the range, or
> @@ -2145,7 +2149,7 @@ int __get_user_pages_fast(unsigned long start, int 
> nr_pages, int write,
> unsigned long flags;
> int nr = 0;
>
> -   start &= PAGE_MASK;
> +   start = gup_fixup_start_addr(start) & PAGE_MASK;
> len = (unsigned long) nr_pages << PAGE_SHIFT;
> end = start + len;
>
> @@ -2218,7 +,7 @@ int get_user_pages_fast(unsigned long start, int 
> nr_pages,
> unsigned long addr, len, end;
> int nr = 0, ret = 0;
>
> -   start &= PAGE_MASK;
> +   start = gup_fixup_start_addr(start) & PAGE_MASK;
> addr = start;
> len = (unsigned long) nr_pages << PAGE_SHIFT;
> end = start + len;

In Andrey's patch [1] we don't fix __get_user_pages_fast(), only
__get_user_pages() as it needs to do a find_vma() search. I wonder
whether this is actually necessary for the *_fast() versions. If the
top byte is non-zero (i.e. tagged address), 'end' would also have the
same tag. The page table macros like pgd_index() and pgd_addr_end()
already take care of masking out the top bits (at least for arm64)
since they need to work on kernel address with the top bits all 1. So
gup_pgd_range() should cope with tagged addresses already.
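
For reference, what I mean by the masking (the arm64/generic definition,
quoted from memory):

	#define pgd_index(addr)		(((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))

so a tag in bits 63:56 never reaches the table walk, and pgd_addr_end()
only compares against 'end', which carries the same tag.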

[1] 
https://lore.kernel.org/lkml/d234cd71774f35229bdfc0a793c34d6712b73093.1557160186.git.andreyk...@google.com/

-- 
Catalin


Re: [PATCH v15 00/17] arm64: untag user pointers passed to the kernel

2019-05-23 Thread Catalin Marinas
On Thu, May 23, 2019 at 11:42:57AM +0100, Dave P Martin wrote:
> On Wed, May 22, 2019 at 09:20:52PM -0300, Jason Gunthorpe wrote:
> > On Wed, May 22, 2019 at 02:49:28PM +0100, Dave Martin wrote:
> > > If multiple people will care about this, perhaps we should try to
> > > annotate types more explicitly in SYSCALL_DEFINEx() and ABI data
> > > structures.
> > > 
> > > For example, we could have a couple of mutually exclusive modifiers
> > > 
> > > T __object *
> > > T __vaddr * (or U __vaddr)
> > > 
> > > In the first case the pointer points to an object (in the C sense)
> > > that the call may dereference but not use for any other purpose.
> > 
> > How would you use these two differently?
> > 
> > So far the kernel has worked that __user should tag any pointer that
> > is from userspace and then you can't do anything with it until you
> > transform it into a kernel something
> 
> Ultimately it would be good to disallow casting __object pointers execpt
> to compatible __object pointer types, and to make get_user etc. demand
> __object.
> 
> __vaddr pointers / addresses would be freely castable, but not to
> __object and so would not be dereferenceable even indirectly.

I think it gets too complicated and there are ambiguous cases that we
may not be able to distinguish. For example copy_from_user() may be used
to copy a user data structure into the kernel, hence __object would
work, while the same function may be used to copy opaque data to a file,
so __vaddr may be a better option (unless I misunderstood your
proposal).

We currently have T __user * and I think it's a good starting point. The
prior attempt [1] was shut down because it was just hiding the cast
using __force. We'd need to work through those cases again and rather
start changing the function prototypes to avoid unnecessary casting in
the callers (e.g. get_user_pages(void __user *) or come up with a new
type) while changing the explicit casting to a macro where it needs to
be obvious that we are converting a user pointer, potentially typed
(tagged), to an untyped address range. We may need a user_ptr_to_ulong()
macro or similar (it seems that we have a u64_to_user_ptr, wasn't aware
of it).

It may actually not be far from what you suggested but I'd keep the
current T __user * to denote possible dereference.
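
As a strawman, mirroring u64_to_user_ptr() in the opposite direction (the
name is made up and it would probably want a typecheck() as well):

	#define user_ptr_to_ulong(ptr)	((unsigned long)(uintptr_t)(ptr))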

[1] 
https://lore.kernel.org/lkml/5d54526e5ff2e5ad63d0dfdd9ab17cf359afa4f2.1535629099.git.andreyk...@google.com/

-- 
Catalin


Re: [PATCH] kmemleak: fix check for softirq context

2019-05-18 Thread Catalin Marinas
On Sat, May 18, 2019 at 09:10:59AM +0200, Dmitry Vyukov wrote:
> On Fri, May 17, 2019 at 11:37 PM Andrew Morton
>  wrote:
> > On Fri, 17 May 2019 19:15:07 +0200 Dmitry Vyukov  wrote:
> >
> > > From: Dmitry Vyukov 
> > >
> > > in_softirq() is a wrong predicate to check if we are in a softirq context.
> > > It also returns true if we have BH disabled, so objects are falsely
> > > stamped with "softirq" comm. The correct predicate is 
> > > in_serving_softirq().
> > >
> > > ...
> > >
> > > --- a/mm/kmemleak.c
> > > +++ b/mm/kmemleak.c
> > > @@ -588,7 +588,7 @@ static struct kmemleak_object *create_object(unsigned 
> > > long ptr, size_t size,
> > >   if (in_irq()) {
> > >   object->pid = 0;
> > >   strncpy(object->comm, "hardirq", sizeof(object->comm));
> > > - } else if (in_softirq()) {
> > > + } else if (in_serving_softirq()) {
> > >   object->pid = 0;
> > >   strncpy(object->comm, "softirq", sizeof(object->comm));
> > >   } else {
> >
> > What are the user-visible runtime effects of this change?
> 
> If user does cat from /sys/kernel/debug/kmemleak previously they would
> see this, which is clearly wrong, this is system call context (see the
> comm):

Indeed, with your patch you get the correct output.

Acked-by: Catalin Marinas 

Thanks.

-- 
Catalin


Re: [PATCH] arch64: export __flush_dcache_area

2019-05-17 Thread Catalin Marinas
On Fri, May 17, 2019 at 12:59:56PM -0700, Mark Salyzyn wrote:
> Some (out of tree modular) drivers feel a need to ensure
> data is flushed to the DDR before continuing flow.
> 
> Signed-off-by: Mark Salyzyn 
> Cc: linux-kernel@vger.kernel.org
> Cc: kernel-t...@android.com
> ---
>  arch/arm64/mm/cache.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index a194fd0e837f..70d7cb5c0bd2 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -120,6 +120,7 @@ ENTRY(__flush_dcache_area)
>   dcache_by_line_op civac, sy, x0, x1, x2, x3
>   ret
>  ENDPIPROC(__flush_dcache_area)
> +EXPORT_SYMBOL_GPL(__flush_dcache_area)
>  
>  /*
>   *   __clean_dcache_area_pou(kaddr, size)

NAK. Such drivers are doing something wrong; there is a dedicated
in-kernel API (hint: DMA) that handles the cache maintenance for them.

-- 
Catalin


Re: icache_is_aliasing and big.LITTLE

2019-05-09 Thread Catalin Marinas
Hi,

On Wed, May 08, 2019 at 11:45:03AM -0700, Salman Qazi wrote:
> What is the intention behind icache_is_aliasing on big.LITTLE systems
> where some icaches are VIPT and others are PIPT? Is it meant to be
> conservative in some sense or should it be made per-CPU?

It needs to cover the worst case scenario across all CPUs, i.e. aliasing
VIPT if one of the CPUs has this. We can't make it per-CPU because a
thread performing cache maintenance might be migrated to another CPU
with different cache policy (e.g. sync_icache_aliases()).
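
For reference, the check is a single system-wide flag populated during
CPU feature detection (quoting the arm64 header from memory):

	static inline int icache_is_aliasing(void)
	{
		return test_bit(ICACHEF_ALIASING, &__icache_flags);
	}

ICACHEF_ALIASING is set if any CPU that comes up reports an aliasing
I-cache, so the result is the worst case for the whole system.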

-- 
Catalin


Re: [PATCH v14 13/17] IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr

2019-05-03 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 03:25:09PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can
> only by done with untagged pointers.
> 
> Untag user pointers in this function.
> 
> Signed-off-by: Andrey Konovalov 
> Reviewed-by: Leon Romanovsky 
> ---
>  drivers/infiniband/hw/mlx4/mr.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
> index 395379a480cb..9a35ed2c6a6f 100644
> --- a/drivers/infiniband/hw/mlx4/mr.c
> +++ b/drivers/infiniband/hw/mlx4/mr.c
> @@ -378,6 +378,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata 
> *udata, u64 start,
>* again
>*/
>   if (!ib_access_writable(access_flags)) {
> + unsigned long untagged_start = untagged_addr(start);
>   struct vm_area_struct *vma;
>  
>   down_read(&current->mm->mmap_sem);
> @@ -386,9 +387,9 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata 
> *udata, u64 start,
>* cover the memory, but for now it requires a single vma to
>* entirely cover the MR to support RO mappings.
>*/
> - vma = find_vma(current->mm, start);
> - if (vma && vma->vm_end >= start + length &&
> - vma->vm_start <= start) {
> + vma = find_vma(current->mm, untagged_start);
> + if (vma && vma->vm_end >= untagged_start + length &&
> + vma->vm_start <= untagged_start) {
>   if (vma->vm_flags & VM_WRITE)
>   access_flags |= IB_ACCESS_LOCAL_WRITE;
>   } else {

Discussion ongoing on the previous version of the patch but I'm more
inclined to do this in ib_uverbs_(re)reg_mr() on cmd.start.
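
I.e. something along these lines in ib_uverbs_reg_mr(), after the command
has been copied from user space (untested sketch; the re-reg path would
need the same treatment):

	cmd.start = untagged_addr(cmd.start);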

-- 
Catalin


Re: [PATCH v14 10/17] fs, arm64: untag user pointers in fs/userfaultfd.c

2019-05-03 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 03:25:06PM +0200, Andrey Konovalov wrote:
> This patch is a part of a series that extends arm64 kernel ABI to allow to
> pass tagged user pointers (with the top byte set to something else other
> than 0x00) as syscall arguments.
> 
> userfaultfd_register() and userfaultfd_unregister() use provided user
> pointers for vma lookups, which can only by done with untagged pointers.
> 
> Untag user pointers in these functions.
> 
> Signed-off-by: Andrey Konovalov 
> ---
>  fs/userfaultfd.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index f5de1e726356..fdee0db0e847 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1325,6 +1325,9 @@ static int userfaultfd_register(struct userfaultfd_ctx 
> *ctx,
>   goto out;
>   }
>  
> + uffdio_register.range.start =
> + untagged_addr(uffdio_register.range.start);
> +
>   ret = validate_range(mm, uffdio_register.range.start,
>uffdio_register.range.len);
>   if (ret)
> @@ -1514,6 +1517,8 @@ static int userfaultfd_unregister(struct 
> userfaultfd_ctx *ctx,
>   if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
>   goto out;
>  
> + uffdio_unregister.start = untagged_addr(uffdio_unregister.start);
> +
>   ret = validate_range(mm, uffdio_unregister.start,
>uffdio_unregister.len);
>   if (ret)

Wouldn't it be easier to do this in validate_range()? There are a few
more calls in this file, though I didn't check whether a tagged address
would cause issues.
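
What I had in mind, roughly (untested; the untagged value would have to be
visible to the callers as well, hence passing start by reference):

	static __always_inline int validate_range(struct mm_struct *mm,
						  __u64 *start, __u64 len)
	{
		__u64 task_size = mm->task_size;

		*start = untagged_addr(*start);

		if (*start & ~PAGE_MASK)
			return -EINVAL;
		if (len & ~PAGE_MASK)
			return -EINVAL;
		if (!len)
			return -EINVAL;
		if (*start < mmap_min_addr)
			return -EINVAL;
		if (*start >= task_size)
			return -EINVAL;
		if (len > task_size - *start)
			return -EINVAL;
		return 0;
	}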

-- 
Catalin


Re: implement generic dma_map_ops for IOMMUs v4

2019-05-03 Thread Catalin Marinas
On Thu, May 02, 2019 at 03:22:08PM +0200, Christoph Hellwig wrote:
> can you quickly look over the arm64 parts?  I'd really like to still
> get this series in for this merge window as it would conflict with
> a lot of dma-mapping work for next merge window, and we also have
> the amd and possibly intel iommu conversions to use it waiting.

Done. They look fine to me.

-- 
Catalin


Re: [PATCH v13 16/20] IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr

2019-04-30 Thread Catalin Marinas
(trimmed down the cc list slightly as the message bounces)

On Mon, Apr 29, 2019 at 09:09:15PM +0300, Leon Romanovsky wrote:
> On Wed, Mar 20, 2019 at 03:51:30PM +0100, Andrey Konovalov wrote:
> > This patch is a part of a series that extends arm64 kernel ABI to allow to
> > pass tagged user pointers (with the top byte set to something else other
> > than 0x00) as syscall arguments.
> >
> > mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can
> > only by done with untagged pointers.
> >
> > Untag user pointers in this function.
> >
> > Signed-off-by: Andrey Konovalov 
> > ---
> >  drivers/infiniband/hw/mlx4/mr.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx4/mr.c 
> > b/drivers/infiniband/hw/mlx4/mr.c
> > index 395379a480cb..9a35ed2c6a6f 100644
> > --- a/drivers/infiniband/hw/mlx4/mr.c
> > +++ b/drivers/infiniband/hw/mlx4/mr.c
> > @@ -378,6 +378,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata 
> > *udata, u64 start,
> >  * again
> >  */
> > if (!ib_access_writable(access_flags)) {
> > +   unsigned long untagged_start = untagged_addr(start);
> > struct vm_area_struct *vma;
> >
> > down_read(>mm->mmap_sem);
> > @@ -386,9 +387,9 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata 
> > *udata, u64 start,
> >  * cover the memory, but for now it requires a single vma to
> >  * entirely cover the MR to support RO mappings.
> >  */
> > -   vma = find_vma(current->mm, start);
> > -   if (vma && vma->vm_end >= start + length &&
> > -   vma->vm_start <= start) {
> > +   vma = find_vma(current->mm, untagged_start);
> > +   if (vma && vma->vm_end >= untagged_start + length &&
> > +   vma->vm_start <= untagged_start) {
> > if (vma->vm_flags & VM_WRITE)
> > access_flags |= IB_ACCESS_LOCAL_WRITE;
> > } else {
> > --
> 
> Thanks,
> Reviewed-by: Leon Romanovsky 

Thanks for the review.

> Interesting, the follow-up question is why mlx4 is the only driver in IB which
> needs such code in umem_mr. I'll take a look at it.

I don't know. Just using the light heuristics of find_vma() shows some
other places. For example, ib_umem_odp_get() gets the umem->address via
ib_umem_start(). This was previously set in ib_umem_get() as called from
mlx4_get_umem_mr(). Should the above patch have just untagged "start" on
entry?

BTW, what's the provenance of such "start" address here? Is it
something that the user would have malloc()'ed? We try to impose some
restrictions on what is allowed to be tagged in user space so that we don't
have to untag the addresses in the kernel. For example, if it was the
result of an mmap() on the device file, we don't allow tagging.

Thanks.

-- 
Catalin


[GIT PULL] arm64 fixes for 5.1-rc7

2019-04-26 Thread Catalin Marinas
Hi Linus,

Please pull the arm64 fixes below. Thanks.

The following changes since commit 085b7755808aa11f78ab9377257e1dad2e6fa4bb:

  Linux 5.1-rc6 (2019-04-21 10:45:57 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux tags/arm64-fixes

for you to fetch changes up to 4e69ecf4da1ee0b2ac735e1f1bb13935acd5a38d:

  arm64/module: ftrace: deal with place relative nature of PLTs (2019-04-23 
13:35:00 +0100)


arm64 fixes:

- keep the tail of an unaligned initrd reserved

- adjust ftrace_make_call() to deal with the relative nature of PLTs


Ard Biesheuvel (1):
  arm64/module: ftrace: deal with place relative nature of PLTs

Bjorn Andersson (1):
  arm64: mm: Ensure tail of unaligned initrd is reserved

 arch/arm64/kernel/ftrace.c | 9 +++--
 arch/arm64/mm/init.c   | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

-- 
Catalin


Re: [PATCH v13 04/20] mm, arm64: untag user pointers passed to memory syscalls

2019-04-26 Thread Catalin Marinas
On Tue, Apr 02, 2019 at 02:47:34PM +0200, Andrey Konovalov wrote:
> On Fri, Mar 29, 2019 at 11:30 AM Catalin Marinas
>  wrote:
> > On Thu, Mar 28, 2019 at 02:19:34PM -0400, Steven Rostedt wrote:
> > > On Thu, 28 Mar 2019 19:10:07 +0100
> > > Andrey Konovalov  wrote:
> > >
> > > > > > Signed-off-by: Andrey Konovalov 
> > > > > > ---
> > > > > >  ipc/shm.c  | 2 ++
> > > > > >  mm/madvise.c   | 2 ++
> > > > > >  mm/mempolicy.c | 5 +
> > > > > >  mm/migrate.c   | 1 +
> > > > > >  mm/mincore.c   | 2 ++
> > > > > >  mm/mlock.c | 5 +
> > > > > >  mm/mmap.c  | 7 +++
> > > > > >  mm/mprotect.c  | 1 +
> > > > > >  mm/mremap.c| 2 ++
> > > > > >  mm/msync.c | 2 ++
> > > > > >  10 files changed, 29 insertions(+)
> > > > >
> > > > > I wonder whether it's better to keep these as wrappers in the arm64
> > > > > code.
> > > >
> > > > I don't think I understand what you propose, could you elaborate?
> > >
> > > I believe Catalin is saying that instead of placing things like:
> > >
> > > @@ -1593,6 +1593,7 @@ SYSCALL_DEFINE3(shmat, int, shmid, char __user *, 
> > > shmaddr, int, shmflg)
> > >   unsigned long ret;
> > >   long err;
> > >
> > > + shmaddr = untagged_addr(shmaddr);
> > >
> > > To instead have the shmaddr set to the untagged_addr() before calling
> > > the system call, and passing the untagged addr to the system call, as
> > > that goes through the arm64 architecture specific code first.
> >
> > Indeed. For example, we already have a SYSCALL_DEFINE6(mmap, ...) in
> > arch/arm64/kernel/sys.c, just add the untagging there. We could do
> > something similar for the other syscalls. I don't mind doing this in the
> > generic code but if it's only needed for arm64, I'd rather keep the
> > generic changes to a minimum.
> 
> Do I understand correctly, that I'll need to add ksys_ wrappers for
> each of the memory syscalls, and then redefine them in
> arch/arm64/kernel/sys.c with arm64_ prefix, like it is done for the
> personality syscall right now? This will require generic changes as
> well.

Yes. My aim is to keep the number of untagged_addr() calls in the
generic code to a minimum (rather than just keeping the generic code
changes small).
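
Something like the existing personality wrapper, e.g. (untested sketch;
it assumes a ksys_mlock() helper is exposed on the generic side first,
which doesn't exist today):

	SYSCALL_DEFINE2(arm64_mlock, unsigned long, start, size_t, len)
	{
		return ksys_mlock(untagged_addr(start), len);
	}

	#define __arm64_sys_mlock	__arm64_sys_arm64_mlock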

-- 
Catalin


Re: [PATCH] x86/mm: fix a crash with kmemleak_scan()

2019-04-24 Thread Catalin Marinas
On Tue, Apr 23, 2019 at 12:58:11PM -0400, Qian Cai wrote:
> The first kmemleak_scan() after boot would trigger a crash below because
> 
> kernel_init
>   free_initmem
> mem_encrypt_free_decrypted_mem
>   free_init_pages
> 
> unmapped some memory inside the .bss with DEBUG_PAGEALLOC=y. Since
> kmemleak_init() will register the .data/.bss sections (only register
> .data..ro_after_init if not within .data) and then kmemleak_scan() will
> scan those addresses and dereference them looking for pointers.
> If free_init_pages() frees and unmaps pages in those sections,
> kmemleak_scan() will trigger a crash if referencing one of those
> addresses.
> 
> BUG: unable to handle kernel paging request at bd402000
> CPU: 12 PID: 325 Comm: kmemleak Not tainted 5.1.0-rc4+ #4
> RIP: 0010:scan_block+0x58/0x160
> Call Trace:
>  scan_gray_list+0x1d9/0x280
>  kmemleak_scan+0x485/0xad0
>  kmemleak_scan_thread+0x9f/0xc4
>  kthread+0x1d2/0x1f0
>  ret_from_fork+0x35/0x40
> 
> Since kmemleak_free_part() is tolerant to unknown objects (not tracked by
> kmemleak), it is fine to call it from free_init_pages() even if not all
> address ranges passed to this function are known to kmemleak.
> 
> Signed-off-by: Qian Cai 

Reviewed-by: Catalin Marinas 


Re: [PATCH] x86/mm/mem_encrypt: fix a crash with kmemleak_scan

2019-04-23 Thread Catalin Marinas
Hi Boris,

On Tue, Apr 23, 2019 at 03:25:18PM +0200, Borislav Petkov wrote:
> On Thu, Apr 18, 2019 at 10:50:15AM +0100, Catalin Marinas wrote:
> > Kmemleak is basically a tri-colour marking tracing garbage collector [1]
> 
> Thanks for that - interesting read.
> 
> > but without automatic freeing. It relies on a graph of references
> > (pointers) between various objects and the root of such graph is the
> > .bss/.data.

Sorry for the misleading information here, the root of the graph was
changed recently (see below).

> > free_init_pages() is called on memory regions outside .bss/.data like
> > smp_locks, initrd, kernel init text which kmemleak does not track
> > anyway. That said, kmemleak_free_part() is tolerant to unknown pointers,
> > so we could call it directly from free_init_pages().
> 
> Ok, lemme think out loud for a bit here: kmemleak_scan() goes over
> an object list and I guess in our particular case, the memory which
> got freed in mem_encrypt_free_decrypted_mem() *was* in that list too,
> leading to the crash.

Yes.

> Looking at the splat, it is in scan_gray_list() which would mean that
> the object we freed was reachable from the root(s) in .bss.

The .bss/.data used to be root until recently when commit 298a32b13208
("kmemleak: powerpc: skip scanning holes in the .bss section") changed
this to accommodate a similar problem on powerpc. With this commit,
.bss/.data are traced objects but painted "grey" by default so that they
will be always scanned, pretty much like the root (and they can't
"leak").

In Qian's splat, the unmapped area was actually in the .bss which is now
a traced object (no longer a root one). In his previous report on
powerpc [1], the splat was in scan_large_block().

> Now, the docs say:
> 
> "The memory allocations via :c:func:`kmalloc`, :c:func:`vmalloc`,
> :c:func:`kmem_cache_alloc` and
> friends are traced and the pointers, together with additional
> information like size and stack trace, are stored in a rbtree."
> 
> So I guess free_init_pages() should be somehow telling kmemleak, "hey,
> just freed that object, pls adjust your tracking lists" no?
> 
> Because, otherwise, if we start sprinkling those kmemleak_free_part()
> calls everywhere, that'll quickly turn into a game of whack-a-mole. And
> we don't need that especially if kmemleak can easily be taught to handle
> such cases.

Object freeing is tracked in general via the corresponding kfree(),
vfree() etc. and doesn't need special handling. The .bss doesn't have this
alloc/free symmetry and isn't freed in its entirety either, hence the
workaround to register it as a traced object and allow partial freeing.

Anyway, I agree with you. As I mentioned in the previous email,
kmemleak_free_part() is tolerant to unknown objects (not tracked by
kmemleak), so I'm fine with calling it from free_init_pages() even if
not all address ranges passed to this function are known to kmemleak.
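
I.e. something like the below in the x86 free_init_pages() (untested,
just to show the placement I mean):

	/* harmless if the range was never registered with kmemleak */
	kmemleak_free_part((void *)begin, end - begin);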

> > There is Documentation/dev-tools/kmemleak.rst briefly mentioning the
> > tracing garbage collector (although the Wikipedia link is no longer
> > valid, it should be replaced with [1]). It doesn't, however, state why
> > .bss/.data are special.
> 
> The fact that they're special is important info, I'd say.

I took a note to improve this when I get some time.

> > [1] 
> > https://en.wikipedia.org/wiki/Tracing_garbage_collection#Tri-color_marking
> 
> is nice. While reading, it made me think how our discussion would go if
> we didn't have wikipedia. You'd probably say, go to the library and read
> this and that section in this and that book on tri-color marking. :-)

There are probably some academic papers published somewhere ;). But
wikipedia makes things much easier (and free).

-- 
Catalin

[1] http://lkml.kernel.org/r/20190312191412.28656-1-...@lca.pw


[GIT PULL] arm64 fixes for 5.1-rc6

2019-04-18 Thread Catalin Marinas
Hi Linus,

Please pull the arm64 fix below. Thanks.

The following changes since commit dc4060a5dc2557e6b5aa813bf5b73677299d62d2:

  Linux 5.1-rc5 (2019-04-14 15:17:41 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux tags/arm64-fixes

for you to fetch changes up to ff8acf929014b7f87315588e0daf8597c8aa9d1c:

  arm64: futex: Restore oldval initialization to work around buggy compilers 
(2019-04-18 18:17:08 +0100)


Avoid compiler uninitialised warning introduced by recent arm64 futex fix


Nathan Chancellor (1):
  arm64: futex: Restore oldval initialization to work around buggy compilers

 arch/arm64/include/asm/futex.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
Catalin


Re: [PATCH] x86/mm/mem_encrypt: fix a crash with kmemleak_scan

2019-04-18 Thread Catalin Marinas
On Thu, Apr 18, 2019 at 09:45:10AM +0200, Borislav Petkov wrote:
> On Tue, Apr 16, 2019 at 08:38:30PM -0400, Qian Cai wrote:
> > kmemleak_init() will register the data/bss sections (only register
> > .data..ro_after_init if not within .data) and then kmemleak_scan() will scan
> > those address and dereference them looking for pointer referencing. If
> > free_init_pages() free and unmap pages in those sections, kmemleak_scan() 
> > will
> > trigger a crash if referencing one of those addresses.
> >
> > I checked other x86 free_init_pages() call sites and don't see anything 
> > obvious
> > where another place to free an address in those sections.
> 
> And why is .bss/.data special and why does it need that special handling
> by kmemleak?

Kmemleak is basically a tri-colour marking tracing garbage collector [1]
but without automatic freeing. It relies on a graph of references
(pointers) between various objects and the root of such graph is the
.bss/.data.

free_init_pages() is called on memory regions outside .bss/.data like
smp_locks, initrd, kernel init text which kmemleak does not track
anyway. That said, kmemleak_free_part() is tolerant to unknown pointers,
so we could call it directly from free_init_pages().

> There must be some rule or a heuristic why kmemleak does that. Is that
> documented somewhere?

There is Documentation/dev-tools/kmemleak.rst briefly mentioning the
tracing garbage collector (although the Wikipedia link is no longer
valid, it should be replaced with [1]). It doesn't, however, state why
.bss/.data are special.

-- 
Catalin

[1] https://en.wikipedia.org/wiki/Tracing_garbage_collection#Tri-color_marking


Re: [PATCH 1/1] arm: mm: Export __sync_icache_dcache() for xen-privcmd

2019-04-17 Thread Catalin Marinas
On Wed, Apr 17, 2019 at 10:41:37AM +0100, Russell King wrote:
> On Sun, Apr 14, 2019 at 10:51:09PM -0700, Christoph Hellwig wrote:
> > On Sat, Apr 13, 2019 at 06:30:52PM +0200, Heinrich Schuchardt wrote:
> > > This patch avoids
> > > ERROR: "__sync_icache_dcache" [drivers/xen/xen-privcmd.ko] undefined!
> > > observed when compiling v4.19.34.
> > > 
> > > The xen-privcmd driver, which can be modular, calls set_pte_at()
> > > which in turn may call __sync_icache_dcache().
> > 
> > Maybe that really is a sign that it should not be modular..
> 
> This issue has been discussed several times, and this URL gives a list
> of all messages on linux-arm-kernel that mention __sync_icache_dcache:
> 
> https://archive.armlinux.org.uk/lurker/search/20380101.00.@ml:linux-arm-kernel,sb:__sync_icache_dcache.en.html
> 
> At the beginning of March, Boris Ostrovsky pointed that another solution
> is available that does not need architecture private symbols to be
> exported.
> 
> Since I've already said in a previous thread that I don't want this
> function exported, it seems the way forward is pretty obvious.

Arnd submitted a patch, not sure what happened to it:

https://lore.kernel.org/lkml/20190304204826.2414365-1-a...@arndb.de/

I think this boils down to whether set_pte_at() is allowed to be called
from loadable modules. If yes, __sync_icache_dcache() needs exporting,
otherwise Arnd's patch makes the relevant code built-in (and, if merged,
I'm happy to revert the similar export on arm64).

-- 
Catalin


Re: [PATCH] arm64: futex: Restore oldval initialization to work around buggy compilers

2019-04-17 Thread Catalin Marinas
On Wed, Apr 17, 2019 at 12:21:21AM -0700, Nathan Chancellor wrote:
> Commit 045afc24124d ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with
> non-zero result value") removed oldval's zero initialization in
> arch_futex_atomic_op_inuser because it is not necessary. Unfortunately,
> Android's arm64 GCC 4.9.4 [1] does not agree:
> 
> ../kernel/futex.c: In function 'do_futex':
> ../kernel/futex.c:1658:17: warning: 'oldval' may be used uninitialized
> in this function [-Wmaybe-uninitialized]
>return oldval == cmparg;
>  ^
> In file included from ../kernel/futex.c:73:0:
> ../arch/arm64/include/asm/futex.h:53:6: note: 'oldval' was declared here
>   int oldval, ret, tmp;
>   ^
> 
> GCC fails to follow that when ret is non-zero, futex_atomic_op_inuser
> returns right away, avoiding the uninitialized use that it claims.
> Restoring the zero initialization works around this issue.
> 
> [1]: 
> https://android.googlesource.com/platform/prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/
> 
> Cc: sta...@vger.kernel.org
> Fixes: 045afc24124d ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with 
> non-zero result value")
> Signed-off-by: Nathan Chancellor 

Thanks. That compiler version doesn't seem very smart. I'll queue this
for -rc6.

-- 
Catalin


Re: [PATCH] x86/mm/mem_encrypt: fix a crash with kmemleak_scan

2019-04-16 Thread Catalin Marinas
On Tue, Apr 09, 2019 at 12:05:02AM -0400, Qian Cai wrote:
> The first kmemleak_scan() after boot would trigger a crash below because
> 
> kernel_init
>   free_initmem
> mem_encrypt_free_decrypted_mem
>   free_init_pages
> 
> unmapped some memory inside the .bss.
> 
> BUG: unable to handle kernel paging request at bd402000
> CPU: 12 PID: 325 Comm: kmemleak Not tainted 5.1.0-rc4+ #4
> RIP: 0010:scan_block+0x58/0x160
> Call Trace:
>  scan_gray_list+0x1d9/0x280
>  kmemleak_scan+0x485/0xad0
>  kmemleak_scan_thread+0x9f/0xc4
>  kthread+0x1d2/0x1f0
>  ret_from_fork+0x35/0x40
> 
> Signed-off-by: Qian Cai 

It seems that commit 298a32b13208 ("kmemleak: powerpc: skip scanning
holes in the .bss section") has other uses as well.

Acked-by: Catalin Marinas 


Re: [PATCH] kmemleak: fix unused-function warning

2019-04-16 Thread Catalin Marinas
On Tue, Apr 16, 2019 at 02:31:24PM +0200, Arnd Bergmann wrote:
> The only references outside of the #ifdef have been removed,
> so now we get a warning in non-SMP configurations:
> 
> mm/kmemleak.c:1404:13: error: unused function 'scan_large_block' 
> [-Werror,-Wunused-function]
> 
> Add a new #ifdef around it.
> 
> Fixes: 298a32b13208 ("kmemleak: powerpc: skip scanning holes in the .bss 
> section")
> Signed-off-by: Arnd Bergmann 

Acked-by: Catalin Marinas 


Re: [PATCH] arm64: mm: check virtual addr in virt_to_page() if CONFIG_DEBUG_VIRTUAL=y

2019-04-16 Thread Catalin Marinas
On Tue, Apr 16, 2019 at 01:36:36AM +0800, Miles Chen wrote:
> This change uses the original virt_to_page() (the one with __pa()) to
> check the given virtual address if CONFIG_DEBUG_VIRTUAL=y.
> 
> Recently, I worked on a bug: a driver passes a symbol address to
> dma_map_single() and the virt_to_page() (called by dma_map_single())
> does not work for non-linear addresses after commit 9f2875912dac
> ("arm64: mm: restrict virt_to_page() to the linear mapping").
> 
> I tried to trap the bug by enabling CONFIG_DEBUG_VIRTUAL but it
> did not work - because the commit removes the __pa() from
> virt_to_page() but CONFIG_DEBUG_VIRTUAL checks the virtual address
> in __pa()/__virt_to_phys().
> 
> A simple solution is to use the original virt_to_page()
> (the one with__pa()) if CONFIG_DEBUG_VIRTUAL=y.
> 
> Signed-off-by: Miles Chen 
> Cc: Ard Biesheuvel 
> ---
>  arch/arm64/include/asm/memory.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 290195168bb3..2cb8248fa2c8 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -302,7 +302,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>   */
>  #define ARCH_PFN_OFFSET  ((unsigned long)PHYS_PFN_OFFSET)
>  
> -#ifndef CONFIG_SPARSEMEM_VMEMMAP
> +#if !defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_DEBUG_VIRTUAL)
>  #define virt_to_page(kaddr)  pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
>  #define _virt_addr_valid(kaddr)  pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
>  #else

IIUC, this shouldn't change the behaviour of virt_addr_valid(). The
patch looks fine to me.

Acked-by: Catalin Marinas 


[GIT PULL] arm64 fixes for -rc4

2019-04-05 Thread Catalin Marinas
Hi Linus,

Please pull the arm64 fix below.

The following changes since commit 79a3aaa7b82e3106be97842dedfd8429248896e6:

  Linux 5.1-rc3 (2019-03-31 14:39:29 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux tags/arm64-fixes

for you to fetch changes up to 1c41860864c8ae0387ef7d44fe99cbb2e06d:

  arm64: fix wrong check of on_sdei_stack in nmi context (2019-04-04 16:02:25 
+0100)


Fix unwind_frame() in the context of pseudo NMI.


Wei Li (1):
  arm64: fix wrong check of on_sdei_stack in nmi context

 arch/arm64/kernel/sdei.c | 6 ++
 1 file changed, 6 insertions(+)

-- 
Catalin


Re: [PATCH v4] kmemleak: survive in a low-memory situation

2019-04-05 Thread Catalin Marinas
On Mon, Apr 01, 2019 at 10:12:01PM +0200, Michal Hocko wrote:
> On Fri 29-03-19 16:16:38, Catalin Marinas wrote:
> > On Fri, Mar 29, 2019 at 01:02:37PM +0100, Michal Hocko wrote:
> > > On Thu 28-03-19 14:59:17, Catalin Marinas wrote:
> > > [...]
> > > > >From 09eba8f0235eb16409931e6aad77a45a12bedc82 Mon Sep 17 00:00:00 2001
> > > > From: Catalin Marinas 
> > > > Date: Thu, 28 Mar 2019 13:26:07 +
> > > > Subject: [PATCH] mm: kmemleak: Use mempool allocations for kmemleak 
> > > > objects
> > > > 
> > > > This patch adds mempool allocations for struct kmemleak_object and
> > > > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc()
> > > > under memory pressure. The patch also masks out all the gfp flags passed
> > > > to kmemleak other than GFP_KERNEL|GFP_ATOMIC.
> > > 
> > > Using mempool allocator is better than inventing its own implementation
> > > but there is one thing to be slightly careful/worried about.
> > > 
> > > This allocator expects that somebody will refill the pool in a finite
> > > time. Most users are OK with that because objects in flight are going
> > > to return in the pool in a relatively short time (think of an IO) but
> > > kmemleak is not guaranteed to comply with that AFAIU. Sure ephemeral
> > > allocations are happening all the time so there should be some churn
> > > in the pool all the time but if we go to an extreme where there is a
> > > serious memory leak then I suspect we might get stuck here without any
> > > way forward. Page/slab allocator would eventually back off even though
> > > small allocations never fail because a user context would get killed
> > > sooner or later but there is no fatal_signal_pending backoff in the
> > > mempool alloc path.
> > 
> > We could improve the mempool code slightly to refill itself (from some
> > workqueue or during a mempool_alloc() which allows blocking) but it's
> > really just a best effort for a debug tool under OOM conditions. It may
> > be sufficient just to make the mempool size tunable (via
> > /sys/kernel/debug/kmemleak).
> 
> The point I've tried to make is that you really have to fail at some
> point but mempool is fundamentally about non-failing as long as the
> allocation is sleepable. And we cannot really break that assumption
> because existing users really depend on it. But as I've said I would try
> it out and see. This is just a debugging feature and I assume that a
> really fatal oom caused by a real memory leak would be detected sooner
> than the whole thing just blows up.

I'll first push a patch to use mempool as it is, with a tunable size via
/sys/kernel/debug/kmemleak. I think the better solution would be a
rewrite of the metadata handling in kmemleak to embed it into the slab
object (as per Pekka's suggestion). However, I'll be on holiday until
the 15th, so cannot look into this.
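
For reference, the first step would look roughly like this (pool size and
variable names are only indicative, not the final patch):

	static mempool_t *object_mempool;

	/* early init, once object_cache has been created */
	object_mempool = mempool_create_slab_pool(OBJECT_POOL_MIN, object_cache);

	/* create_object() */
	object = mempool_alloc(object_mempool, gfp_kmemleak_mask(gfp));

	/* free_object_rcu() */
	mempool_free(object, object_mempool);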

Thanks.

-- 
Catalin


Re: [PATCH 4.4 078/131] PM / Hibernate: Call flush_icache_range() on pages restored in-place

2019-04-03 Thread Catalin Marinas
On Mon, Apr 01, 2019 at 10:39:18PM +0200, Pavel Machek wrote:
> On Mon 2019-04-01 19:02:28, Greg Kroah-Hartman wrote:
> > 4.4-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > [ Upstream commit f6cf0545ec697ddc278b7457b7d0c0d86a2ea88e ]
> > 
> > Some architectures require code written to memory as if it were data to be
> > 'cleaned' from any data caches before the processor can fetch them as new
> > instructions.
> > 
> > During resume from hibernate, the snapshot code copies some pages directly,
> > meaning these architectures do not get a chance to perform their cache
> > maintenance. Modify the read and decompress code to call
> > flush_icache_range() on all pages that are restored, so that the restored
> > in-place pages are guaranteed to be executable on these architectures.
> > 
> > Signed-off-by: James Morse 
> > Acked-by: Pavel Machek 
> > Acked-by: Rafael J. Wysocki 
> > Acked-by: Catalin Marinas 
> > [will: make clean_pages_on_* static and remove initialisers]
> > Signed-off-by: Will Deacon 
> > Signed-off-by: Sasha Levin 
> 
> I don't think this is suitable for stable.
> 
> Catalin: Are there platforms that a) need this and b) support
> hibernation in 4.4.X?

Good point. This commit was required for the arm64 hibernate support
that went in the 4.7 kernel.

Greg, why was this patch selected for stable? It's harmless anyway.

Thanks.

-- 
Catalin



Re: [PATCH] arm64: mm: fix max_mapnr is assigned the wrong value

2019-04-01 Thread Catalin Marinas
On Mon, Apr 01, 2019 at 10:21:58PM +0800, Muchun Song wrote:
> Thanks for your reply.  In fact, I didn't hit any problem.  I just read the 
> code
> and found the problem. I see max_mapnr is to only be used in the
> generic pfn_valid().
> As you said, we do not use it on arm64.  So in a sense, the patch is
> meaningless.
> 
> But I think since it is a problem, why not fix it (Even if max_mapnr
> is not be used)?
> What is your opinion?

I'm OK with fixing it, only that it's not urgent. Will can pick it up
for the 5.2 merge window.

Reviewed-by: Catalin Marinas 


Re: [PATCH] arm64: mm: fix max_mapnr is assigned the wrong value

2019-04-01 Thread Catalin Marinas
On Sat, Mar 30, 2019 at 09:13:46PM +0800, Muchun Song wrote:
> When we do not use flat memory, mem_map will be NULL and
> pfn_to_page(max_pfn) is a pointer located in kernel space, so
> max_mapnr is assigned a very large number (e.g., 0x_) - fix
> it.
> 
> Signed-off-by: Muchun Song 
> ---
>  arch/arm64/mm/init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index bc02818fa48b..e86c21a44c88 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -535,7 +535,7 @@ void __init mem_init(void)
>   else
>   swiotlb_force = SWIOTLB_NO_FORCE;
>  
> - set_max_mapnr(pfn_to_page(max_pfn) - mem_map);
> + set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);
>  
>  #ifndef CONFIG_SPARSEMEM_VMEMMAP
>   free_unused_memmap();

The patch looks fine but did you actually hit any problem? max_mapnr
seems to only be used in the generic pfn_valid() which we do not use on
arm64 (just wondering if it needs a cc stable; it doesn't look like it,
as it's not a regression).

-- 
Catalin


[GIT PULL] arm64 fixes for 5.1-rc3

2019-03-29 Thread Catalin Marinas
Hi Linus,

Please pull the arm64 fix below. Thanks.

The following changes since commit 8c2ffd9174779014c3fe1f96d9dc3641d9175f00:

  Linux 5.1-rc2 (2019-03-24 14:02:26 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux tags/arm64-fixes

for you to fetch changes up to 9e0a17db517d83d568bb7fa983b54759d4e34f1f:

  arm64: replace memblock_alloc_low with memblock_alloc (2019-03-27 18:12:41 
+)


Use memblock_alloc() instead of memblock_alloc_low() in
request_standard_resources(), the latter being limited to the low 4G
memory range on arm64.


Chen Zhou (1):
  arm64: replace memblock_alloc_low with memblock_alloc

 arch/arm64/kernel/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
Catalin


Re: [PATCH v4] kmemleak: survive in a low-memory situation

2019-03-29 Thread Catalin Marinas
On Fri, Mar 29, 2019 at 01:02:37PM +0100, Michal Hocko wrote:
> On Thu 28-03-19 14:59:17, Catalin Marinas wrote:
> [...]
> > From 09eba8f0235eb16409931e6aad77a45a12bedc82 Mon Sep 17 00:00:00 2001
> > From: Catalin Marinas 
> > Date: Thu, 28 Mar 2019 13:26:07 +
> > Subject: [PATCH] mm: kmemleak: Use mempool allocations for kmemleak objects
> > 
> > This patch adds mempool allocations for struct kmemleak_object and
> > kmemleak_scan_area as slightly more resilient than kmem_cache_alloc()
> > under memory pressure. The patch also masks out all the gfp flags passed
> > to kmemleak other than GFP_KERNEL|GFP_ATOMIC.
> 
> Using mempool allocator is better than inventing its own implementation
> but there is one thing to be slightly careful/worried about.
> 
> This allocator expects that somebody will refill the pool in a finite
> time. Most users are OK with that because objects in flight are going
> to return in the pool in a relatively short time (think of an IO) but
> kmemleak is not guaranteed to comply with that AFAIU. Sure ephemeral
> allocations are happening all the time so there should be some churn
> in the pool all the time but if we go to an extreme where there is a
> serious memory leak then I suspect we might get stuck here without any
> way forward. Page/slab allocator would eventually back off even though
> small allocations never fail because a user context would get killed
> sooner or later but there is no fatal_signal_pending backoff in the
> mempool alloc path.

We could improve the mempool code slightly to refill itself (from some
workqueue or during a mempool_alloc() which allows blocking) but it's
really just a best effort for a debug tool under OOM conditions. It may
be sufficient just to make the mempool size tunable (via
/sys/kernel/debug/kmemleak).

> Anyway, I believe this is a step in the right direction and should the
> above ever materializes as a relevant problem we can tune the mempool
> to backoff for _some_ callers or do something similar.
> 
> Btw. there is kmemleak_update_trace call in mempool_alloc, is this ok
> for the kmemleak allocation path?

It's not a problem, maybe only a small overhead in searching an rbtree
in kmemleak but it cannot find anything since the kmemleak metadata is
not tracked. And this only happens if a normal allocation fails and
takes an existing object from the pool.

I thought about passing the mempool back into kmemleak and checking
whether it's one of the two pools it uses but concluded that it's not
worth it.

-- 
Catalin


Re: [PATCH v13 04/20] mm, arm64: untag user pointers passed to memory syscalls

2019-03-29 Thread Catalin Marinas
(I trimmed down the cc list a bit since it's always bouncing)

On Thu, Mar 28, 2019 at 02:19:34PM -0400, Steven Rostedt wrote:
> On Thu, 28 Mar 2019 19:10:07 +0100
> Andrey Konovalov  wrote:
> 
> > > > Signed-off-by: Andrey Konovalov 
> > > > ---
> > > >  ipc/shm.c  | 2 ++
> > > >  mm/madvise.c   | 2 ++
> > > >  mm/mempolicy.c | 5 +
> > > >  mm/migrate.c   | 1 +
> > > >  mm/mincore.c   | 2 ++
> > > >  mm/mlock.c | 5 +
> > > >  mm/mmap.c  | 7 +++
> > > >  mm/mprotect.c  | 1 +
> > > >  mm/mremap.c| 2 ++
> > > >  mm/msync.c | 2 ++
> > > >  10 files changed, 29 insertions(+)  
> > >
> > > I wonder whether it's better to keep these as wrappers in the arm64
> > > code.  
> > 
> > I don't think I understand what you propose, could you elaborate?
> 
> I believe Catalin is saying that instead of placing things like:
> 
> @@ -1593,6 +1593,7 @@ SYSCALL_DEFINE3(shmat, int, shmid, char __user *, 
> shmaddr, int, shmflg)
>   unsigned long ret;
>   long err;
>  
> + shmaddr = untagged_addr(shmaddr);
> 
> To instead have the shmaddr set to the untagged_addr() before calling
> the system call, and passing the untagged addr to the system call, as
> that goes through the arm64 architecture specific code first.

Indeed. For example, we already have a SYSCALL_DEFINE6(mmap, ...) in
arch/arm64/kernel/sys.c, just add the untagging there. We could do
something similar for the other syscalls. I don't mind doing this in the
generic code but if it's only needed for arm64, I'd rather keep the
generic changes to a minimum.

(I had a hack overriding __SC_CAST to do this automatically for pointer
arguments but this wouldn't work on mmap() and friends as the argument
is unsigned long)
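
Roughly, something like this in arch/arm64/kernel/sys.c (a sketch based
on the existing wrapper there, untested):

SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
		unsigned long, prot, unsigned long, flags,
		unsigned long, fd, off_t, off)
{
	if (offset_in_page(off) != 0)
		return -EINVAL;

	/* strip the tag before the generic code does any vma look-ups */
	addr = untagged_addr(addr);
	return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
}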

-- 
Catalin


Re: [PATCH v4] kmemleak: survive in a low-memory situation

2019-03-28 Thread Catalin Marinas
On Thu, Mar 28, 2019 at 01:50:29PM +0200, Pekka Enberg wrote:
> On 27/03/2019 2.59, Qian Cai wrote:
> > > > Unless there is a brave soul to reimplement the kmemleak to embed it's
> > > > metadata into the tracked memory itself in a foreseeable future, this
> > > > provides a good balance between enabling kmemleak in a low-memory
> > > > situation and not introducing too much hackiness into the existing
> > > > code for now.
> 
> On Thu, Mar 28, 2019 at 08:05:31AM +0200, Pekka Enberg wrote:
> > > Unfortunately I am not that brave soul, but I'm wondering what the
> > > complication here is? It shouldn't be too hard to teach calculate_sizes() 
> > > in
> > > SLUB about a new SLAB_KMEMLEAK flag that reserves spaces for the metadata.
> 
> On 28/03/2019 12.30, Catalin Marinas wrote:> I don't think it's the
> calculate_sizes() that's the hard part. The way
> > kmemleak is designed assumes that the metadata has a longer lifespan
> > than the slab object it is tracking (and refcounted via
> > get_object/put_object()). We'd have to replace some of the
> > rcu_read_(un)lock() regions with a full kmemleak_lock together with a
> > few more tweaks to allow the release of kmemleak_lock during memory
> > scanning (which can take minutes; so it needs to be safe w.r.t. metadata
> > freeing, currently relying on a deferred RCU freeing).
> 
> Right.
> 
> I think SLUB already supports delaying object freeing because of KASAN (see
> the slab_free_freelist_hook() function) so the issue with metadata outliving
> object is solvable (although will consume more memory).

Thanks. That's definitely an area to investigate.

> I can't say I remember enough details from kmemleak to comment on the
> locking complications you point out, though.

They are not too bad, I'd just need some quiet couple of days to go
through them (which I don't have at the moment).

-- 
Catalin


Re: [PATCH v4] kmemleak: survive in a low-memory situation

2019-03-28 Thread Catalin Marinas
On Wed, Mar 27, 2019 at 02:02:27PM -0400, Qian Cai wrote:
> On 3/27/19 1:29 PM, Catalin Marinas wrote:
> > From dc4194539f8191bb754901cea74c86e7960886f8 Mon Sep 17 00:00:00 2001
> > From: Catalin Marinas 
> > Date: Wed, 27 Mar 2019 17:20:57 +
> > Subject: [PATCH] mm: kmemleak: Add an emergency allocation pool for kmemleak
> >  objects
> > 
> > This patch adds an emergency pool for struct kmemleak_object in case the
> > normal kmem_cache_alloc() fails under the gfp constraints passed by the
> > slab allocation caller. The patch also removes __GFP_NOFAIL which does
> > not play well with other gfp flags (introduced by commit d9570ee3bd1d,
> > "kmemleak: allow to coexist with fault injection").
> > 
> > Suggested-by: Michal Hocko 
> > Signed-off-by: Catalin Marinas 
> 
> It takes 2 runs of LTP oom01 tests to disable kmemleak.

What configuration are you using (number of CPUs, RAM)? I tried this on
an arm64 guest under kvm with 4 CPUs and 512MB of RAM, together with
fault injection on kmemleak_object cache and running oom01 several times
without any failures.

-- 
Catalin


Re: [PATCH v4] kmemleak: survive in a low-memory situation

2019-03-28 Thread Catalin Marinas
On Wed, Mar 27, 2019 at 11:21:58AM -0700, Matthew Wilcox wrote:
> On Wed, Mar 27, 2019 at 05:29:57PM +0000, Catalin Marinas wrote:
> > On Wed, Mar 27, 2019 at 09:44:32AM +0100, Michal Hocko wrote:
> > > As long as there is an implicit __GFP_NOFAIL then kmemleak is simply
> > > broken no matter what other gfp flags you play with. Has anybody looked
> > > at some sort of preallocation where gfpflags_allow_blocking context
> > > allocate objects into a pool that non-sleeping allocations can eat from?
> > 
> > Quick attempt below and it needs some more testing (pretty random pick
> > of the EMERGENCY_POOL_SIZE value). Also, with __GFP_NOFAIL removed, are
> > the other flags safe or we should trim them further?
> 
> Why not use mempool?

I had the wrong impression that it could always sleep, but it only does
so if __GFP_DIRECT_RECLAIM is passed. See below for an updated patch.

> >  #define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
> >  __GFP_NORETRY | __GFP_NOMEMALLOC | \
> > -__GFP_NOWARN | __GFP_NOFAIL)
> > +__GFP_NOWARN)
> 
> Why GFP_NORETRY?  And if I have specified one of the other retry policies
> in my gfp flags, you should presumably clear that off before setting
> GFP_NORETRY.

It only preserves GFP_KERNEL|GFP_ATOMIC from the original flags
while setting the NOWARN|NORETRY|NOMEMALLOC (the same flags seem to be
set by mempool_alloc()). Anyway, with the changes below, I'll let
mempool add the relevant flags while kmemleak only passes
GFP_KERNEL|GFP_ATOMIC from the original caller.

---8<----------
From 09eba8f0235eb16409931e6aad77a45a12bedc82 Mon Sep 17 00:00:00 2001
From: Catalin Marinas 
Date: Thu, 28 Mar 2019 13:26:07 +
Subject: [PATCH] mm: kmemleak: Use mempool allocations for kmemleak objects

This patch adds mempool allocations for struct kmemleak_object and
kmemleak_scan_area as slightly more resilient than kmem_cache_alloc()
under memory pressure. The patch also masks out all the gfp flags passed
to kmemleak other than GFP_KERNEL|GFP_ATOMIC.

Suggested-by: Michal Hocko 
Signed-off-by: Catalin Marinas 
---
 mm/kmemleak.c | 34 +-
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 6c318f5ac234..9755678e83b9 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -82,6 +82,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -125,9 +126,7 @@
 #define BYTES_PER_POINTER  sizeof(void *)
 
 /* GFP bitmask for kmemleak internal allocations */
-#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
-__GFP_NORETRY | __GFP_NOMEMALLOC | \
-__GFP_NOWARN | __GFP_NOFAIL)
+#define gfp_kmemleak_mask(gfp) ((gfp) & (GFP_KERNEL | GFP_ATOMIC))
 
 /* scanning area inside a memory block */
 struct kmemleak_scan_area {
@@ -191,6 +190,9 @@ struct kmemleak_object {
 #define HEX_ASCII  1
 /* max number of lines to be printed */
 #define HEX_MAX_LINES  2
+/* minimum memory pool sizes */
+#define MIN_OBJECT_POOL(NR_CPUS * 4)
+#define MIN_SCAN_AREA_POOL (NR_CPUS * 1)
 
 /* the list of all allocated objects */
 static LIST_HEAD(object_list);
@@ -203,7 +205,9 @@ static DEFINE_RWLOCK(kmemleak_lock);
 
 /* allocation caches for kmemleak internal data */
 static struct kmem_cache *object_cache;
+static mempool_t *object_mempool;
 static struct kmem_cache *scan_area_cache;
+static mempool_t *scan_area_mempool;
 
 /* set if tracing memory operations is enabled */
 static int kmemleak_enabled;
@@ -483,9 +487,9 @@ static void free_object_rcu(struct rcu_head *rcu)
 */
hlist_for_each_entry_safe(area, tmp, &object->area_list, node) {
hlist_del(&area->node);
-   kmem_cache_free(scan_area_cache, area);
+   mempool_free(area, scan_area_mempool);
}
-   kmem_cache_free(object_cache, object);
+   mempool_free(object, object_mempool);
 }
 
 /*
@@ -576,7 +580,7 @@ static struct kmemleak_object *create_object(unsigned long 
ptr, size_t size,
struct rb_node **link, *rb_parent;
unsigned long untagged_ptr;
 
-   object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
+   object = mempool_alloc(object_mempool, gfp_kmemleak_mask(gfp));
if (!object) {
pr_warn("Cannot allocate a kmemleak_object structure\n");
kmemleak_disable();
@@ -640,7 +644,7 @@ static struct kmemleak_object *create_object(unsigned long 
ptr, size_t size,
 * be freed while the kmemleak_lock is held.
 */
dump_object_info(parent);
-

Re: [PATCH v4] kmemleak: survive in a low-memory situation

2019-03-28 Thread Catalin Marinas
On Thu, Mar 28, 2019 at 08:05:31AM +0200, Pekka Enberg wrote:
> On 27/03/2019 2.59, Qian Cai wrote:
> > Unless there is a brave soul to reimplement the kmemleak to embed it's
> > metadata into the tracked memory itself in a foreseeable future, this
> > provides a good balance between enabling kmemleak in a low-memory
> > situation and not introducing too much hackiness into the existing
> > code for now.
> 
> Unfortunately I am not that brave soul, but I'm wondering what the
> complication here is? It shouldn't be too hard to teach calculate_sizes() in
> SLUB about a new SLAB_KMEMLEAK flag that reserves spaces for the metadata.

I don't think it's the calculate_sizes() that's the hard part. The way
kmemleak is designed assumes that the metadata has a longer lifespan
than the slab object it is tracking (and refcounted via
get_object/put_object()). We'd have to replace some of the
rcu_read_(un)lock() regions with a full kmemleak_lock together with a
few more tweaks to allow the release of kmemleak_lock during memory
scanning (which can take minutes; so it needs to be safe w.r.t. metadata
freeing, currently relying on a deferred RCU freeing).

Anyway, I think it is possible, just not straight forward.

-- 
Catalin


Re: [PATCH v4] kmemleak: survive in a low-memory situation

2019-03-27 Thread Catalin Marinas
On Wed, Mar 27, 2019 at 09:44:32AM +0100, Michal Hocko wrote:
> On Tue 26-03-19 20:59:48, Qian Cai wrote:
> [...]
> > Unless there is a brave soul to reimplement the kmemleak to embed it's
> > metadata into the tracked memory itself in a foreseeable future,

Revisiting the kmemleak memory scanning code, that's not actually
possible without some long periods with kmemleak_lock held. The scanner
relies on the kmemleak_object (refcounted) being around even when the
actual memory block has been freed.

> > this
> > provides a good balance between enabling kmemleak in a low-memory
> > situation and not introducing too much hackiness into the existing
> > code for now. Another approach is to fail back the original allocation
> > once kmemleak_alloc() failed, but there are too many call sites to
> > deal with which makes it error-prone.
> 
> As long as there is an implicit __GFP_NOFAIL then kmemleak is simply
> broken no matter what other gfp flags you play with. Has anybody looked
> at some sort of preallocation where gfpflags_allow_blocking context
> allocate objects into a pool that non-sleeping allocations can eat from?

Quick attempt below and it needs some more testing (pretty random pick
of the EMERGENCY_POOL_SIZE value). Also, with __GFP_NOFAIL removed, are
the other flags safe or we should trim them further?

---8<---
From dc4194539f8191bb754901cea74c86e7960886f8 Mon Sep 17 00:00:00 2001
From: Catalin Marinas 
Date: Wed, 27 Mar 2019 17:20:57 +
Subject: [PATCH] mm: kmemleak: Add an emergency allocation pool for kmemleak
 objects

This patch adds an emergency pool for struct kmemleak_object in case the
normal kmem_cache_alloc() fails under the gfp constraints passed by the
slab allocation caller. The patch also removes __GFP_NOFAIL which does
not play well with other gfp flags (introduced by commit d9570ee3bd1d,
"kmemleak: allow to coexist with fault injection").

Suggested-by: Michal Hocko 
Signed-off-by: Catalin Marinas 
---
 mm/kmemleak.c | 59 +--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 6c318f5ac234..366a680cff7c 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -127,7 +127,7 @@
 /* GFP bitmask for kmemleak internal allocations */
 #define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
 __GFP_NORETRY | __GFP_NOMEMALLOC | \
-__GFP_NOWARN | __GFP_NOFAIL)
+__GFP_NOWARN)
 
 /* scanning area inside a memory block */
 struct kmemleak_scan_area {
@@ -191,11 +191,16 @@ struct kmemleak_object {
 #define HEX_ASCII  1
 /* max number of lines to be printed */
 #define HEX_MAX_LINES  2
+/* minimum emergency pool size */
+#define EMERGENCY_POOL_SIZE(NR_CPUS * 4)
 
 /* the list of all allocated objects */
 static LIST_HEAD(object_list);
 /* the list of gray-colored objects (see color_gray comment below) */
 static LIST_HEAD(gray_list);
+/* emergency pool allocation */
+static LIST_HEAD(emergency_list);
+static int emergency_pool_size;
 /* search tree for object boundaries */
 static struct rb_root object_tree_root = RB_ROOT;
 /* rw_lock protecting the access to object_list and object_tree_root */
@@ -467,6 +472,43 @@ static int get_object(struct kmemleak_object *object)
return atomic_inc_not_zero(&object->use_count);
 }
 
+/*
+ * Emergency pool allocation and freeing. kmemleak_lock must not be held.
+ */
+static struct kmemleak_object *emergency_alloc(void)
+{
+   unsigned long flags;
+   struct kmemleak_object *object;
+
+   write_lock_irqsave(&kmemleak_lock, flags);
+   object = list_first_entry_or_null(&emergency_list, typeof(*object), 
object_list);
+   if (object) {
+   list_del(&object->object_list);
+   emergency_pool_size--;
+   }
+   write_unlock_irqrestore(&kmemleak_lock, flags);
+
+   return object;
+}
+
+/*
+ * Return true if object added to the emergency pool, false otherwise.
+ */
+static bool emergency_free(struct kmemleak_object *object)
+{
+   unsigned long flags;
+
+   if (emergency_pool_size >= EMERGENCY_POOL_SIZE)
+   return false;
+
+   write_lock_irqsave(&kmemleak_lock, flags);
+   list_add(&object->object_list, &emergency_list);
+   emergency_pool_size++;
+   write_unlock_irqrestore(&kmemleak_lock, flags);
+
+   return true;
+}
+
 /*
  * RCU callback to free a kmemleak_object.
  */
@@ -485,7 +527,8 @@ static void free_object_rcu(struct rcu_head *rcu)
hlist_del(&area->node);
kmem_cache_free(scan_area_cache, area);
}
-   kmem_cache_free(object_cache, object);
+   if (!emergency_free(object))
+   kmem_cache_free(object_cache, object);
 }
 
 /*
@@ -577,6 +620,8 @@ static struct kmemleak_object *create_object(unsigned long

Re: [PATCH v3] kmemleaak: survive in a low-memory situation

2019-03-26 Thread Catalin Marinas
On Tue, Mar 26, 2019 at 09:05:36AM -0700, Matthew Wilcox wrote:
> On Tue, Mar 26, 2019 at 11:43:38AM -0400, Qian Cai wrote:
> > Unless there is a brave soul to reimplement the kmemleak to embed it's
> > metadata into the tracked memory itself in a foreseeable future, this
> > provides a good balance between enabling kmemleak in a low-memory
> > situation and not introducing too much hackiness into the existing
> > code for now.
> 
> I don't understand kmemleak.  Kirill pointed me at this a few days ago:
> 
> https://gist.github.com/kiryl/3225e235fea390aa2e49bf625bbe83ec
> 
> It's caused by the XArray allocating memory using GFP_NOWAIT | __GFP_NOWARN.
> kmemleak then decides it needs to allocate memory to track this memory.
> So it calls kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
> 
> #define gfp_kmemleak_mask(gfp)  (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
>  __GFP_NORETRY | __GFP_NOMEMALLOC | \
>  __GFP_NOWARN | __GFP_NOFAIL)
> 
> then the page allocator gets to see GFP_NOFAIL | GFP_NOWAIT and gets angry.
> 
> But I don't understand why kmemleak needs to mess with the GFP flags at
> all.

Originally, it was just preserving GFP_KERNEL | GFP_ATOMIC. Starting
with commit 6ae4bd1f0bc4 ("kmemleak: Allow kmemleak metadata allocations
to fail"), this mask changed, aimed at making kmemleak allocation
failures less verbose (i.e. just disable it since it's a debug tool).

Commit d9570ee3bd1d ("kmemleak: allow to coexist with fault injection")
introduced __GFP_NOFAIL but this came with its own problems which have
been previously reported (the warning you mentioned is another one of
these). We didn't get to any clear conclusion on how best to allow
allocations to fail with fault injection but not for the kmemleak
metadata. Your suggestion below would probably do the trick.

> Just allocate using the same flags as the caller, and fail the original
> allocation if the kmemleak allocation fails.  Like this:
> 
> +++ b/mm/slab.h
> @@ -435,12 +435,22 @@ static inline void slab_post_alloc_hook(struct 
> kmem_cache *s, gfp_t flags,
> for (i = 0; i < size; i++) {
> p[i] = kasan_slab_alloc(s, p[i], flags);
> /* As p[i] might get tagged, call kmemleak hook after KASAN. 
> */
> -   kmemleak_alloc_recursive(p[i], s->object_size, 1,
> -s->flags, flags);
> +   if (kmemleak_alloc_recursive(p[i], s->object_size, 1,
> +s->flags, flags))
> +   goto fail;
> }
>  
> if (memcg_kmem_enabled())
> memcg_kmem_put_cache(s);
> +   return;
> +
> +fail:
> +   while (i > 0) {
> +   kasan_blah(...);
> +   kmemleak_blah();
> +   i--;
> +   }
> + free_blah(p);
> +   *p = NULL;
>  }
>  
>  #ifndef CONFIG_SLOB
> 
> 
> and if we had something like this, we wouldn't need kmemleak to have this
> self-disabling or must-succeed property.

We'd still need the self-disabling in place since there are a few other
places where we call kmemleak_alloc() from.

-- 
Catalin


Re: [PATCH v3] kmemleaak: survive in a low-memory situation

2019-03-26 Thread Catalin Marinas
On Tue, Mar 26, 2019 at 11:43:38AM -0400, Qian Cai wrote:
> Kmemleak could quickly fail to allocate an object structure and then
> disable itself in a low-memory situation. For example, running a mmap()
> workload triggering swapping and OOM. This is especially problematic for
> running things like LTP testsuite where one OOM test case would disable
> the whole kmemleak and render the rest of test cases without kmemleak
> watching for leaking.
> 
> A kmemleak allocation could fail even though the tracked allocation
> succeeded. Hence, it could still try to start a direct reclaim if it is
> not executed in an atomic context (spinlock, irq-handler etc), or a
> high-priority allocation in an atomic context as a last-ditch effort.
> Since kmemleak is a debug feature, it is unlikely to be used in
> production where memory resources are scarce and direct reclaim or
> high-priority atomic allocations should not be granted lightly.
> 
> Unless there is a brave soul to reimplement the kmemleak to embed it's
> metadata into the tracked memory itself in a foreseeable future, this
> provides a good balance between enabling kmemleak in a low-memory
> situation and not introducing too much hackiness into the existing
> code for now.

Embedding the metadata would help with the slab allocations (though not
with vmalloc) but it comes with its own potential issues. There are some
bits of kmemleak that rely on deferred freeing of metadata for RCU
traversal, so this wouldn't go well with embedding it.

I wonder whether we'd be better off replacing the metadata allocator
with gen_pool. This way we'd also get rid of early logging/replaying of
the memory allocations since we can populate the gen_pool early with a
static buffer.
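
As a very rough sketch of that direction (buffer size, naming and the
ilog2-based allocation order are made up, and it ignores the detail
that gen_pool_create()/gen_pool_add() themselves allocate memory):

static char kmemleak_static_pool[SZ_64K];
static struct gen_pool *kmemleak_gen_pool;

static int __init kmemleak_gen_pool_init(void)
{
	/* one allocation unit per kmemleak_object */
	kmemleak_gen_pool = gen_pool_create(ilog2(sizeof(struct kmemleak_object)), -1);
	if (!kmemleak_gen_pool)
		return -ENOMEM;

	/* seed the pool with a static buffer instead of replaying an early log */
	return gen_pool_add(kmemleak_gen_pool,
			    (unsigned long)kmemleak_static_pool,
			    sizeof(kmemleak_static_pool), -1);
}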

-- 
Catalin


Re: [PATCH v6 00/10] arm64: add system vulnerability sysfs entries

2019-03-25 Thread Catalin Marinas
On Thu, Mar 21, 2019 at 06:05:47PM -0500, Jeremy Linton wrote:
> Arm64 machines should be displaying a human readable
> vulnerability status to speculative execution attacks in
> /sys/devices/system/cpu/vulnerabilities 
> 
> This series enables that behavior by providing the expected
> functions. Those functions expose the cpu errata and feature
> states, as well as whether firmware is responding appropriately
> to display the overall machine status. This means that in a
> heterogeneous machine we will only claim the machine is mitigated
> or safe if we are confident all booted cores are safe or
> mitigated.
> 
> v5->v6:
>   Invert meltdown logic to display that a core is safe rather
>  than mitigated if the mitigation has been enabled on
>  machines that are safe. This can happen when the
>  mitigation was forced on via command line or KASLR.
>  This means that in order to detect if kpti is enabled
>  other methods must be used (look at dmesg) when the
>  machine isn't itself susceptible to meltdown.
>   Trivial whitespace tweaks.

The v6 logic looks fine to me. For the whole series:

Reviewed-by: Catalin Marinas 


[GIT PULL] arm64 updates for 5.1-rc2

2019-03-21 Thread Catalin Marinas
Hi Linus,

Please pull the arm64 changes below. It's mostly fixes apart from the
kprobe blacklist checking which was deferred because of conflicting with
a fix merged after I pinned the arm64 for-next/core branch (f2b3d8566d81
"arm64: kprobe: Always blacklist the KVM world-switch code").

Thanks.

The following changes since commit 9e98c678c2d6ae3a17cb2de55d17f69dddaa231b:

  Linux 5.1-rc1 (2019-03-17 14:22:26 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux tags/arm64-fixes

for you to fetch changes up to e5a5af7718610c819c4d368bb62655ee43a38011:

  arm64: remove obsolete selection of MULTI_IRQ_HANDLER (2019-03-20 17:34:16 
+)


- Update the kprobe blacklist checking for arm64. This was supposed to
  be queued during the merge window but, due to conflicts, it was
  deferred post -rc1

- Extend the Fujitsu erratum 010001 workaround to A64FX v1r0

- Whitelist HiSilicon Taishan v110 CPUs as not susceptible to Meltdown

- Export save_stack_trace_regs()

- Remove obsolete selection of MULTI_IRQ_HANDLER


Hanjun Guo (2):
  arm64: Add MIDR encoding for HiSilicon Taishan CPUs
  arm64: kpti: Whitelist HiSilicon Taishan v110 CPUs

Mark Rutland (1):
  arm64: apply workaround on A64FX v1r0

Masami Hiramatsu (4):
  arm64: kprobes: Move extable address check into arch_prepare_kprobe()
  arm64: kprobes: Remove unneeded RODATA check
  arm64: kprobes: Move exception_text check in blacklist
  arm64: kprobes: Use arch_populate_kprobe_blacklist()

Matthias Kaehlcke (1):
  arm64: remove obsolete selection of MULTI_IRQ_HANDLER

William Cohen (1):
  arm64/stacktrace: Export save_stack_trace_regs()

 arch/arm64/Kconfig |  1 -
 arch/arm64/include/asm/cputype.h   |  6 +++-
 arch/arm64/kernel/cpufeature.c |  1 +
 arch/arm64/kernel/probes/kprobes.c | 56 +-
 arch/arm64/kernel/stacktrace.c |  1 +
 5 files changed, 38 insertions(+), 27 deletions(-)

-- 
Catalin


Re: [PATCH] arm64: remove obsolete selection of MULTI_IRQ_HANDLER

2019-03-20 Thread Catalin Marinas
On Wed, Mar 20, 2019 at 10:20:56AM -0700, Matthias Kaehlcke wrote:
> The arm64 config selects MULTI_IRQ_HANDLER, which was renamed to
> GENERIC_IRQ_MULTI_HANDLER by commit 4c301f9b6a94 ("ARM: Convert
> to GENERIC_IRQ_MULTI_HANDLER"). The 'new' option is already
> selected, so just remove the obsolete entry.
> 
> Signed-off-by: Matthias Kaehlcke 

Queued for 5.1-rc2. Thanks.

-- 
Catalin


Re: [PATCH v2 1/1] initrd: move initrd_start calculate within linear mapping range check

2019-03-19 Thread Catalin Marinas
On Thu, Mar 14, 2019 at 11:20:47AM +0800, pierre Kuo wrote:
> in the previous case, initrd_start and initrd_end can be successfully
> returned either (base < memblock_start_of_DRAM()) or (base + size >
> memblock_start_of_DRAM() + linear_region_size).
> 
> That means even linear mapping range check fail for initrd_start and
> initrd_end, it still can get virtual address. Here we put
> initrd_start/initrd_end to be calculated only when linear mapping check
> pass.
> 
> Fixes: c756c592e442 ("arm64: Utilize phys_initrd_start/phys_initrd_size")

For future versions, please also cc the author of the original commit
you are fixing.

> Reviewed-by: Steven Price 
> Signed-off-by: pierre Kuo 
> ---
> Changes in v2:
> - add Fixes tag
> 
>  arch/arm64/mm/init.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 7205a9085b4d..1adf418de685 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -425,6 +425,9 @@ void __init arm64_memblock_init(void)
>   memblock_remove(base, size); /* clear MEMBLOCK_ flags */
>   memblock_add(base, size);
>   memblock_reserve(base, size);
> + /* the generic initrd code expects virtual addresses */
> + initrd_start = __phys_to_virt(phys_initrd_start);
> + initrd_end = initrd_start + phys_initrd_size;
>   }
>   }
>  
> @@ -450,11 +453,6 @@ void __init arm64_memblock_init(void)
>* pagetables with memblock.
>*/
>   memblock_reserve(__pa_symbol(_text), _end - _text);
> - if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && phys_initrd_size) {
> - /* the generic initrd code expects virtual addresses */
> - initrd_start = __phys_to_virt(phys_initrd_start);
> - initrd_end = initrd_start + phys_initrd_size;
> - }

With CONFIG_RANDOMIZE_BASE we can get a further change to memstart_addr
after the place where you moved the initrd_{start,end} setting, which
would result in a different value for __phys_to_virt(phys_initrd_start).
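
For reference, the relevant definitions are roughly (quoting
arch/arm64/include/asm/memory.h from memory):

#define PHYS_OFFSET		({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
#define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET) | PAGE_OFFSET)

so any later adjustment of memstart_addr (the CONFIG_RANDOMIZE_BASE code
further down in arm64_memblock_init()) changes what __phys_to_virt()
returns for the same phys_initrd_start.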

-- 
Catalin


Re: linux-next: Signed-off-by missing for commit in the arm64-fixes tree

2019-03-19 Thread Catalin Marinas
Hi Stephen,

On Wed, Mar 20, 2019 at 01:43:12AM +1100, Stephen Rothwell wrote:
> Commit
> 
>   a4d17a19c493 ("arm64: apply workaround on A64FX v1r0")
> 
> is missing a Signed-off-by from its author.

Thanks for this. The patch is indeed confusing, sent by Takayuki with a
signed-off-by from Zhang and re-written by Mark while keeping the
original text.

I'm going to make Mark the author while keeping Takayuki as reported-by.

-- 
Catalin


Re: [PATCH] arm64/stacktrace: Export save_stack_trace_regs()

2019-03-19 Thread Catalin Marinas
On Fri, Mar 01, 2019 at 03:00:41PM -0500, William Cohen wrote:
> The ARM64 implements the save_stack_trace_regs function, but it is
> unusable for any diagnostic tooling compiled as a kernel module due
> the missing EXPORT_SYMBOL_GPL for the function.  Export
> save_stack_trace_regs() to align with other architectures such as
> s390, openrisc, and powerpc. This is similar to the ARM64 export of
> save_stack_trace_tsk() added in git commit e27c7fa015d6.
> 
> Signed-off-by: William Cohen 

Queued for 5.1-rc2. Thanks.

-- 
Catalin


Re: [RESEND PATCH] Make Fujitsu Erratum 010001 patch can be applied on A64FX v1r0

2019-03-19 Thread Catalin Marinas
On Mon, Mar 18, 2019 at 12:06:06PM +, Mark Rutland wrote:
> From 6439e9c0b1525e9d4c7be65552e6f2b1f9d1dbe0 Mon Sep 17 00:00:00 2001
> From: "Okamoto, Takayuki" 
> Date: Fri, 15 Mar 2019 12:22:36 +
> Subject: [PATCH] arm64: apply workaround on A64FX v1r0
> 
> Fujitsu erratum 010001 applies to A64FX v0r0 and v1r0, and we try to
> handle either by masking MIDR with MIDR_FUJITSU_ERRATUM_010001_MASK
> before comparing it to MIDR_FUJITSU_ERRATUM_010001.
> 
> Unfortunately, MIDR_FUJITSU_ERRATUM_010001 is constructed incorrectly
> using MIDR_VARIANT(), which is intended to extract the variant field
> from MIDR_EL1, rather than generate the field in-place. This results in
> MIDR_FUJITSU_ERRATUM_010001 being all-ones, and we only match A64FX
> v0r0.
> 
> This patch uses MIDR_CPU_VAR_REV() to generate an in-place mask for the
> variant field, ensuring the we match both v0r0 and v1r0.
> 
> Fixes: 3e32131abc311a5c ("arm64: Add workaround for Fujitsu A64FX erratum 
> 010001")
> Signed-off-by: Zhang Lei 
> [Mark: use MIDR_CPU_VAR_REV(), reword commit message]
> Signed-off-by: Mark Rutland 

Thanks. Mark's variant queued for 5.1-rc2.

-- 
Catalin


[PATCH] NFS: Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data()

2019-03-18 Thread Catalin Marinas
Commit 7b587e1a5a6c ("NFS: use locks_copy_lock() to copy locks.")
changed the lock copying from memcpy() to the dedicated
locks_copy_lock() function. The latter correctly increments the
nfs4_lock_state.ls_count via nfs4_fl_copy_lock(), however, this refcount
has already been incremented in the nfs4_alloc_{lock,unlock}data().
Kmemleak subsequently reports an unreferenced nfs4_lock_state object as
below (arm64 platform):

unreferenced object 0x8000fce0b000 (size 256):
  comm "systemd-sysuser", pid 1608, jiffies 4294892825 (age 32.348s)
  hex dump (first 32 bytes):
20 57 4c fb 00 80 ff ff 20 57 4c fb 00 80 ff ff   WL. WL.
00 57 4c fb 00 80 ff ff 01 00 00 00 00 00 00 00  .WL.
  backtrace:
[<0d15010d>] kmem_cache_alloc+0x178/0x208
[<d7c1d264>] nfs4_set_lock_state+0x124/0x1f0
[<9c867628>] nfs4_proc_lock+0x90/0x478
[<1686bd74>] do_setlk+0x64/0xe8
[<e01500d4>] nfs_lock+0xe8/0x1f0
[<4f387d8d>] vfs_lock_file+0x18/0x40
[<656ab79b>] do_lock_file_wait+0x68/0xf8
[<f17c4a4b>] fcntl_setlk+0x224/0x280
[<52a242c6>] do_fcntl+0x418/0x730
[<4f47291a>] __arm64_sys_fcntl+0x84/0xd0
[<d6856e01>] el0_svc_common+0x80/0xf0
[<9c4bd1df>] el0_svc_handler+0x2c/0x80
[<b1a0d479>] el0_svc+0x8/0xc
[<56c62a0f>] 0x

This patch removes the original refcount_inc(>ls_count) that was
paired with the memcpy() lock copying.

Fixes: 7b587e1a5a6c ("NFS: use locks_copy_lock() to copy locks.")
Cc:  # 5.0.x-
Cc: NeilBrown 
Signed-off-by: Catalin Marinas 
---
 fs/nfs/nfs4proc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 4dbb0ee23432..6d2812a39287 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6301,7 +6301,6 @@ static struct nfs4_unlockdata 
*nfs4_alloc_unlockdata(struct file_lock *fl,
p->arg.seqid = seqid;
p->res.seqid = seqid;
p->lsp = lsp;
-   refcount_inc(&lsp->ls_count);
/* Ensure we don't close file until we're done freeing locks! */
p->ctx = get_nfs_open_context(ctx);
p->l_ctx = nfs_get_lock_context(ctx);
@@ -6526,7 +6525,6 @@ static struct nfs4_lockdata *nfs4_alloc_lockdata(struct 
file_lock *fl,
p->res.lock_seqid = p->arg.lock_seqid;
p->lsp = lsp;
p->server = server;
-   refcount_inc(&lsp->ls_count);
p->ctx = get_nfs_open_context(ctx);
locks_init_lock(>fl);
locks_copy_lock(>fl, fl);


Re: [GIT PULL] arm64 updates for 5.1-rc1

2019-03-10 Thread Catalin Marinas
Hi Linus,

On Fri, Mar 08, 2019 at 04:42:23PM +, Catalin Marinas wrote:
> The following changes since commit 6e4933a006616343f66c4702dc4fc56bb25e7b02:
> 
>   irqdesc: Add domain handler for NMIs (2019-02-05 14:37:05 +)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux 
> tags/arm64-upstream
> 
> for you to fetch changes up to b855b58ac1b7891b219e1d9ef60c45c774cadefe:
> 
>   arm64: mmu: drop paging_init comments (2019-03-01 16:40:07 +)

Were there any issues with this pull request, or have you just not got
around to merging it yet?

I forgot to mention but in case it wasn't clear, my branch is based on
5.0-rc3. The irqdesc commit together with 3 more went in via tglx's
irq/core tree (commit a324ca9cad47 "Merge tag 'irqchip-5.1' of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into
irq/core"), hence the "git request-pull" common ancestor above.

Thanks.

-- 
Catalin


[GIT PULL] arm64 updates for 5.1-rc1

2019-03-08 Thread Catalin Marinas
Hi Linus,

Please pull the arm64 updates for 5.1 below. There is a minor conflict
with mainline in arch/arm64/Kconfig.platforms (with the changes from the
arm-soc tree); you can find my merge resolution at the end of this
email. The pull request touches riscv for the inX() ordering w.r.t.
delay() and the acks are in place.

Thanks,
Catalin

The following changes since commit 6e4933a006616343f66c4702dc4fc56bb25e7b02:

  irqdesc: Add domain handler for NMIs (2019-02-05 14:37:05 +)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux tags/arm64-upstream

for you to fetch changes up to b855b58ac1b7891b219e1d9ef60c45c774cadefe:

  arm64: mmu: drop paging_init comments (2019-03-01 16:40:07 +)


arm64 updates for 5.1:

- Pseudo NMI support for arm64 using GICv3 interrupt priorities

- uaccess macros clean-up (unsafe user accessors also merged but
  reverted, waiting for objtool support on arm64)

- ptrace regsets for Pointer Authentication (ARMv8.3) key management

- inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by the
  riscv maintainers)

- arm64/perf updates: PMU bindings converted to json-schema, unused
  variable and misleading comment removed

- arm64/debug fixes to ensure checking of the triggering exception level
  and to avoid the propagation of the UNKNOWN FAR value into the si_code
  for debug signals

- Workaround for Fujitsu A64FX erratum 010001

- lib/raid6 ARM NEON optimisations

- NR_CPUS now defaults to 256 on arm64

- Minor clean-ups (documentation/comments, Kconfig warning, unused
  asm-offsets, clang warnings)

- MAINTAINERS update for list information to the ARM64 ACPI entry


Anders Roxell (1):
  arm64: Kconfig.platforms: fix warning unmet direct dependencies

Andrew Murray (2):
  arm64: perf: remove misleading comment
  arm64: asm-offsets: remove unused offsets

Ard Biesheuvel (1):
  lib/raid6: arm: optimize away a mask operation in NEON recovery routine

Arnd Bergmann (1):
  arm64: avoid clang warning about self-assignment

Catalin Marinas (3):
  Merge branch 'irq/generic-nmi' of 
git://git.kernel.org/.../maz/arm-platforms
  Merge branch 'for-next/perf' of git://git.kernel.org/.../will/linux
  Revert "arm64: uaccess: Implement unsafe accessors"

Daniel Thompson (1):
  arm64: alternative: Apply alternatives early in boot process

Greg Kroah-Hartman (1):
  arm64: dump: no need to check return value of debugfs_create functions

Julien Grall (1):
  arm64: Remove documentation about TIF_USEDFPU

Julien Thierry (28):
  arm64: uaccess: Cleanup get/put_user()
  arm64: uaccess: Implement unsafe accessors
  arm64: Fix HCR.TGE status for NMI contexts
  arm64: Remove unused daif related functions/macros
  arm64: cpufeature: Set SYSREG_GIC_CPUIF as a boot system feature
  arm64: cpufeature: Add cpufeature for IRQ priority masking
  arm/arm64: gic-v3: Add PMR and RPR accessors
  irqchip/gic-v3: Switch to PMR masking before calling IRQ handler
  arm64: ptrace: Provide definitions for PMR values
  arm64: Make PMR part of task context
  arm64: Unmask PMR before going idle
  arm64: kvm: Unmask PMR before entering guest
  efi: Let architectures decide the flags that should be saved/restored
  arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking
  arm64: daifflags: Include PMR in daifflags restore operations
  arm64: alternative: Allow alternative status checking per cpufeature
  irqchip/gic-v3: Factor group0 detection into functions
  arm64: Switch to PMR masking when starting CPUs
  arm64: gic-v3: Implement arch support for priority masking
  irqchip/gic-v3: Detect if GIC can support pseudo-NMIs
  irqchip/gic-v3: Handle pseudo-NMIs
  irqchip/gic-v3: Allow interrupts to be set as pseudo-NMI
  arm64: Handle serror in NMI context
  arm64: Skip preemption when exiting an NMI
  arm64: Skip irqflags tracing for NMI in IRQs disabled context
  arm64: Enable the support of pseudo-NMIs
  arm64: irqflags: Fix clang build warnings
  arm64: Rename get_thread_info()

Kristina Martsenko (1):
  arm64: add ptrace regsets for ptrauth key management

Logan Gunthorpe (1):
  arm64: mm: make use of new memblocks_present() helper

Lorenzo Pieralisi (1):
  MAINTAINERS: Add LAKML list to ACPI for ARM64 entry

Mark Rutland (1):
  arm64: default NR_CPUS to 256

Nathan Chancellor (1):
  efi/arm: Don't expect a return value of ptdump_debugfs_register

Peng Fan (1):
  arm64: mmu: drop paging_init comments

Rob Herring (1):
  dt-bindings: arm: Convert PMU binding to json-schema

Valentin Schneider (1):
  arm64: entry: Remove unneeded need_resched() loop

Will Deacon (6):
  arm64: Remove asm/memblock.h
  asm-generic/io: Pass res

Re: [PATCH v5 03/10] arm64: add sysfs vulnerability show for meltdown

2019-03-01 Thread Catalin Marinas
On Fri, Mar 01, 2019 at 10:53:50AM -0600, Jeremy Linton wrote:
> On 3/1/19 10:20 AM, Catalin Marinas wrote:
> > On Fri, Mar 01, 2019 at 10:12:09AM -0600, Jeremy Linton wrote:
> > > On 3/1/19 1:11 AM, Andre Przywara wrote:
> > > > On 2/26/19 7:05 PM, Jeremy Linton wrote:
> > > > > +ssize_t cpu_show_meltdown(struct device *dev, struct
> > > > > device_attribute *attr,
> > > > > +    char *buf)
> > > > > +{
> > > > > +    if (arm64_kernel_unmapped_at_el0())
> > > > > +    return sprintf(buf, "Mitigation: KPTI\n");
> > > > > +
> > > > > +    if (__meltdown_safe)
> > > > > +    return sprintf(buf, "Not affected\n");
> > > > 
> > > > Shall those two checks be swapped? So it doesn't report about a KPTI
> > > > mitigation if the CPU is safe, but we enable KPTI because of KASLR
> > > > having enabled it? Or is that a different knob?
> > > 
> > > Hmmm, I think having it this way reflects the fact that the machine is
> > > mitigated independent of whether it needed it. The force on case is 
> > > similar.
> > > The machine may not have needed the mitigation but it was forced on.
> > 
> > So is this patchset about showing vulnerabilities _and_ mitigations or
> > just one of them?
> 
> Well, I don't think there is a way to express a mitigated but not vulnerable
> state in the current ABI. This set is mostly just to bring us in line with
> the current ABI expectations.

Looking at the ABI doc, it states:

"Not affected"CPU is not affected by the vulnerability
"Vulnerable"  CPU is affected and no mitigation in effect
"Mitigation: $M"  CPU is affected and mitigation $M is in effect

So, yes, we don't have mitigated but not vulnerable. Therefore I think
we should stick to "not affected" and swap the lines above as per
Andre's comment. This file is about Meltdown vulnerability and
mitigation, not KASLR hardening.
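
Concretely, with the two checks swapped the function would read (sketch
based on the hunk quoted above; the final "Vulnerable" fallback is
assumed here rather than taken from the patch):

ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr,
			  char *buf)
{
	if (__meltdown_safe)
		return sprintf(buf, "Not affected\n");

	if (arm64_kernel_unmapped_at_el0())
		return sprintf(buf, "Mitigation: KPTI\n");

	return sprintf(buf, "Vulnerable\n");
}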

-- 
Catalin


Re: [PATCH v10 07/12] fs, arm64: untag user pointers in fs/userfaultfd.c

2019-03-01 Thread Catalin Marinas
On Tue, Feb 26, 2019 at 03:39:08PM +0100, Andrey Konovalov wrote:
> On Sat, Feb 23, 2019 at 12:06 AM Dave Hansen  wrote:
> >
> > On 2/22/19 4:53 AM, Andrey Konovalov wrote:
> > > userfaultfd_register() and userfaultfd_unregister() use provided user
> > > pointers for vma lookups, which can only by done with untagged pointers.
> >
> > So, we have to patch all these sites before the tagged values get to the
> > point of hitting the vma lookup functions.  Dumb question: Why don't we
> > just patch the vma lookup functions themselves instead of all of these
> > callers?
> 
> That might be a working approach as well. We'll still need to fix up
> places where the vma fields are accessed directly. Catalin, what do
> you think?

Most callers of find_vma*() follow it with a check of
vma->vm_start against some tagged address ('end' in the
userfaultfd_(un)register() case). So it's not sufficient to untag the
address in find_vma().
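
The typical pattern looks roughly like this (a paraphrased sketch, not
an exact quote of fs/userfaultfd.c; the function name is made up):

static int check_range(struct mm_struct *mm, unsigned long start,
		       unsigned long end)
{
	struct vm_area_struct *vma = find_vma(mm, start);

	/*
	 * Untagging 'start' inside find_vma() would not help here:
	 * 'end' is still derived from the tagged user pointer.
	 */
	if (!vma || vma->vm_start >= end)
		return -EINVAL;
	return 0;
}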

-- 
Catalin


Re: [PATCH v5 03/10] arm64: add sysfs vulnerability show for meltdown

2019-03-01 Thread Catalin Marinas
On Fri, Mar 01, 2019 at 10:12:09AM -0600, Jeremy Linton wrote:
> On 3/1/19 1:11 AM, Andre Przywara wrote:
> > On 2/26/19 7:05 PM, Jeremy Linton wrote:
> > > Display the mitigation status if active, otherwise
> > > assume the cpu is safe unless it doesn't have CSV3
> > > and isn't in our whitelist.
> > > 
> > > Signed-off-by: Jeremy Linton 
> > > ---
> > >   arch/arm64/kernel/cpufeature.c | 47 ++
> > >   1 file changed, 37 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/cpufeature.c
> > > b/arch/arm64/kernel/cpufeature.c
> > > index f6d84e2c92fe..d31bd770acba 100644
> > > --- a/arch/arm64/kernel/cpufeature.c
> > > +++ b/arch/arm64/kernel/cpufeature.c
> > > @@ -944,7 +944,7 @@ has_useable_cnp(const struct
> > > arm64_cpu_capabilities *entry, int scope)
> > >   return has_cpuid_feature(entry, scope);
> > >   }
> > > -#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> > > +static bool __meltdown_safe = true;
> > >   static int __kpti_forced; /* 0: not forced, >0: forced on, <0:
> > > forced off */
> > >   static bool unmap_kernel_at_el0(const struct
> > > arm64_cpu_capabilities *entry,
> > > @@ -963,6 +963,16 @@ static bool unmap_kernel_at_el0(const struct
> > > arm64_cpu_capabilities *entry,
> > >   { /* sentinel */ }
> > >   };
> > >   char const *str = "command line option";
> > > +    bool meltdown_safe;
> > > +
> > > +    meltdown_safe = is_midr_in_range_list(read_cpuid_id(),
> > > kpti_safe_list);
> > > +
> > > +    /* Defer to CPU feature registers */
> > > +    if (has_cpuid_feature(entry, scope))
> > > +    meltdown_safe = true;
> > > +
> > > +    if (!meltdown_safe)
> > > +    __meltdown_safe = false;
> > >   /*
> > >    * For reasons that aren't entirely clear, enabling KPTI on Cavium
> > > @@ -974,6 +984,11 @@ static bool unmap_kernel_at_el0(const struct
> > > arm64_cpu_capabilities *entry,
> > >   __kpti_forced = -1;
> > >   }
> > > +    if (!IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0)) {
> > > +    pr_info_once("kernel page table isolation disabled by
> > > CONFIG\n");
> > > +    return false;
> > > +    }
> > > +
> > >   /* Forced? */
> > >   if (__kpti_forced) {
> > >   pr_info_once("kernel page table isolation forced %s by %s\n",
> > > @@ -985,14 +1000,10 @@ static bool unmap_kernel_at_el0(const struct
> > > arm64_cpu_capabilities *entry,
> > >   if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> > >   return kaslr_offset() > 0;
> > > -    /* Don't force KPTI for CPUs that are not vulnerable */
> > > -    if (is_midr_in_range_list(read_cpuid_id(), kpti_safe_list))
> > > -    return false;
> > > -
> > > -    /* Defer to CPU feature registers */
> > > -    return !has_cpuid_feature(entry, scope);
> > > +    return !meltdown_safe;
> > >   }
> > > +#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> > >   static void
> > >   kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
> > >   {
> > > @@ -1022,6 +1033,13 @@ kpti_install_ng_mappings(const struct
> > > arm64_cpu_capabilities *__unused)
> > >   return;
> > >   }
> > > +#else
> > > +static void
> > > +kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
> > > +{
> > > +}
> > > +#endif    /* CONFIG_UNMAP_KERNEL_AT_EL0 */
> > > +
> > >   static int __init parse_kpti(char *str)
> > >   {
> > > @@ -1035,7 +1053,6 @@ static int __init parse_kpti(char *str)
> > >   return 0;
> > >   }
> > >   early_param("kpti", parse_kpti);
> > > -#endif    /* CONFIG_UNMAP_KERNEL_AT_EL0 */
> > >   #ifdef CONFIG_ARM64_HW_AFDBM
> > >   static inline void __cpu_enable_hw_dbm(void)
> > > @@ -1286,7 +1303,6 @@ static const struct arm64_cpu_capabilities
> > > arm64_features[] = {
> > >   .field_pos = ID_AA64PFR0_EL0_SHIFT,
> > >   .min_field_value = ID_AA64PFR0_EL0_32BIT_64BIT,
> > >   },
> > > -#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> > >   {
> > >   .desc = "Kernel page table isolation (KPTI)",
> > >   .capability = ARM64_UNMAP_KERNEL_AT_EL0,
> > > @@ -1302,7 +1318,6 @@ static const struct arm64_cpu_capabilities
> > > arm64_features[] = {
> > >   .matches = unmap_kernel_at_el0,
> > >   .cpu_enable = kpti_install_ng_mappings,
> > >   },
> > > -#endif
> > >   {
> > >   /* FP/SIMD is not implemented */
> > >   .capability = ARM64_HAS_NO_FPSIMD,
> > > @@ -2063,3 +2078,15 @@ static int __init enable_mrs_emulation(void)
> > >   }
> > >   core_initcall(enable_mrs_emulation);
> > > +
> > > +ssize_t cpu_show_meltdown(struct device *dev, struct
> > > device_attribute *attr,
> > > +    char *buf)
> > > +{
> > > +    if (arm64_kernel_unmapped_at_el0())
> > > +    return sprintf(buf, "Mitigation: KPTI\n");
> > > +
> > > +    if (__meltdown_safe)
> > > +    return sprintf(buf, "Not affected\n");
> > 
> > Shall those two checks be swapped? So it doesn't report about a KPTI
> > mitigation if the CPU is safe, but we enable KPTI because of KASLR
> > having 

Re: linux-next: manual merge of the arm-soc tree with the arm64 tree

2019-03-01 Thread Catalin Marinas
On Fri, Mar 01, 2019 at 09:10:36AM +1100, Stephen Rothwell wrote:
> Today's linux-next merge of the arm-soc tree got a conflict in:
> 
>   arch/arm64/Kconfig.platforms
> 
> between commit:
> 
>   a29c78234942 ("arm64: Kconfig.platforms: fix warning unmet direct 
> dependencies")
> 
> from the arm64 tree and commits:
> 
>   67b9282387c5 ("arm64: imx8mq: select GPCv2 irqchip driver")
>   84a2ab25b12d ("arm64: imx8mq: select PM support")
> 
> from the arm-soc tree.
[...]
> diff --cc arch/arm64/Kconfig.platforms
> index d4faca775d9c,c5f6a57f16b8..
> --- a/arch/arm64/Kconfig.platforms
> +++ b/arch/arm64/Kconfig.platforms
> @@@ -145,7 -151,11 +151,11 @@@ config ARCH_MVEB
>   config ARCH_MXC
>   bool "ARMv8 based NXP i.MX SoC family"
>   select ARM64_ERRATUM_843419
>  -select ARM64_ERRATUM_845719
>  +select ARM64_ERRATUM_845719 if COMPAT
> + select IMX_GPCV2
> + select IMX_GPCV2_PM_DOMAINS
> + select PM
> + select PM_GENERIC_DOMAINS
>   help
> This enables support for the ARMv8 based SoCs in the
> NXP i.MX family.

The resolution looks fine, thanks Stephen.

Olof, Arnd, I can drop this commit if you'd rather merge it via the
arm-soc tree (it was related to a CPU erratum, so I thought I'd pick it
up).

-- 
Catalin


<    4   5   6   7   8   9   10   11   12   13   >