Re: [PATCH -next v4 4/7] arm64: add copy_{to, from}_user to machine check safe

2022-05-13 Thread Mark Rutland
On Wed, Apr 20, 2022 at 03:04:15AM +, Tong Tiangen wrote:
> Add copy_{to, from}_user() to machine check safe.
> 
> If the copy fails due to a hardware memory error, only the relevant processes
> are affected, so killing the user process and isolating the user page with
> hardware memory errors is a more reasonable choice than a kernel panic.
> 
> Add a new extable type, EX_TYPE_UACCESS_MC, which can be used for uaccesses
> that can be recovered from hardware memory errors.

I don't understand why we need this.

If we apply EX_TYPE_UACCESS consistently to *all* user accesses, and
*only* to user accesses, that would *always* indicate that we can
recover, and that seems much simpler to deal with.
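
Roughly speaking, the memory-error path could then decide recoverability from
the extable entry type alone. A minimal sketch, assuming a consolidated
EX_TYPE_UACCESS covering all (and only) user accesses; both that type and the
helper name are hypothetical, not something in this patch:

#include <linux/extable.h>
#include <asm/asm-extable.h>
#include <asm/ptrace.h>

/*
 * Hypothetical: every user access (and nothing else) is tagged
 * EX_TYPE_UACCESS, so "is there a uaccess fixup for this PC?" is the
 * whole recoverability question.
 */
static bool memory_error_is_recoverable(struct pt_regs *regs)
{
	const struct exception_table_entry *ex;

	ex = search_exception_tables(instruction_pointer(regs));

	return ex && ex->type == EX_TYPE_UACCESS;
}

With that in place, the machine-check path wouldn't need a separate
EX_TYPE_UACCESS_MC at all.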

Today we use EX_TYPE_UACCESS_ERR_ZERO for kernel accesses in a couple of
cases, which we should clean up, and we use EX_TYPE_FIXUP for a couple
of user accesses, but those could easily be converted over.

> The x16 register is used to save the fixup type in copy_xxx_user, which
> uses extable type EX_TYPE_UACCESS_MC.

Why x16?

How is this intended to be consumed, and why is that behaviour different
from any *other* fault?

Mark.

> Signed-off-by: Tong Tiangen 
> ---
>  arch/arm64/include/asm/asm-extable.h | 14 ++
>  arch/arm64/include/asm/asm-uaccess.h | 15 ++-
>  arch/arm64/lib/copy_from_user.S      | 18 +++---
>  arch/arm64/lib/copy_to_user.S        | 18 +++---
>  arch/arm64/mm/extable.c              | 18 ++
>  5 files changed, 60 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
> index c39f2437e08e..75b2c00e9523 100644
> --- a/arch/arm64/include/asm/asm-extable.h
> +++ b/arch/arm64/include/asm/asm-extable.h
> @@ -2,12 +2,18 @@
>  #ifndef __ASM_ASM_EXTABLE_H
>  #define __ASM_ASM_EXTABLE_H
>  
> +#define FIXUP_TYPE_NORMAL		0
> +#define FIXUP_TYPE_MC			1
> +
>  #define EX_TYPE_NONE			0
>  #define EX_TYPE_FIXUP			1
>  #define EX_TYPE_BPF			2
>  #define EX_TYPE_UACCESS_ERR_ZERO	3
>  #define EX_TYPE_LOAD_UNALIGNED_ZEROPAD	4
>  
> +/* _MC indicates that it can be fixed up from machine check errors */
> +#define EX_TYPE_UACCESS_MC		5
> +
>  #ifdef __ASSEMBLY__
>  
>  #define __ASM_EXTABLE_RAW(insn, fixup, type, data)   \
> @@ -27,6 +33,14 @@
>   __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
>   .endm
>  
> +/*
> + * Create an exception table entry for `insn`, which will branch to `fixup`
> + * when an unhandled fault (including a SEA fault) is taken.
> + */
> + .macro  _asm_extable_uaccess_mc, insn, fixup
> + __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_UACCESS_MC, 0)
> + .endm
> +
>  /*
>   * Create an exception table entry for `insn` if `fixup` is provided. Otherwise
>   * do nothing.
> diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
> index 0557af834e03..6c23c138e1fc 100644
> --- a/arch/arm64/include/asm/asm-uaccess.h
> +++ b/arch/arm64/include/asm/asm-uaccess.h
> @@ -63,6 +63,11 @@ alternative_else_nop_endif
>  9999:	x;					\
>  	_asm_extable	9999b, l
>  
> +
> +#define USER_MC(l, x...) \
> +9999:	x;					\
> +	_asm_extable_uaccess_mc	9999b, l
> +
>  /*
>   * Generate the assembly for LDTR/STTR with exception table entries.
>   * This is complicated as there is no post-increment or pair versions of the
> @@ -73,8 +78,8 @@ alternative_else_nop_endif
>  8889:		ldtr	\reg2, [\addr, #8];
>  		add	\addr, \addr, \post_inc;
>  
> -		_asm_extable	8888b,\l;
> -		_asm_extable	8889b,\l;
> +		_asm_extable_uaccess_mc	8888b, \l;
> +		_asm_extable_uaccess_mc	8889b, \l;
>   .endm
>  
>   .macro user_stp l, reg1, reg2, addr, post_inc
> @@ -82,14 +87,14 @@ alternative_else_nop_endif
>  8889:		sttr	\reg2, [\addr, #8];
>  		add	\addr, \addr, \post_inc;
>  
> -		_asm_extable	8888b,\l;
> -		_asm_extable	8889b,\l;
> +		_asm_extable_uaccess_mc	8888b,\l;
> +		_asm_extable_uaccess_mc	8889b,\l;
>   .endm
>  
>   .macro user_ldst l, inst, reg, addr, post_inc
>  8888:		\inst	\reg, [\addr];
>  		add	\addr, \addr, \post_inc;
>  
> -		_asm_extable	8888b,\l;
> +		_asm_extable_uaccess_mc	8888b, \l;
>   .endm
>  #endif
> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
> index 34e317907524..480cc5ac0a8d 100644
> --- a/arch/arm64/lib/copy_from_user.S
> +++ b/arch/arm64/lib/copy_from_user.S
> @@ -25,7 +25,7 @@
>   .endm
>  
>   .macro strb1 reg, ptr, val
> - strb \reg, [\ptr], \val
> + USER_MC(9998f, strb \reg, [\ptr], \val)
>   .endm
>  
>   .macro 
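
The arch/arm64/mm/extable.c side of the patch is truncated in the quote above.
For orientation only, a rough sketch of how an EX_TYPE_UACCESS_MC entry might
be consumed, with x16 carrying the fixup type as the commit message describes;
the function names here are illustrative rather than the patch's actual code:

#include <linux/extable.h>
#include <asm/asm-extable.h>
#include <asm/ptrace.h>

static bool ex_handler_uaccess_mc(const struct exception_table_entry *ex,
				  struct pt_regs *regs, unsigned long type)
{
	/* Tell the 9998 fixup label why it was reached. */
	regs->regs[16] = type;	/* FIXUP_TYPE_NORMAL or FIXUP_TYPE_MC */
	/* Resume at the fixup target recorded in the extable entry. */
	regs->pc = (unsigned long)&ex->fixup + ex->fixup;
	return true;
}

/* Hypothetical entry point used by the memory-error (SEA) path. */
bool fixup_exception_mc(struct pt_regs *regs)
{
	const struct exception_table_entry *ex;

	ex = search_exception_tables(instruction_pointer(regs));
	if (!ex)
		return false;

	switch (ex->type) {
	case EX_TYPE_UACCESS_MC:
		return ex_handler_uaccess_mc(ex, regs, FIXUP_TYPE_MC);
	}

	return false;
}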

Re: [PATCH -next v4 4/7] arm64: add copy_{to, from}_user to machine check safe

2022-05-05 Thread Catalin Marinas
On Thu, May 05, 2022 at 02:39:43PM +0800, Tong Tiangen wrote:
> On 2022/5/4 18:26, Catalin Marinas wrote:
> > On Wed, Apr 20, 2022 at 03:04:15AM +, Tong Tiangen wrote:
> > > Add copy_{to, from}_user() to machine check safe.
> > > 
> > > If the copy fails due to a hardware memory error, only the relevant processes
> > > are affected, so killing the user process and isolating the user page with
> > > hardware memory errors is a more reasonable choice than a kernel panic.
> > 
> > Just to make sure I understand - we can only recover if the fault is in
> > a user page. That is, for a copy_from_user(), we can only handle the
> > faults in the source address, not the destination.
> 
> At the beginning, I also thought we could only recover if the fault was in a
> user page.
> After discussing it with Mark [1], I think that whether the fault is in a user
> page or a kernel page, as long as it is triggered by a user process, only the
> related processes will be affected. Based on this understanding, it seems that
> all uaccesses can be recovered.
> 
> [1]https://patchwork.kernel.org/project/linux-arm-kernel/patch/20220406091311.3354723-6-tongtian...@huawei.com/

We can indeed safely skip this copy and return an error just like
pretending there was a user page fault. However, my point was more
around the "isolate the user page with hardware memory errors". If the
fault is on a kernel address, there's not much you can do about it. You'll
likely trigger it later when you try to access that address (maybe it
was freed and re-allocated). Do we hope we won't get the same error
again on that kernel address?
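
For illustration, the distinction above hinges on the faulting address rather
than on the extable entry itself; a minimal, hypothetical helper (not part of
the patch) would be along these lines:

#include <asm/processor.h>	/* TASK_SIZE */

/*
 * Hypothetical: only a fault below TASK_SIZE lands in a user page that can
 * be unmapped and isolated once the task has been killed. The kernel
 * destination buffer of copy_from_user() sits above TASK_SIZE, so there is
 * nothing to isolate and the poisoned memory may simply be hit again later.
 */
static bool fault_addr_is_isolatable(unsigned long fault_addr)
{
	return fault_addr < TASK_SIZE;
}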

-- 
Catalin


Re: [PATCH -next v4 4/7] arm64: add copy_{to, from}_user to machine check safe

2022-05-04 Thread Catalin Marinas
On Wed, Apr 20, 2022 at 03:04:15AM +, Tong Tiangen wrote:
> Add copy_{to, from}_user() to machine check safe.
> 
> If the copy fails due to a hardware memory error, only the relevant processes
> are affected, so killing the user process and isolating the user page with
> hardware memory errors is a more reasonable choice than a kernel panic.

Just to make sure I understand - we can only recover if the fault is in
a user page. That is, for a copy_from_user(), we can only handle the
faults in the source address, not the destination.

> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
> index 34e317907524..480cc5ac0a8d 100644
> --- a/arch/arm64/lib/copy_from_user.S
> +++ b/arch/arm64/lib/copy_from_user.S
> @@ -25,7 +25,7 @@
>   .endm
>  
>   .macro strb1 reg, ptr, val
> - strb \reg, [\ptr], \val
> + USER_MC(9998f, strb \reg, [\ptr], \val)
>   .endm

So if I got the above correctly, why do we need an exception table entry
for the store to the kernel address?

-- 
Catalin