Re: [PATCH -next v4 3/7] arm64: add support for machine check error safe

2022-05-26 Thread Mark Rutland
On Thu, May 26, 2022 at 11:36:41AM +0800, Tong Tiangen wrote:
> 
> 
> On 2022/5/25 16:30, Mark Rutland wrote:
> > On Thu, May 19, 2022 at 02:29:54PM +0800, Tong Tiangen wrote:
> > > 
> > > 
> > > On 2022/5/13 23:26, Mark Rutland wrote:
> > > > On Wed, Apr 20, 2022 at 03:04:14AM +, Tong Tiangen wrote:
> > > > > When arm64 handles a kernel hardware memory error (do_sea()) and the
> > > > > error is consumed in the kernel, it currently panics.
> > > > > That is not optimal.
> > > > > 
> > > > > Take uaccess as an example: if a uaccess operation fails due to a memory
> > > > > error, only the user process is affected, so killing that process
> > > > > and isolating the user page with the hardware memory error is a better
> > > > > choice.
> > > > 
> > > > Conceptually, I'm fine with the idea of constraining what we do for a
> > > > true uaccess, but I don't like the implementation of this at all, and I
> > > > think we first need to clean up the arm64 extable usage to clearly
> > > > distinguish a uaccess from another access.
> > > 
> > > OK, using EX_TYPE_UACCESS, with this extable type being recoverable, is
> > > more reasonable.
> > 
> > Great.
> > 
> > > For EX_TYPE_UACCESS_ERR_ZERO, today we use it for kernel accesses in a
> > > couple of cases, such as
> > > get_user/futex/__user_cache_maint()/__user_swpX_asm(),
> > 
> > Those are all user accesses.
> > 
> > However, __get_kernel_nofault() and __put_kernel_nofault() use
> > EX_TYPE_UACCESS_ERR_ZERO by way of __{get,put}_mem_asm(), so we'd need to
> > refactor that code to split the user/kernel cases higher up the callchain.
> > 
> > > your suggestion is:
> > > get_user continues to use EX_TYPE_UACCESS_ERR_ZERO and the other cases use
> > > new type EX_TYPE_FIXUP_ERR_ZERO?
> > 
> > Yes, that's the rough shape. We could make the latter EX_TYPE_KACCESS_ERR_ZERO
> > to be clearly analogous to EX_TYPE_UACCESS_ERR_ZERO, and with that I suspect we
> > could remove EX_TYPE_FIXUP.
> > 
> > Thanks,
> > Mark.
> Following your suggestion, I think the definitions would look like this:
> 
> #define EX_TYPE_NONE                    0
> #define EX_TYPE_FIXUP                   1    --> delete
> #define EX_TYPE_BPF                     2
> #define EX_TYPE_UACCESS_ERR_ZERO        3
> #define EX_TYPE_LOAD_UNALIGNED_ZEROPAD  4
> #define EX_TYPE_UACCESS                 xx   --> add
> #define EX_TYPE_KACCESS_ERR_ZERO        xx   --> add
> [The value defined by the macro here is temporary]

Almost; you don't need to add EX_TYPE_UACCESS here, as you can use
EX_TYPE_UACCESS_ERR_ZERO for that.

We already have:

| #define _ASM_EXTABLE_UACCESS_ERR(insn, fixup, err) \
|         _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, wzr)

... and we can add:

| #define _ASM_EXTABLE_UACCESS(insn, fixup) \
|         _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, wzr, wzr)


... and maybe we should use 'xzr' rather than 'wzr' for clarity.
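
To illustrate why passing the zero register for both operands gives a plain
uaccess fixup, here is a small user-space model. It assumes the err and zero
register numbers are packed into two 5-bit fields of the entry's data word and
that a write to register 31 is discarded. Every struct, macro, and function
name below is a simplified stand-in made up for illustration, not a kernel
definition.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified model; names and layout are assumptions. */
#define MODEL_REG_ZR        31u /* register number shared by xzr and wzr */
#define MODEL_DATA_ERR(r)   ((uint16_t)((r) & 0x1fu))
#define MODEL_DATA_ZERO(r)  ((uint16_t)(((r) & 0x1fu) << 5))

struct model_regs {
	int64_t  x[31];	/* x0..x30 */
	uint64_t pc;
};

struct model_extable_entry {
	uint64_t insn;
	uint64_t fixup;
	uint16_t data;	/* packed err/zero register numbers */
};

/* Writes to the zero register are silently dropped. */
static void model_write_reg(struct model_regs *regs, unsigned int r,
			    int64_t val)
{
	assert(r <= MODEL_REG_ZR);
	if (r != MODEL_REG_ZR)
		regs->x[r] = val;
}

/* Model of an ERR_ZERO handler: set err, clear zero, branch to the fixup. */
static void model_uaccess_err_zero(const struct model_extable_entry *ex,
				   struct model_regs *regs)
{
	unsigned int reg_err  = ex->data & 0x1fu;
	unsigned int reg_zero = (ex->data >> 5) & 0x1fu;

	model_write_reg(regs, reg_err, -EFAULT);
	model_write_reg(regs, reg_zero, 0);
	regs->pc = ex->fixup;
}

/* Model of the proposed _ASM_EXTABLE_UACCESS(insn, fixup). */
static struct model_extable_entry model_extable_uaccess(uint64_t insn,
							uint64_t fixup)
{
	struct model_extable_entry ex = {
		insn, fixup,
		(uint16_t)(MODEL_DATA_ERR(MODEL_REG_ZR) |
			   MODEL_DATA_ZERO(MODEL_REG_ZR)),
	};
	return ex;
}
```

In this model an entry built by model_extable_uaccess() only redirects the PC
and leaves every general-purpose register untouched, which is the behaviour
wanted from a bare uaccess fixup.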

> There are two points to modify:
> 
> 1. __get_kernel_nofault() and __put_kernel_nofault() use
> EX_TYPE_KACCESS_ERR_ZERO; other users of EX_TYPE_UACCESS_ERR_ZERO
> stay unchanged.

That sounds right to me. This will require refactoring __raw_{get,put}_mem()
and __{get,put}_mem_asm().
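
One possible shape for that refactoring, as a user-space sketch: the
lowest-level helper takes the access kind (U or K) as a token and pastes it
into the extable type name, so the user/kernel decision is made once, higher
up the callchain. All MODEL_/model_ names and the enum values are hypothetical
and only show the token-pasting idea; this is not the actual uaccess.h code.

```c
#include <assert.h>

/* Hypothetical values; the real numbering is left open above. */
enum model_ex_type {
	MODEL_EX_TYPE_UACCESS_ERR_ZERO,
	MODEL_EX_TYPE_KACCESS_ERR_ZERO,
};

/* Lowest level: paste the kind token (U or K) into the type name. */
#define MODEL_MEM_EX_TYPE(kind)	MODEL_EX_TYPE_##kind##ACCESS_ERR_ZERO

/* Higher-level callers choose the kind exactly once. */
#define model_get_user_ex_type()		MODEL_MEM_EX_TYPE(U)
#define model_get_kernel_nofault_ex_type()	MODEL_MEM_EX_TYPE(K)

/* Helper for inspecting which type a caller ended up with. */
static const char *model_ex_type_name(enum model_ex_type type)
{
	switch (type) {
	case MODEL_EX_TYPE_UACCESS_ERR_ZERO:
		return "uaccess";
	case MODEL_EX_TYPE_KACCESS_ERR_ZERO:
		return "kaccess";
	}
	assert(!"unknown extable type");
	return "";
}
```

With this layout the shared low-level code never hard-codes a uaccess type, so
a kernel access cannot be mistaken for a user access by the fault handler.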

> 2. Delete EX_TYPE_FIXUP.
> 
> I have no questions about the rest. As for EX_TYPE_FIXUP, I think it needs to be
> retained, since _cond_extable (EX_TYPE_FIXUP) is still in use in assembler.h.

We use _cond_extable for cache maintenance uaccesses, so those should be moved
over to EX_TYPE_UACCESS_ERR_ZERO. We can rename _cond_extable to
_cond_uaccess_extable for clarity.

That will require restructuring asm-extable.h a bit. If that turns out to be
painful I'm happy to take a look.

Thanks,
Mark.


Re: [PATCH -next v4 3/7] arm64: add support for machine check error safe

2022-05-25 Thread Mark Rutland
On Thu, May 19, 2022 at 02:29:54PM +0800, Tong Tiangen wrote:
> 
> 
> On 2022/5/13 23:26, Mark Rutland wrote:
> > On Wed, Apr 20, 2022 at 03:04:14AM +, Tong Tiangen wrote:
> > > When arm64 handles a kernel hardware memory error (do_sea()) and the
> > > error is consumed in the kernel, it currently panics.
> > > That is not optimal.
> > > 
> > > Take uaccess as an example: if a uaccess operation fails due to a memory
> > > error, only the user process is affected, so killing that process
> > > and isolating the user page with the hardware memory error is a better choice.
> > 
> > Conceptually, I'm fine with the idea of constraining what we do for a
> > true uaccess, but I don't like the implementation of this at all, and I
> > think we first need to clean up the arm64 extable usage to clearly
> > distinguish a uaccess from another access.
> 
> OK, using EX_TYPE_UACCESS, with this extable type being recoverable, is
> more reasonable.

Great.

> For EX_TYPE_UACCESS_ERR_ZERO, today we use it for kernel accesses in a
> couple of cases, such as
> get_user/futex/__user_cache_maint()/__user_swpX_asm(), 

Those are all user accesses.

However, __get_kernel_nofault() and __put_kernel_nofault() use
EX_TYPE_UACCESS_ERR_ZERO by way of __{get,put}_mem_asm(), so we'd need to
refactor that code to split the user/kernel cases higher up the callchain.

> your suggestion is:
> get_user continues to use EX_TYPE_UACCESS_ERR_ZERO and the other cases use
> new type EX_TYPE_FIXUP_ERR_ZERO?

Yes, that's the rough shape. We could make the latter EX_TYPE_KACCESS_ERR_ZERO
to be clearly analogous to EX_TYPE_UACCESS_ERR_ZERO, and with that I suspect we
could remove EX_TYPE_FIXUP.

Thanks,
Mark.


Re: [PATCH -next v4 3/7] arm64: add support for machine check error safe

2022-05-13 Thread Mark Rutland
On Wed, Apr 20, 2022 at 03:04:14AM +, Tong Tiangen wrote:
> When arm64 handles a kernel hardware memory error (do_sea()) and the
> error is consumed in the kernel, it currently panics.
> That is not optimal.
> 
> Take uaccess as an example: if a uaccess operation fails due to a memory
> error, only the user process is affected, so killing that process
> and isolating the user page with the hardware memory error is a better choice.

Conceptually, I'm fine with the idea of constraining what we do for a
true uaccess, but I don't like the implementation of this at all, and I
think we first need to clean up the arm64 extable usage to clearly
distinguish a uaccess from another access.

> This patch only enables the machine check error framework: it adds an exception
> fixup before the kernel panic in do_sea(), limited to hardware memory errors
> consumed in kernel mode on behalf of user mode processes.
> If the fixup succeeds, the panic can be avoided.
> 
> Consistent with PPC/x86, it is implemented by CONFIG_ARCH_HAS_COPY_MC.
> 
> Also add copy_mc_to_user() in include/linux/uaccess.h; this helper is
> called when CONFIG_ARCH_HAS_COPY_MC is enabled.
> 
> Signed-off-by: Tong Tiangen 
> ---
>  arch/arm64/Kconfig   |  1 +
>  arch/arm64/include/asm/extable.h |  1 +
>  arch/arm64/mm/extable.c  | 17 +
>  arch/arm64/mm/fault.c| 27 ++-
>  include/linux/uaccess.h  |  9 +
>  5 files changed, 54 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index d9325dd95eba..012e38309955 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -19,6 +19,7 @@ config ARM64
>   select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
>   select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
>   select ARCH_HAS_CACHE_LINE_SIZE
> + select ARCH_HAS_COPY_MC if ACPI_APEI_GHES
>   select ARCH_HAS_CURRENT_STACK_POINTER
>   select ARCH_HAS_DEBUG_VIRTUAL
>   select ARCH_HAS_DEBUG_VM_PGTABLE
> diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
> index 72b0e71cc3de..f80ebd0addfd 100644
> --- a/arch/arm64/include/asm/extable.h
> +++ b/arch/arm64/include/asm/extable.h
> @@ -46,4 +46,5 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
>  #endif /* !CONFIG_BPF_JIT */
>  
>  bool fixup_exception(struct pt_regs *regs);
> +bool fixup_exception_mc(struct pt_regs *regs);
>  #endif
> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index 489455309695..4f0083a550d4 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -9,6 +9,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  static inline unsigned long
>  get_ex_fixup(const struct exception_table_entry *ex)
> @@ -84,3 +85,19 @@ bool fixup_exception(struct pt_regs *regs)
>  
>   BUG();
>  }
> +
> +bool fixup_exception_mc(struct pt_regs *regs)
> +{
> + const struct exception_table_entry *ex;
> +
> + ex = search_exception_tables(instruction_pointer(regs));
> + if (!ex)
> + return false;
> +
> + /*
> +  * This is not complete, More Machine check safe extable type can
> +  * be processed here.
> +  */
> +
> + return false;
> +}

This is at best misnamed; it doesn't actually apply the fixup, it just
searches for one.
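
To make the distinction concrete, here is a user-space sketch contrasting a
helper that only searches with one that also applies the fixup by redirecting
the PC. The table layout, the user_access flag, and all mc_* names are
hypothetical and exist only for this illustration; restricting the fixup to
user accesses is an assumption based on the intent stated in the commit
message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the extable entry and pt_regs. */
struct mc_entry {
	uint64_t insn;		/* address of the faulting instruction */
	uint64_t fixup;		/* address to resume at */
	bool	 user_access;	/* assumption: entry marks a true uaccess */
};

struct mc_regs {
	uint64_t pc;
};

static const struct mc_entry *mc_search(const struct mc_entry *tbl, size_t n,
					uint64_t pc)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == pc)
			return &tbl[i];
	return NULL;
}

/* Search only: reports whether an entry exists, changes nothing. */
static bool mc_has_fixup(const struct mc_entry *tbl, size_t n, uint64_t pc)
{
	return mc_search(tbl, n, pc) != NULL;
}

/* Search and apply: on success the PC points at the fixup handler. */
static bool mc_apply_fixup(const struct mc_entry *tbl, size_t n,
			   struct mc_regs *regs)
{
	const struct mc_entry *ex = mc_search(tbl, n, regs->pc);

	if (!ex || !ex->user_access)
		return false;

	regs->pc = ex->fixup;
	return true;
}
```

A function named like fixup_exception() would be expected to behave like
mc_apply_fixup() here, whereas the posted fixup_exception_mc() does only the
lookup and then returns false.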

> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 77341b160aca..a9e6fb1999d1 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -695,6 +695,29 @@ static int do_bad(unsigned long far, unsigned int esr, struct pt_regs *regs)
>   return 1; /* "fault" */
>  }
>  
> +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr,
> +  struct pt_regs *regs, int sig, int code)
> +{
> + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
> + return false;
> +
> + if (user_mode(regs) || !current->mm)
> + return false;
> +
> + if (apei_claim_sea(regs) < 0)
> + return false;
> +
> + if (!fixup_exception_mc(regs))
> + return false;
> +
> + set_thread_esr(0, esr);
> +
> + arm64_force_sig_fault(sig, code, addr,
> + "Uncorrected hardware memory error in kernel-access\n");
> +
> + return true;
> +}
> +
>  static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
>  {
>   const struct fault_info *inf;
> @@ -720,7 +743,9 @@ static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
>*/
>   siaddr  = untagged_addr(far);
>   }
> - arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
> +
> + if (!arm64_do_kernel_sea(siaddr, esr, regs, inf->sig, inf->code))
> + arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>  
>   return 0;
>  }
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index