On 02/28/2017 07:30 AM, Nicholas Piggin wrote: > A synchronous machine check is an exception raised by the attempt to > execute the current instruction. If the error can't be corrected, it > can make sense to SIGBUS the currently running process. > > In other cases, the error condition is not related to the current > instruction, so killing the current process is not the right thing to > do. > > Today, all machine checks are MCE_SEV_ERROR_SYNC, so this has no > practical change. It will be used to handle POWER9 asynchronous > machine checks. > > Signed-off-by: Nicholas Piggin <npig...@gmail.com> > --- > arch/powerpc/platforms/powernv/opal.c | 21 ++++++--------------- > 1 file changed, 6 insertions(+), 15 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/opal.c > b/arch/powerpc/platforms/powernv/opal.c > index 86d9fde93c17..e0f856bfbfe8 100644 > --- a/arch/powerpc/platforms/powernv/opal.c > +++ b/arch/powerpc/platforms/powernv/opal.c > @@ -395,7 +395,6 @@ static int opal_recover_mce(struct pt_regs *regs, > struct machine_check_event *evt) > { > int recovered = 0; > - uint64_t ea = get_mce_fault_addr(evt); > > if (!(regs->msr & MSR_RI)) { > /* If MSR_RI isn't set, we cannot recover */ > @@ -404,26 +403,18 @@ static int opal_recover_mce(struct pt_regs *regs, > } else if (evt->disposition == MCE_DISPOSITION_RECOVERED) { > /* Platform corrected itself */ > recovered = 1; > - } else if (ea && !is_kernel_addr(ea)) { > + } else if (evt->severity == MCE_SEV_FATAL) { > + /* Fatal machine check */ > + pr_err("Machine check interrupt is fatal\n"); > + recovered = 0;
Setting recovered = 0 would trigger kernel panic. Should we panic the kernel for asynchronous errors ? > + } else if ((evt->severity == MCE_SEV_ERROR_SYNC) && > + (user_mode(regs) && !is_global_init(current))) { > /* > - * Faulting address is not in kernel text. We should be fine. > - * We need to find which process uses this address. > * For now, kill the task if we have received exception when > * in userspace. > * > * TODO: Queue up this address for hwpoisioning later. > */ > - if (user_mode(regs) && !is_global_init(current)) { > - _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip); > - recovered = 1; > - } else > - recovered = 0; > - } else if (user_mode(regs) && !is_global_init(current) && > - evt->severity == MCE_SEV_ERROR_SYNC) { > - /* > - * If we have received a synchronous error when in userspace > - * kill the task. > - */ > _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip); > recovered = 1; > } >