On Tue, 9 Oct 2018 06:46:30 +0200 Christophe LEROY <christophe.le...@c-s.fr> wrote:
> On 09/10/2018 at 06:32, Nicholas Piggin wrote:
> > On Mon, 8 Oct 2018 17:39:11 +0200
> > Christophe LEROY <christophe.le...@c-s.fr> wrote:
> >
> >> Hi Nick,
> >>
> >> On 19/07/2017 at 08:59, Nicholas Piggin wrote:
> >>> Use nmi_enter similarly to system reset interrupts. This uses NMI
> >>> printk NMI buffers and turns off various debugging facilities that
> >>> helps avoid tripping on ourselves or other CPUs.
> >>>
> >>> Signed-off-by: Nicholas Piggin <npig...@gmail.com>
> >>> ---
> >>>  arch/powerpc/kernel/traps.c | 9 ++++++---
> >>>  1 file changed, 6 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>> index 2849c4f50324..6d31f9d7c333 100644
> >>> --- a/arch/powerpc/kernel/traps.c
> >>> +++ b/arch/powerpc/kernel/traps.c
> >>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >>>
> >>>  void machine_check_exception(struct pt_regs *regs)
> >>>  {
> >>> -	enum ctx_state prev_state = exception_enter();
> >>>  	int recover = 0;
> >>> +	bool nested = in_nmi();
> >>> +	if (!nested)
> >>> +		nmi_enter();
> >>
> >> This alters preempt_count, so when die() is called
> >> in_interrupt() returns true although the trap didn't happen in
> >> interrupt context, and oops_end() panics with "fatal exception in
> >> interrupt" instead of gently sending SIGBUS to the faulting app.
> >
> > Thanks for tracking that down.
> >
> >> Any idea on how to fix this?
> >
> > I would say we have to deliver the sigbus by hand.
> >
> >         if ((user_mode(regs)))
> >                 _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> >         else
> >                 die("Machine check", regs, SIGBUS);
> >
>
> And what about all the other things done by 'die()'?
>
> And what if it is a kernel thread?
>
> On one of my boards, I have a kernel thread regularly checking the HW,
> and if it gets a machine check I expect it to gently stop and the die
> notification to be delivered to all registered notifiers.
>
> Until this patch, it was working well.
I guess the alternative is that we could check regs->trap for a machine
check in the die test. The complication is having to account for an MCE
taken inside an interrupt handler.

	if (in_interrupt()) {
		if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
			panic("Fatal exception in interrupt");
	}

Something like that might work for you? We need a ppc64 macro for
the MCE, and can probably add something like in_nmi_from_interrupt()
for the second part of the test.

Thanks,
Nick