On 09/10/2018 at 06:32, Nicholas Piggin wrote:
On Mon, 8 Oct 2018 17:39:11 +0200
Christophe LEROY <christophe.le...@c-s.fr> wrote:

Hi Nick,

On 19/07/2017 at 08:59, Nicholas Piggin wrote:
Use nmi_enter similarly to system reset interrupts. This uses the printk
NMI buffers and turns off various debugging facilities, which helps
avoid tripping over ourselves or other CPUs.

Signed-off-by: Nicholas Piggin <npig...@gmail.com>
---
   arch/powerpc/kernel/traps.c | 9 ++++++---
   1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2849c4f50324..6d31f9d7c333 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
void machine_check_exception(struct pt_regs *regs)
   {
-       enum ctx_state prev_state = exception_enter();
        int recover = 0;
+       bool nested = in_nmi();
+       if (!nested)
+               nmi_enter();

This alters preempt_count, so when die() is called, in_interrupt()
returns true although the trap didn't happen in interrupt context, and
oops_end() panics with "Fatal exception in interrupt" instead of gently
sending SIGBUS to the faulting app.

Thanks for tracking that down.

Any idea how to fix this?

I would say we have to deliver the SIGBUS by hand.

     if (user_mode(regs))
         _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
     else
         die("Machine check", regs, SIGBUS);


And what about all the other things done by die()?

And what if it is a kernel thread?

On one of my boards, I have a kernel thread that regularly checks the hardware. If it gets a machine check, I expect it to stop gently and the die notification to be delivered to all registered notifiers.

Until this patch, that was working well.

Christophe
