On Fri, Mar 07, 2014 at 01:15:35PM -0800, H. Peter Anvin wrote:
> On 03/07/2014 11:39 AM, Don Zickus wrote:
> > A customer generated an external NMI using their iLO to test kdump worked.
> > Unfortunately, the machine hung.  Disabling the nmi_watchdog made things
> > work.
> >
> > I speculated the external NMI fired, caused the machine to panic (as
> > expected) and the perf NMI from the watchdog came in and was latched.
> > My guess was this somehow caused the hang.
> >
>
> ... as any other unexpected exception would.
>
> >
> > I also do not fully understand why the latched NMI is not happening
> > immediately after the load idt call or why it comes after a page fault
> > (the early_make_pgtable).  Further adding to my confusion is why the
> > early printk magic didn't dump a stack as I believe I had that setup on
> > my commandline.  But I figured I would just report what I have observed.
> >
>
> If the kdump is initiated from NMI context, I'm wondering if it might be
> possible that we haven't actually executed an IRET until this one
> happens, and the IRET re-enables NMI.

Ah makes sense then.

>
> > My testing and debugging were based off a 3.10 kernel (RHEL-7) but has
> > included Seiji's tracepoint cleanups to
> > arch/x86/kernel/head_64.S|head64.c.  Not much has changed upstream here.
> > Also 3.14-rc4 still has the same hang.
> >
> > Signed-off-by: Don Zickus <dzic...@redhat.com>
> We really shouldn't be doing the fixup lookup for NMI, either.  Probably
> it makes more sense to just IRET on NMI until we have the real interrupt
> vectors set up, but it needs to be done a little earlier.
>
> How does this patch work for you?

I tested it on 64 bit and it works good.  Thanks!

Cheers,
Don

>
> 	-hpa
>
> diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
> index 81ba276..d2a2159 100644
> --- a/arch/x86/kernel/head_32.S
> +++ b/arch/x86/kernel/head_32.S
> @@ -544,6 +544,10 @@ ENDPROC(early_idt_handlers)
>  	/* This is global to keep gas from relaxing the jumps */
>  ENTRY(early_idt_handler)
>  	cld
> +
> +	cmpl $X86_TRAP_NMI,(%esp)
> +	je is_nmi		# Ignore NMI
> +
>  	cmpl $2,%ss:early_recursion_flag
>  	je hlt_loop
>  	incl %ss:early_recursion_flag
> @@ -594,8 +598,9 @@ ex_entry:
>  	pop %edx
>  	pop %ecx
>  	pop %eax
> -	addl $8,%esp		/* drop vector number and error code */
>  	decl %ss:early_recursion_flag
> +is_nmi:
> +	addl $8,%esp		/* drop vector number and error code */
>  	iret
>  ENDPROC(early_idt_handler)
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index e1aabdb..33f36c7 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -343,6 +343,9 @@ early_idt_handlers:
>  ENTRY(early_idt_handler)
>  	cld
>
> +	cmpl $X86_TRAP_NMI,(%rsp)
> +	je is_nmi		# Ignore NMI
> +
>  	cmpl $2,early_recursion_flag(%rip)
>  	jz  1f
>  	incl early_recursion_flag(%rip)
> @@ -405,8 +408,9 @@ ENTRY(early_idt_handler)
>  	popq %rdx
>  	popq %rcx
>  	popq %rax
> -	addq $16,%rsp		# drop vector number and error code
>  	decl early_recursion_flag(%rip)
> +is_nmi:
> +	addq $16,%rsp		# drop vector number and error code
>  	INTERRUPT_RETURN
>  ENDPROC(early_idt_handler)
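
For anyone following the thread, here is a rough userspace sketch in C (not
the assembly the patch uses) of the sequence hypothesized above: an NMI
arriving while NMIs are still blocked is held pending, and the first IRET
delivers it. It also models the check the patch adds, where the early
handler ignores vector X86_TRAP_NMI and hands everything else to the
existing path. This is a toy model under stated assumptions, not kernel
code: struct cpu, raise_nmi(), iret(), early_idt_action() and
replay_kdump_from_nmi() are hypothetical names made up for illustration.
Only the vector numbers (NMI is 2, page fault is 14) come from the
architecture.

```c
#include <stdbool.h>

#define X86_TRAP_NMI  2   /* architectural NMI vector */
#define X86_TRAP_PF  14   /* architectural page-fault vector */

/* Toy model of NMI blocking; all names here are hypothetical. */
struct cpu {
	bool nmi_blocked;  /* set when an NMI is delivered, cleared by IRET */
	bool nmi_latched;  /* one NMI held pending while NMIs are blocked */
	int  delivered;    /* NMIs that actually reached a handler */
};

/* An NMI source (iLO, perf watchdog) asserts NMI. */
static void raise_nmi(struct cpu *c)
{
	if (c->nmi_blocked) {
		c->nmi_latched = true;  /* held until the next IRET */
		return;
	}
	c->nmi_blocked = true;          /* further NMIs are now blocked */
	c->delivered++;
}

/* IRET unblocks NMIs; a latched NMI is then delivered. */
static void iret(struct cpu *c)
{
	c->nmi_blocked = false;
	if (c->nmi_latched) {
		c->nmi_latched = false;
		c->nmi_blocked = true;
		c->delivered++;
	}
}

enum early_action { EARLY_IGNORE, EARLY_HANDLE };

/* Mirrors the patched early handler's first test: NMI is simply returned from. */
static enum early_action early_idt_action(int vector)
{
	return vector == X86_TRAP_NMI ? EARLY_IGNORE : EARLY_HANDLE;
}

/* Replays the hypothesized sequence; returns NMIs delivered by the first IRET. */
static int replay_kdump_from_nmi(void)
{
	struct cpu c = { false, false, 0 };
	int before;

	raise_nmi(&c);        /* external NMI: delivered, panic and kdump start */
	raise_nmi(&c);        /* watchdog NMI: latched, NMIs are still blocked */
	before = c.delivered;
	iret(&c);             /* first IRET in the kdump kernel, e.g. after the early page fault */
	return c.delivered - before;
}
```

In this model replay_kdump_from_nmi() returns 1: the latched watchdog NMI
only shows up at the first IRET, which would match it appearing after the
early_make_pgtable page fault rather than right after the IDT load. With the
patch, early_idt_action(X86_TRAP_NMI) is EARLY_IGNORE, so that late NMI is
dropped instead of going down the unexpected-exception path.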