Re: [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast

2015-12-10 Thread kbuild test robot
Hi Hidehiro,

[auto build test ERROR on v4.4-rc4]
[also build test ERROR on next-20151209]
[cannot apply to tip/x86/core]

url:
https://github.com/0day-ci/linux/commits/Hidehiro-Kawai/Fix-race-issues-among-panic-NMI-and-crash_kexec/20151210-095254
config: x86_64-randconfig-s4-12101030 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   arch/x86/built-in.o: In function `do_nmi':
>> (.text+0x7339): undefined reference to `run_crash_ipi_callback'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast

2015-12-09 Thread Hidehiro Kawai
Now, multiple CPUs can receive external NMI simultaneously by
specifying "apic_extnmi=all" as an boot option.  When we take a
crash dump by using external NMI with this option, we fail to save
register values into the crash dump.  This happens as follows:

  CPU 0  CPU 1
     =
  receive an external NMI
  default_do_nmi()   receive an external NMI
spin_lock(_reason_lock)  default_do_nmi()
io_check_error()   spin_lock(_reason_lock)
  panic()busy loop
  ...
kdump_nmi_shootdown_cpus()
  issue NMI IPI ---> blocked until IRET
 busy loop...

  Here, since CPU 1 is in NMI context, additional NMI from CPU 0
  is blocked until CPU 1 executes IRET.  However, CPU 1 never
  executes IRET, so the NMI is not handled and the callback function
  to save registers is never called.

To solve this issue, we check if the IPI for crash dumping was
issued while waiting for nmi_reason_lock to be released, and if so,
call its callback function directly.  If the IPI is not issued (e.g.
kdump is disabled), the actual behavior doesn't change.

V6:
- Separated from the former patch `panic/x86: Allow cpus to save
  registers even if they are looping in NMI context'
- Fix comments
- Remove unneeded UP version of poll_crash_ipi_and_calback
- Rename poll_crash_ipi_and_callback to run_crash_ipi_callback

Signed-off-by: Hidehiro Kawai 
Cc: Andrew Morton 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Peter Zijlstra 
Cc: Michal Hocko 
---
 arch/x86/include/asm/reboot.h |1 +
 arch/x86/kernel/nmi.c |   11 ++-
 arch/x86/kernel/reboot.c  |   18 +-
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index a82c4f1..2cb1cc2 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
 
 typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
+void run_crash_ipi_callback(struct pt_regs *regs);
 
 #endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 5e00de7..cbfa0b5 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
}
 
/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
-   raw_spin_lock(_reason_lock);
+   /*
+* Another CPU may be processing panic routines while holding
+* nmi_reason_lock.  Check if the CPU issued the IPI for crash
+* dumping, and if so, call its callback directly.  If there is
+* no CPU preparing crash dump, we simply loop here without doing
+* special things.
+*/
+   while (!raw_spin_trylock(_reason_lock))
+   run_crash_ipi_callback(regs);
reason = x86_platform.get_nmi_reason();
 
if (reason & NMI_REASON_MASK) {
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 1da1302..60a216b 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -793,17 +793,25 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
/* Leave the nmi callback set */
 }
 
+/*
+ * Wait for the crash dumping IPI to be issued, and then call its callback
+ * directly.  This function is used when we have already been in NMI handler.
+ */
+void run_crash_ipi_callback(struct pt_regs *regs)
+{
+   if (crash_ipi_issued)
+   crash_nmi_callback(0, regs); /* Don't return */
+}
+
 /* Override the weak function in kernel/panic.c */
 void nmi_panic_self_stop(struct pt_regs *regs)
 {
while (1) {
/*
-* Wait for the crash dumping IPI to be issued, and then
-* call its callback directly.
+* If there is no CPU preparing crash dump, we simply loop
+* here without doing special things.
 */
-   if (READ_ONCE(crash_ipi_issued))
-   crash_nmi_callback(0, regs); /* Don't return */
-
+   run_crash_ipi_callback(regs);
cpu_relax();
}
 }



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: Re: [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast

2015-12-09 Thread 河合英宏 / KAWAI,HIDEHIRO
Hi Steven,

> From: Steven Rostedt [mailto:rost...@goodmis.org]
> On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> > > +  */
> > > + while (!raw_spin_trylock(_reason_lock))
> > > + poll_crash_ipi_and_callback(regs);
> >
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
> 
> This only does something if crash_ipi_done is set, which means you are killing
> the box. But perhaps a comment that states that here would be useful, or maybe
> just put in the check here. There's no need to make it depend on SMP, as
> raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be
> hit.

It seems that poll_crash_ipi_and_callback (now renamed to 
run_crash_ipi_callback)
is referred for UP if CONFIG_DEBUG_SPINLOCK=y, and it causes a build error
as below.  run_crash_ipi_callback refers crash_ipi_issued and 
crash_nmi_callback,
which are defined only if CONFIG_SMP=y.  So we need to defined it separately
for SMP and UP.

I'll resend this patch later.

> Hi Hidehiro,
> 
> [auto build test ERROR on v4.4-rc4]
> [also build test ERROR on next-20151209]
> [cannot apply to tip/x86/core]
> 
> url:
> https://github.com/0day-ci/linux/commits/Hidehiro-Kawai/Fix-race-issues-among-panic-NMI-and-crash_kexec/20151210-095
> 254
> config: x86_64-randconfig-s4-12101030 (attached as .config)
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
> 
> All errors (new ones prefixed by >>):
> 
>arch/x86/built-in.o: In function `do_nmi':
> >> (.text+0x7339): undefined reference to `run_crash_ipi_callback'
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec