The hard lockup detector uses a PMU event as a periodic NMI to
detect if we are stuck (where stuck means no timer interrupts have
occurred).

Ben's rework of the ppc64 soft disable code has made ppc64 PMU
exceptions a partial NMI. They can get disabled if an external interrupt
comes in, but otherwise PMU interrupts will fire in interrupt disabled
regions.

I wrote a kernel module to test this patch and noticed we sometimes
missed hard lockup warnings. The RCU code detected the stall first and
issued an IPI to backtrace all CPUs. Unfortunately an IPI is an external
interrupt and that will hard disable interrupts, preventing the hard
lockup detector from going off.

If I reduced the hard lockup threshold to 5 seconds:

echo 5 > /proc/sys/kernel/watchdog_thresh

Then it would beat the RCU code in detecting a stall and get a
correct backtrace out.

Another downside is that our PMCs can only count to 2^31, so even when
we ask for 10 seconds of processor cycles, we end up taking a couple
of PMU exceptions a second.

Signed-off-by: Anton Blanchard <an...@samba.org>
---

Index: b/arch/powerpc/Kconfig
===================================================================
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -145,6 +145,7 @@ config PPC
        select HAVE_IRQ_EXIT_ON_IRQ_STACK
        select ARCH_USE_CMPXCHG_LOCKREF if PPC64
        select HAVE_ARCH_AUDITSYSCALL
+       select HAVE_PERF_EVENTS_NMI if PPC64
 
 config GENERIC_CSUM
        def_bool CPU_LITTLE_ENDIAN
Index: b/arch/powerpc/include/asm/nmi.h
===================================================================
--- /dev/null
+++ b/arch/powerpc/include/asm/nmi.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_NMI_H
+#define _ASM_NMI_H
+
+#endif /* _ASM_NMI_H */
Index: b/arch/powerpc/kernel/setup_64.c
===================================================================
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -796,3 +796,10 @@ unsigned long memory_block_size_bytes(vo
 struct ppc_pci_io ppc_pci_io;
 EXPORT_SYMBOL(ppc_pci_io);
 #endif
+
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
+{
+       return ppc_proc_freq * watchdog_thresh;
+}
+#endif
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Reply via email to