Hi--

On 4/21/23 15:53, Douglas Anderson wrote:
> From: Colin Cross <ccr...@android.com>
>
> Implement a hardlockup detector that can be enabled on SMP systems
> that don't have an arch provided one or one implemented atop perf by

Is that one or more ?

> using interrupts on other cpus. Each cpu will use its softlockup
> hrtimer to check that the next cpu is processing hrtimer interrupts by
> verifying that a counter is increasing.
>
> NOTE: unlike the other hard lockup detectors, the buddy one can't
> easily provide a backtrace on the CPU that locked up. It relies on
> some other mechanism in the system to get information about the locked
> up CPUs. This could be support for NMI backtraces like [1], it could
> be a mechanism for printing the PC of locked CPUs like [2], or it
> could be something else.
>
> This style of hardlockup detector originated in some downstream
> Android trees and has been rebased on / carried in ChromeOS trees for
> quite a long time for use on arm and arm64 boards. Historically on
> these boards we've leveraged mechanism [2] to get information about
> hung CPUs, but we could move to [1].
>
> NOTE: the buddy system is not really useful to enable on any
> architectures that have a better mechanism. On arm64 folks have been
> trying to get a better mechanism for years and there has even been
> recent posts of patches adding support [3]. However, nothing about the
> buddy system is tied to arm64 and several archs (even arm32, where it
> was originally developed) could find it useful.
>
> [1] https://lore.kernel.org/r/20230419225604.21204-1-diand...@chromium.org
> [2] https://issuetracker.google.com/172213129
> [3] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.c...@mediatek.com/
>
> Signed-off-by: Colin Cross <ccr...@android.com>
> Signed-off-by: Matthias Kaehlcke <m...@chromium.org>
> Signed-off-by: Guenter Roeck <gro...@chromium.org>
> Signed-off-by: Tzung-Bi Shih <tzun...@chromium.org>
> Signed-off-by: Douglas Anderson <diand...@chromium.org>
> ---
> This patch has been rebased in ChromeOS kernel trees many times, and
> each time someone had to do work on it they added their
> Signed-off-by. I've included those here. I've also left the author as
> Colin Cross since the core code is still his.
>
> I'll also note that the CC list is pretty giant, but that's what
> get_maintainers came up with (plus a few other folks I thought would
> be interested). As far as I can tell, there's no true MAINTAINER
> listed for the existing watchdog code. Assuming people don't hate
> this, maybe it would go through Andrew Morton's tree?
>
>  include/linux/nmi.h         |  18 ++++-
>  kernel/Makefile             |   1 +
>  kernel/watchdog.c           |  24 ++++--
>  kernel/watchdog_buddy_cpu.c | 141 ++++++++++++++++++++++++++++++++++++
>  lib/Kconfig.debug           |  19 ++++-
>  5 files changed, 192 insertions(+), 11 deletions(-)
>  create mode 100644 kernel/watchdog_buddy_cpu.c
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 39d1d93164bd..9eb86bc9f5ee 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1036,6 +1036,9 @@ config HARDLOCKUP_DETECTOR_PERF
>  config HARDLOCKUP_CHECK_TIMESTAMP
>  	bool
>
> +config HARDLOCKUP_DETECTOR_CORE
> +	bool
> +
>  #
>  # arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
>  # lockup detector rather than the perf based detector.
> @@ -1045,6 +1048,7 @@ config HARDLOCKUP_DETECTOR
>  	depends on DEBUG_KERNEL && !S390
>  	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
>  	select LOCKUP_DETECTOR
> +	select HARDLOCKUP_DETECTOR_CORE
>  	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
>  	help
>  	  Say Y here to enable the kernel to act as a watchdog to detect
> @@ -1055,9 +1059,22 @@ config HARDLOCKUP_DETECTOR
>  	  chance to run. The current stack trace is displayed upon detection
>  	  and the system will stay locked up.
>
> +config HARDLOCKUP_DETECTOR_BUDDY_CPU
> +	bool "Buddy CPU hardlockup detector"
> +	depends on DEBUG_KERNEL && SMP
> +	depends on !HARDLOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> +	depends on !S390
> +	select HARDLOCKUP_DETECTOR_CORE
> +	select SOFTLOCKUP_DETECTOR
> +	help
> +	  Say Y here to enable a hardlockup detector where CPUs check
> +	  each other for lockup. Each cpu uses its softlockup hrtimer

Preferably CPU

> +	  to check that the next cpu is processing hrtimer interrupts by

and CPU

> +	  verifying that a counter is increasing.
> +
>  config BOOTPARAM_HARDLOCKUP_PANIC
>  	bool "Panic (Reboot) On Hard Lockups"
> -	depends on HARDLOCKUP_DETECTOR
> +	depends on HARDLOCKUP_DETECTOR_CORE
>  	help
>  	  Say Y here to enable the kernel to panic on "hard lockups",
>  	  which are bugs that cause the kernel to loop in kernel

-- 
~Randy

_______________________________________________
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport