Hi, On Mon, Oct 6, 2025 at 2:30 PM Ian Rogers <[email protected]> wrote: > > On Tue, Sep 16, 2025 at 11:14 PM Jinchao Wang <[email protected]> > wrote: > > > > On Tue, Sep 16, 2025 at 10:35:46PM -0700, Namhyung Kim wrote: > > > Hello, > > > > > > On Tue, Sep 16, 2025 at 10:13:12PM -0700, Ian Rogers wrote: > > > > On Tue, Sep 16, 2025 at 6:47 PM Jinchao Wang <[email protected]> > > > > wrote: > > > > > > > > > > On Tue, Sep 16, 2025 at 05:03:48PM -0700, Ian Rogers wrote: > > > > > > On Tue, Sep 16, 2025 at 7:51 AM Jinchao Wang > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > Currently, the hard lockup detector is selected at compile time > > > > > > > via > > > > > > > Kconfig, which requires a kernel rebuild to switch > > > > > > > implementations. > > > > > > > This is inflexible, especially on systems where a perf event may > > > > > > > not > > > > > > > be available or may be needed for other tasks. > > > > > > > > > > > > > > This commit refactors the hard lockup detector to replace a rigid > > > > > > > compile-time choice with a flexible build-time and boot-time > > > > > > > solution. > > > > > > > The patch supports building the kernel with either detector > > > > > > > independently, or with both. When both are built, a new boot > > > > > > > parameter > > > > > > > `hardlockup_detector="perf|buddy"` allows the selection at boot > > > > > > > time. > > > > > > > This is a more robust and user-friendly design. > > > > > > > > > > > > > > This patch is a follow-up to the discussion on the kernel mailing > > > > > > > list > > > > > > > regarding the preference and future of the hard lockup detectors. > > > > > > > It > > > > > > > implements a flexible solution that addresses the community's > > > > > > > need to > > > > > > > select an appropriate detector at boot time. > > > > > > > > > > > > > > The core changes are: > > > > > > > - The `perf` and `buddy` watchdog implementations are separated > > > > > > > into > > > > > > > distinct functions (e.g., `watchdog_perf_hardlockup_enable`). > > > > > > > - Global function pointers are introduced > > > > > > > (`watchdog_hardlockup_enable_ptr`) > > > > > > > to serve as a single API for the entire feature. > > > > > > > - A new `hardlockup_detector=` boot parameter is added to allow > > > > > > > the > > > > > > > user to select the desired detector at boot time. > > > > > > > - The Kconfig options are simplified by removing the complex > > > > > > > `HARDLOCKUP_DETECTOR_PREFER_BUDDY` and allowing both detectors > > > > > > > to be > > > > > > > built without mutual exclusion. > > > > > > > - The weak stubs are updated to call the new function pointers, > > > > > > > centralizing the watchdog logic. > > > > > > > > > > > > What is the impact on /proc/sys/kernel/nmi_watchdog ? Is that > > > > > > enabling and disabling whatever the boot time choice was? I'm not > > > > > > sure > > > > > > why this has to be a boot time option given the ability to configure > > > > > > via /proc/sys/kernel/nmi_watchdog. > > > > > The new hardlockup_detector boot parameter and the existing > > > > > /proc/sys/kernel/nmi_watchdog file serve different purposes. > > > > > > > > > > The boot parameter selects the type of hard lockup detector (perf or > > > > > buddy). > > > > > This choice is made once at boot. > > > > > > > > > > /proc/sys/kernel/nmi_watchdog, on the other hand, is only a simple > > > > > on/off > > > > > switch for the currently selected detector. It does not change the > > > > > detector's > > > > > type. > > > > > > > > So the name "nmi_watchdog" for the buddy watchdog is wrong for fairly > > > > obvious naming reasons but also because we can't differentiate when a > > > > perf event has been taken or not - this impacts perf that is choosing > > > > not to group events in metrics because of it, reducing the metric's > > > > accuracy. We need an equivalent "buddy_watchdog" file to the > > > > "nmi_watchdog" file. If we have such a file then if I did "echo 1 > > > > > /proc/sys/kernel/nmi_watchdog" I'd expect the buddy watchdog to be > > > > disabled and the perf event one to be enabled. Similarly, if I did > > > > "echo 1 > /proc/sys/kernel/buddy_watchdog" then I would expect the > > > > perf event watchdog to be disabled and the buddy one enabled. If I did > > > > "echo 0 > /proc/sys/kernel/nmi_watchdog; echo 0 > > > > > /proc/sys/kernel/buddy_watchdog" then I'd expect neither to be > > > > enabled. I don't see why choosing the type of watchdog implementation > > > > at boot time is particularly desirable. It seems sensible to default > > > > normal people to using the buddy watchdog (more perf events, power...) > > > > and CONFIG_DEBUG_KERNEL type people to using the perf event one. As > > > > the "nmi_watchdog" file may be assumed to control the buddy watchdog, > > > > perhaps a compatibility option (where the "nmi_watchdog" file controls > > > > the buddy watchdog) is needed so that user code has time to migrate. > > > > > > Sounds good to me. For perf tools, it'd be great if we can have a run- > > > time check which watchdog is selected. > > Considering backward compatibility, I prefer to keep > > /proc/sys/kernel/nmi_watchdog and introduce a new file called > > /proc/sys/kernel/hardlockup_detector_type, which only shows the default > > string > > or the boot parameter. > > Is there code using the buddy watchdog that cares about the > /proc/sys/kernel/nmi_watchdog file? My assumption is that everything > except Android is using the NMI watchdog, so a new > /proc/sys/kernel/buddy_watchdog file doesn't impact them.
Buddy watchdog has been out there for a few years. At Google, I know it's used by everything except Android. AKA I believe it is used in Google's servers and also in ChromeOS. Both of those (presumably) enable the buddy watchdog via calling it "nmi_watchdog". It's possible that some Android phones are using the buddy watchdog too but I'm not aware of it. I don't believe Pixel is using it, though that could change in the future. IMO at this point "nmi watchdog" is simply a synonym for the hardlockup detector. That was what it looked like in the code before I started messing around and adding the buddy lockup detector and it's how it is now. While it's unfortunate that there are two names for the same thing, I don't personally think that should change at this point. FWIW, even the "buddy" watchdog relies on NMIs to actually get stack crawls on stuck cores, so NMI still means something even there. If we want to tell between the perf detector or the buddy detector we should add a separate file for it. > On Android > writing to /proc/sys/kernel/nmi_watchdog would switch from updating > the buddy watchdog enable/disable to the NMI watchdog enable/disable, > but it is a straightforward patch to make anything doing this update > the buddy_watchdog file instead. Straightforward, but you've got to go find everyone that you break by doing this. That's not something I want responsibility for. If you want to convince others this is something worthwhile and you've got someone signed up to deal with the fallout (if any) then I won't object, but it's not something I'd be in support of. > If we have to keep "nmi_watchdog" then we should deprecate it and > create an equivalent file with a better name (ie without NMI in it). > It'll be moderately annoying in perf to determine whether the NMI > watchdog is enabled by having to read two files. Again, up to you if you want to try to do this, but I'm not really in support of it. It doesn't seem terribly hard to make a new file that says which hardlockup detector is in use. If that file exists then read it. If not then you fallback to what you have today. -Doug
