On Tue, Oct 07, 2025 at 05:11:52PM -0700, Ian Rogers wrote: > On Tue, Oct 7, 2025 at 3:58 PM Doug Anderson <[email protected]> wrote: > > > > Hi, > > > > On Tue, Oct 7, 2025 at 3:45 PM Ian Rogers <[email protected]> wrote: > > > > > > On Tue, Oct 7, 2025 at 2:43 PM Doug Anderson <[email protected]> > > > wrote: > > > ... > > > > The buddy watchdog was pretty much following the conventions that were > > > > already in the code: that the hardlockup detector (whether backed by > > > > perf or not) was essentially called the "nmi watchdog". There were a > > > > number of people that were involved in reviews and I don't believe > > > > suggesting creating a whole different mechanism for enabling / > > > > disabling the buddy watchdog was never suggested. > > > > > > I suspect they lacked the context that 1 in the nmi_watchdog is taken > > > to mean there's a perf event in use by the kernel with implications on > > > how group events behave. This behavior has been user > > > visible/advertised for 9 years. I don't doubt that there were good > > > intentions by PowerPC's watchdog and in the buddy watchdog patches in > > > using the file, that use will lead to spurious warnings and behaviors > > > by perf. > > > > > > My points remain: > > > 1) using multiple files regresses perf's performance; > > > 2) the file name by its meaning is wrong; > > > 3) old perf tools on new kernels won't behave as expected wrt warnings > > > and metrics because the meaning of the file has changed. > > > Using a separate file for each watchdog resolves this. It seems that > > > there wasn't enough critical mass for getting this right to have > > > mattered before, but that doesn't mean we shouldn't get it right now. > > > > Presumably your next steps then are to find someone to submit a patch > > and try to convince others on the list that this is a good idea. The > > issue with perf has been known for a while now and I haven't seen any > > patches. As I've said, I won't stand in the way if everyone else > > agrees, but given that I'm still not convinced I'm not going to author > > any patches for this myself. > > Writing >1 of: > ``` > static struct ctl_table watchdog_hardlockup_sysctl[] = { > { > .procname = "nmi_watchdog", > .data = &watchdog_hardlockup_user_enabled, > .maxlen = sizeof(int), > .mode = 0444, > .proc_handler = proc_nmi_watchdog, > .extra1 = SYSCTL_ZERO, > .extra2 = SYSCTL_ONE, > }, > }; > ``` > is an exercise of copy-and-paste, if you need me to do the copy and > pasting then it is okay. Can we get whether a perf event is already in use directly from the perf subsystem? There may be (or will be) other kernel users of perf_event besides the NMI watchdog. Exposing that state from the perf side would avoid coupling unrelated users through nmi_watchdog and similar features.
> > Thanks, > Ian > > > > -Doug > > -- Jinchao
