Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"
On Fri, May 06, 2011 at 01:25:35AM +0930, Arthur Marsh wrote:
> Don Zickus wrote, on 06/05/11 01:18:
> >On Thu, May 05, 2011 at 01:41:08AM +0930, Arthur Marsh wrote:
> >>Hi, with kernel 2.6.39-rc5-git2 compiled for Pentium-II, compiled
> >>both plain from kernel.org and with that patch and command line:
> >>
> >>BOOT_IMAGE=/vmlinuz-2.6.39-rc5-git4
> >>root=UUID=96c96a61-8615-4715-86d0-09cb8c62638c ro lapic
> >>
> >>In both cases I don't see soft lock-up errors after running "perf top".
> >
> >Hi Arthur,
> >
> >I guess I am a little confused. Are you saying you don't see the problem
> >on a vanilla upstream kernel now?
> >
> >Cheers,
> >Don
>
> Correct, both the vanilla 2.6.39-rc5-git4 and with your patch don't
> show the problem.

Ok. My patch was just to help isolate the problem.

> I'm happy to re-test anything, but as far as I'm concerned the
> problem is now solved.

If vanilla works, then I don't think there is a need to re-test anything.

Thanks for the feedback.

Cheers,
Don

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"
On Thu, May 05, 2011 at 01:41:08AM +0930, Arthur Marsh wrote:
> Hi, with kernel 2.6.39-rc5-git2 compiled for Pentium-II, compiled
> both plain from kernel.org and with that patch and command line:
>
> BOOT_IMAGE=/vmlinuz-2.6.39-rc5-git4
> root=UUID=96c96a61-8615-4715-86d0-09cb8c62638c ro lapic
>
> In both cases I don't see soft lock-up errors after running "perf top".

Hi Arthur,

I guess I am a little confused. Are you saying you don't see the problem
on a vanilla upstream kernel now?

Cheers,
Don
Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"
On Sat, Apr 30, 2011 at 01:06:07PM +0930, Arthur Marsh wrote:
> Yes, with plain 2.6.39-rc5-git2 I had problems with shutting down
> and with "BUG: soft lockup - CPU#0 stuck for 63s!" errors.
>
> After adding 'nmi_watchdog=0' I could cleanly shutdown and have not
> seen any more "BUG: soft lockup - CPU#0 stuck for 63s!" errors.

I realized nmi_watchdog=0 disabled the softlockup detector too. What
happens if you apply this patch and run _without_ the nmi_watchdog=0
parameter? Do you still get soft lockups?

Cheers,
Don

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 140dce7..2527d10 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -421,7 +421,7 @@ static int watchdog_enable(int cpu)
 	int err = 0;
 
 	/* enable the perf event */
-	err = watchdog_nmi_enable(cpu);
+	//err = watchdog_nmi_enable(cpu);
 
 	/* Regardless of err above, fall through and start softlockup */
 
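
The effect of the debugging patch can be modelled in a few lines of
userspace C. This is only a sketch of the control flow visible in the
hunk, not the kernel code: `struct cpu_watchdog`, `model_nmi_enable()`,
`model_watchdog_enable()` and the `skip_nmi` flag are hypothetical names,
and returning `err` to the caller is a modelling choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; the names here are illustrative, not the kernel's. */
struct cpu_watchdog {
    bool nmi_enabled;        /* hard lockup detector (perf NMI event) */
    bool softlockup_enabled; /* soft lockup detector */
};

/* Hypothetical stand-in for watchdog_nmi_enable(): fails without perf. */
static int model_nmi_enable(struct cpu_watchdog *w, bool perf_available)
{
    assert(w);
    if (!perf_available)
        return -1;
    w->nmi_enabled = true;
    return 0;
}

/*
 * Model of watchdog_enable(). With skip_nmi set, this mirrors the
 * debugging patch: the NMI step is skipped, err stays 0, and the soft
 * lockup detector is started regardless.
 */
static int model_watchdog_enable(struct cpu_watchdog *w, bool perf_available,
                                 bool skip_nmi)
{
    int err = 0;

    assert(w);
    if (!skip_nmi)
        err = model_nmi_enable(w, perf_available);

    /* Regardless of err above, fall through and start softlockup. */
    w->softlockup_enabled = true;
    return err;
}
```

Calling `model_watchdog_enable(&w, true, true)` leaves `w.nmi_enabled`
false and `w.softlockup_enabled` true, which is the configuration the
patch is meant to test: soft lockup detection on, NMI watchdog off.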
Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"
On Tue, May 03, 2011 at 04:45:19AM +0930, Arthur Marsh wrote:
> >Could you boot with nmi_watchdog=0 and when you get a shell prompt, run
> >something like
> >
> >perf top OR
> >perf record grep -r foo /*
> >
> >I am curious to know if this is lockup detector related or the perf
> >subsystem related.
> >
> >Cheers,
> >Don
>
> OK, with 2.6.39-rc5-git4:
>
> $ cat /proc/cmdline
>
> BOOT_IMAGE=/vmlinuz-2.6.39-rc5-git4
> root=UUID=96c96a61-8615-4715-86d0-09cb8c62638c ro lapic
> nmi_watchdog=0
>
> Once I'd installed linux-tools-2.6.39, perf ran fine and I've
> attached the perf.data output.

Thanks. I assume no softlockup warnings showed up after running the perf
command?

Cheers,
Don
Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"
On Tue, May 03, 2011 at 12:49:55AM +0930, Arthur Marsh wrote:
> Don Zickus wrote, on 02/05/11 22:29:
> >On Sat, Apr 30, 2011 at 01:06:07PM +0930, Arthur Marsh wrote:
> >>Yes, with plain 2.6.39-rc5-git2 I had problems with shutting down
> >>and with "BUG: soft lockup - CPU#0 stuck for 63s!" errors.
> >>
> >>After adding 'nmi_watchdog=0' I could cleanly shutdown and have not
> >>seen any more "BUG: soft lockup - CPU#0 stuck for 63s!" errors.
> >
> >Ok. That's good. Can I see the dmesg log when you boot without
> >'nmi_watchdog=0'? I want to see the failure nmi watchdog spits out on
> >boot and what subsystem perf thinks it found (should be none).
> >
> >This sounds like the softlockup timer started and is hanging for some
> >reason because the nmi watchdog failed to start. I thought those issues
> >were fixed a while ago, maybe not.
> >
> >Cheers,
> >Don
>
> I hope that it is alright providing the dmesg log for
> 2.6.39-rc5-git4, which I've attached.

Interesting, I had just assumed a PII machine had no lapic and thus no
perf counters. Guess I was wrong. It uses the p6 perf event subsystem and
the lockup detector happily registers with it.

This is odd because the commit you bisected down to dealt with failure
cases where the nmi watchdog couldn't register and initialize itself. But
your dmesg clearly shows success there, so I am not sure why your machine
would react negatively towards that commit.

I do have to say, I have never really tested the lockup detector code
using the p6 perf stuff, so I just assumed it worked correctly. I am
wondering if the nmi or timer logic takes so long on your machine that it
makes the userspace tasks seem like they are running for a very long time.

Could you boot with nmi_watchdog=0 and when you get a shell prompt, run
something like

perf top OR
perf record grep -r foo /*

I am curious to know if this is lockup detector related or the perf
subsystem related.

Cheers,
Don
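
As a rough illustration of the kind of check behind a "stuck for 63s"
report: a soft lockup detector generally compares the current time with
the last time a per-CPU watchdog task managed to run, and complains once
the gap exceeds a threshold. The sketch below is a userspace model under
that assumption, not the code in kernel/watchdog.c; `stuck_seconds()` and
its parameters are hypothetical names.

```c
#include <assert.h>

/*
 * Returns how long (in seconds) the CPU has appeared stuck, or 0 if the
 * watchdog task checked in recently enough.
 *
 * touch_ts: last time the watchdog task ran (seconds)
 * now:      current time (seconds)
 * thresh:   caller-supplied threshold (an assumption of this sketch)
 */
static unsigned long stuck_seconds(unsigned long touch_ts, unsigned long now,
                                   unsigned long thresh)
{
    assert(now >= touch_ts);
    if (now - touch_ts > thresh)
        return now - touch_ts;
    return 0;
}
```

With an assumed threshold of 60 seconds, `stuck_seconds(100, 163, 60)`
yields 63. If slow NMI or timer handling kept the watchdog task from
running on time, a check of this shape would blame whatever task happened
to be on the CPU, which would fit reports naming different applications.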
Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"
On Sat, Apr 30, 2011 at 01:06:07PM +0930, Arthur Marsh wrote:
> Yes, with plain 2.6.39-rc5-git2 I had problems with shutting down
> and with "BUG: soft lockup - CPU#0 stuck for 63s!" errors.
>
> After adding 'nmi_watchdog=0' I could cleanly shutdown and have not
> seen any more "BUG: soft lockup - CPU#0 stuck for 63s!" errors.

Ok. That's good. Can I see the dmesg log when you boot without
'nmi_watchdog=0'? I want to see the failure nmi watchdog spits out on
boot and what subsystem perf thinks it found (should be none).

This sounds like the softlockup timer started and is hanging for some
reason because the nmi watchdog failed to start. I thought those issues
were fixed a while ago, maybe not.

Cheers,
Don
Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"
On Sat, Apr 30, 2011 at 07:06:54AM +0930, Arthur Marsh wrote:
> Hi, I'd previously (2011-03-28) run into an issue with commit
> f99a99330f85a84c346ddeb4adc72dbfad9b9e3e
> kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events
>
> but wasn't sure that this commit was the cause of a problem or
> simply revealing symptoms. I have since re-run a git-bisect and the
> same commit was implicated.
>
> On an old Pentium-II-266 machine with 440BX chipset I would get
> numerous "BUG: soft lockup - CPU#0 stuck for 63s!" type errors in
> dmesg with different applications being reported and even a command
> like:

Does the problem go away if you boot with 'nmi_watchdog=0'?

Cheers,
Don
Bug#599368: [PATCH v2] watchdog: Improve initialisation error message and documentation
On Sun, Jan 02, 2011 at 11:02:42PM +, Ben Hutchings wrote:
> The error message 'NMI watchdog failed to create perf event...' does
> not make it clear that this is a fatal error for the watchdog. It
> also currently prints the error value as a pointer, rather than
> extracting the error code with PTR_ERR(). Fix that.
>
> Add a note to the description of the 'nowatchdog' kernel parameter to
> associate it with this message.
>
> Reported-by: Cesare Leonardi
> Signed-off-by: Ben Hutchings

Looks good, thanks. I'll add it to my queue.

Cheers,
Don
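
For readers unfamiliar with the idiom the patch description refers to:
kernel functions that return a pointer can encode a negative errno in the
pointer value itself, and PTR_ERR() recovers that number. The following
is a userspace sketch of the idea, not the kernel headers: `err_ptr()`,
`is_err()`, `ptr_err()` and `format_failure()` are stand-in names, and
the message text is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ERRNO 4095 /* errno values are assumed to fit in this range */

/* Stand-in for ERR_PTR(): encode a negative errno as a pointer value. */
static void *err_ptr(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* Stand-in for IS_ERR(): error pointers occupy the top MAX_ERRNO values. */
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Stand-in for PTR_ERR(): recover the negative errno. */
static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/*
 * Hypothetical helper: print the decoded error code with %ld instead of
 * the raw pointer with %p. Returns 0 if event is not an error pointer,
 * otherwise the snprintf() result.
 */
static int format_failure(char *buf, size_t len, const void *event)
{
    if (!is_err(event))
        return 0;
    return snprintf(buf, len, "watchdog disabled: perf event failed: %ld",
                    ptr_err(event));
}
```

Printing an error pointer with `%p` shows a large, opaque address-like
value, whereas the decoded `%ld` form shows a small negative number that
can be matched against errno names.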