Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"

2011-05-05 Thread Don Zickus
On Fri, May 06, 2011 at 01:25:35AM +0930, Arthur Marsh wrote:
> 
> 
> Don Zickus wrote, on 06/05/11 01:18:
> >On Thu, May 05, 2011 at 01:41:08AM +0930, Arthur Marsh wrote:
> >>
> >>
> >>Hi, with kernel 2.6.39-rc5-git2 compiled for Pentium-II, compiled
> >>both plain from kernel.org and with that patch and command line:
> >>
> >>BOOT_IMAGE=/vmlinuz-2.6.39-rc5-git4
> >>root=UUID=96c96a61-8615-4715-86d0-09cb8c62638c ro lapic
> >>
> >>In both cases I don't see soft lock-up errors after running "perf top".
> >
> >Hi Arthur,
> >
> >I guess I am a little confused.  Are you saying you don't see the problem
> >on a vanilla upstream kernel now?
> >
> >Cheers,
> >Don
> >
> 
> Correct, both the vanilla 2.6.39-rc5-git4 and with your patch don't
> show the problem.

Ok.  My patch was just to help isolate the problem.

> 
> I'm happy to re-test anything, but as far as I'm concerned the
> problem is now solved.

If vanilla works, then I don't think there is a need to re-test anything.

Thanks for the feedback.

Cheers,
Don



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"

2011-05-05 Thread Don Zickus
On Thu, May 05, 2011 at 01:41:08AM +0930, Arthur Marsh wrote:
> 
> 
> Hi, with kernel 2.6.39-rc5-git2 compiled for Pentium-II, compiled
> both plain from kernel.org and with that patch and command line:
> 
> BOOT_IMAGE=/vmlinuz-2.6.39-rc5-git4
> root=UUID=96c96a61-8615-4715-86d0-09cb8c62638c ro lapic
> 
> In both cases I don't see soft lock-up errors after running "perf top".

Hi Arthur,

I guess I am a little confused.  Are you saying you don't see the problem
on a vanilla upstream kernel now?

Cheers,
Don






Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"

2011-05-03 Thread Don Zickus
On Sat, Apr 30, 2011 at 01:06:07PM +0930, Arthur Marsh wrote:
> >
> >
> 
> Yes, with plain 2.6.39-rc5-git2 I had problems with shutting down
> and with "BUG: soft lockup - CPU#0 stuck for 63s!" errors.
> 
> After adding 'nmi_watchdog=0' I could cleanly shutdown and have not
> seen any more "BUG: soft lockup - CPU#0 stuck for 63s!" errors.

I realized nmi_watchdog=0 also disables the softlockup detector.
What happens if you apply this patch and run _without_ the nmi_watchdog=0
parameter?

Do you still get soft lockups?

Cheers,
Don


diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 140dce7..2527d10 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -421,7 +421,7 @@ static int watchdog_enable(int cpu)
    int err = 0;
 
    /* enable the perf event */
-   err = watchdog_nmi_enable(cpu);
+   //err = watchdog_nmi_enable(cpu);
 
    /* Regardless of err above, fall through and start softlockup */
 

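The fall-through behaviour touched by the debugging patch above can be mimicked in a small userspace sketch. Everything in it is an assumption made for illustration: struct wd_state, fake_nmi_enable() and sketch_watchdog_enable() are hypothetical stand-ins, not the kernel's code. It only shows that the soft lockup side is started whether or not the NMI side is enabled, so skipping the NMI call leaves the soft lockup detector running on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; the real kernel keeps this elsewhere. */
struct wd_state {
    bool nmi_enabled;
    bool softlockup_enabled;
};

/* Hypothetical stand-in for watchdog_nmi_enable(): fails when no
 * hardware counter is available, otherwise marks the NMI side enabled. */
static int fake_nmi_enable(struct wd_state *s, bool hw_available)
{
    if (!hw_available)
        return -1;
    s->nmi_enabled = true;
    return 0;
}

/* Sketch of the flow in the hunk: skip_nmi models the commented-out call. */
static int sketch_watchdog_enable(struct wd_state *s, bool hw_available,
                                  bool skip_nmi)
{
    int err = 0;

    assert(s != NULL);

    /* enable the perf event (skipped when the debugging patch is applied) */
    if (!skip_nmi)
        err = fake_nmi_enable(s, hw_available);

    /* Regardless of err above, fall through and start softlockup */
    s->softlockup_enabled = true;

    return err;
}
```

Calling sketch_watchdog_enable() with skip_nmi set to true corresponds to running the patched kernel without nmi_watchdog=0: only softlockup_enabled ends up set.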





Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"

2011-05-03 Thread Don Zickus
On Tue, May 03, 2011 at 04:45:19AM +0930, Arthur Marsh wrote:
> >Could you boot with nmi_watchdog=0 and when you get a shell prompt, run
> >something like
> >
> >perf top OR
> >perf record grep -r foo /*
> >
> >I am curious to know if this is lockup detector related or the perf
> >subsystem related.
> >
> >Cheers,
> >Don
> >
> 
> OK, with 2.6.39-rc5-git4:
> 
> $ cat /proc/cmdline
> 
> BOOT_IMAGE=/vmlinuz-2.6.39-rc5-git4
> root=UUID=96c96a61-8615-4715-86d0-09cb8c62638c ro lapic
> nmi_watchdog=0
> 
> Once I'd installed linux-tools-2.6.39, perf ran fine and I've
> attached the perf.data output.

Thanks.  I assume no softlockup warnings showed up after running the perf
command?

Cheers,
Don






Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"

2011-05-02 Thread Don Zickus
On Tue, May 03, 2011 at 12:49:55AM +0930, Arthur Marsh wrote:
> 
> 
> Don Zickus wrote, on 02/05/11 22:29:
> >On Sat, Apr 30, 2011 at 01:06:07PM +0930, Arthur Marsh wrote:
> >>
> >>
> >>
> >>Yes, with plain 2.6.39-rc5-git2 I had problems with shutting down
> >>and with "BUG: soft lockup - CPU#0 stuck for 63s!" errors.
> >>
> >>After adding 'nmi_watchdog=0' I could cleanly shutdown and have not
> >>seen any more "BUG: soft lockup - CPU#0 stuck for 63s!" errors.
> >
> >Ok.  That's good.  Can I see the dmesg log when you boot without
> >'nmi_watchdog=0'?  I want to see the failure the nmi watchdog spits out on
> >boot and what subsystem perf thinks it found (should be none).
> >
> >This sounds like the softlock timer started and is hanging for some reason
> >because the nmi watchdog failed to start.  I thought those issues were
> >fixed a while ago, maybe not.
> >
> >Cheers,
> >Don
> >
> 
> I hope that it is alright providing the dmesg log for
> 2.6.39-rc5-git4, which I've attached.

Interesting, I had just assumed a PII machine had no lapic and thus no
perf counters.  Guess I was wrong.  It uses the p6 perf event subsystem
and the lockup detector happily registers with it.

This is odd because the commit you bisected down to dealt with failure
cases where the nmi watchdog couldn't register and initialize itself.  But
your dmesg clearly shows success there, so I am not sure why your machine
would react negatively towards that commit.

I do have to say, I have never really tested the lockup detector code
using the p6 perf stuff, so I just assumed it worked correctly.  I am
wondering if the nmi or timer logic takes so long on your machine that
it makes the userspace tasks seem like they are running for a very long
time.

Could you boot with nmi_watchdog=0 and when you get a shell prompt, run
something like

perf top OR
perf record grep -r foo /*

I am curious to know if this is lockup detector related or the perf
subsystem related.

Cheers,
Don






Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"

2011-05-02 Thread Don Zickus
On Sat, Apr 30, 2011 at 01:06:07PM +0930, Arthur Marsh wrote:
> 
> 
> 
> Yes, with plain 2.6.39-rc5-git2 I had problems with shutting down
> and with "BUG: soft lockup - CPU#0 stuck for 63s!" errors.
> 
> After adding 'nmi_watchdog=0' I could cleanly shutdown and have not
> seen any more "BUG: soft lockup - CPU#0 stuck for 63s!" errors.

Ok.  That's good.  Can I see the dmesg log when you boot without
'nmi_watchdog=0'?  I want to see the failure the nmi watchdog spits out on
boot and what subsystem perf thinks it found (should be none).

This sounds like the softlock timer started and is hanging for some reason
because the nmi watchdog failed to start.  I thought those issues were
fixed a while ago, maybe not.

Cheers,
Don






Bug#624129: problems with kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events e.g. "BUG: soft lockup - CPU#0 stuck for 63s!"

2011-04-29 Thread Don Zickus
On Sat, Apr 30, 2011 at 07:06:54AM +0930, Arthur Marsh wrote:
> Hi, I'd previously (2011-03-28) run into an issue with commit
> f99a99330f85a84c346ddeb4adc72dbfad9b9e3e
> kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events
> 
> but wasn't sure that this commit was the cause of a problem or
> simply revealing symptoms. I have since re-run a git-bisect and the
> same commit was implicated.
> 
> On an old Pentium-II-266 machine with 440BX chipset I would get
> numerous "BUG: soft lockup - CPU#0 stuck for 63s!" type errors in
> dmesg with different applications being reported and even a command
> like:

Does the problem go away if you boot with 'nmi_watchdog=0'?

Cheers,
Don







Bug#599368: [PATCH v2] watchdog: Improve initialisation error message and documentation

2011-01-03 Thread Don Zickus
On Sun, Jan 02, 2011 at 11:02:42PM +, Ben Hutchings wrote:
> The error message 'NMI watchdog failed to create perf event...'  does
> not make it clear that this is a fatal error for the watchdog.  It
> also currently prints the error value as a pointer, rather than
> extracting the error code with PTR_ERR().  Fix that.
> 
> Add a note to the description of the 'nowatchdog' kernel parameter to
> associate it with this message.
> 
> Reported-by: Cesare Leonardi 
> Signed-off-by: Ben Hutchings 

Looks good, thanks.  I'll add it to my queue.

Cheers,
Don

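The PTR_ERR() point in the quoted patch description can be illustrated with a userspace imitation of the kernel's error-pointer idiom. This is a sketch under stated assumptions: sketch_err_ptr(), sketch_ptr_err(), sketch_is_err() and format_event_error() are hypothetical names written for this example, the message text is invented, and the kernel's own macros live in its headers rather than here. It shows why printing the raw pointer is unhelpful while the extracted code is a readable negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Largest errno value assumed to be encodable in a pointer. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (imitates ERR_PTR()). */
static void *sketch_err_ptr(long err)
{
    assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer (imitates PTR_ERR()). */
static long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the top SKETCH_MAX_ERRNO addresses. */
static int sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical message formatter: prints the error code, not the pointer. */
static int format_event_error(char *buf, size_t len, const void *event)
{
    if (!sketch_is_err(event))
        return snprintf(buf, len, "perf event created");
    return snprintf(buf, len, "perf event error %ld", sketch_ptr_err(event));
}
```

Formatting the same value with %p would print an address near the top of the address space, which tells the reader nothing about which error occurred.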

