On Tue, 2007-09-11 at 12:36 -0700, Daniel Walker wrote:
> On Tue, 2007-09-11 at 13:16 -0600, David Bahi wrote:
...snip...
>
> I guess I'm confused what's happening .. It sounds like with
> nmi_watchdog=1 , the system hangs and the watchdog catches it. With
> nmi_watchdog=2 the system doesn't hang, and the watchdog doesn't catch
> anything (assuming it's working)?

The LTP sched_yield/1-1.test lockup only happens with the -rt patch
applied.

With nmi_watchdog=1 and -rt, the watchdog reports LOCKUP at *boot*, so
there is no chance to log in and no chance to run the LTP test. The
console logs attached at the beginning of this thread show this for the
latest -rt on both 22 and 23-rc.

With nmi_watchdog=2 and -rt, there is no LOCKUP at boot, but the
watchdog also does not appear to be working. I can log in and run the
LTP test, the system hardlocks again, and the watchdog never detects the
LOCKUP -> panic -> crashdump (which is all set up in the hope of
discovering why the LTP test is hanging :)

...snip...
> > For x86-64, the needed APIC is always compiled in, and the NMI
> > watchdog is always enabled with I/O-APIC mode (nmi_watchdog=1).
> > Currently, local APIC mode (nmi_watchdog=2) does not work on x86-64.
> >
> > Is this no longer true? My experience with nmi_watchdog=2 and this LTP
> > openposix sched_yield 1-1.test is that this test hardlocks the host and
> > that no watchdog is triggered with this setting.
>
> It looks like it should work .. I'd be surprised if it didn't work.. You
> can check if it's ticking in /proc/interrupts under NMI (it stops when
> the system is idle tho)

Well... I've got counts, but as I said earlier, the watchdog doesn't
catch the LOCKUP when the LTP test hardlocks the host.

luge:~ # cat /proc/interrupts |grep -Ee 'NMI|CPU'
           CPU0  CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7
NMI:       1559   692   608   544   783   273   213   155

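A single snapshot like the one above only shows that NMIs have fired at
some point. To see whether the count is still advancing, the NMI row can
be sampled twice and compared. What follows is a minimal sketch, not a
tested diagnostic: nmi_total and nmi_ticking are hypothetical helper
names, and the 5 second default interval is an arbitrary assumption.
As Daniel notes, the count stops while the system is idle, so a "not
ticking" result on an idle box does not prove the watchdog is broken.

```shell
#!/bin/sh
# Hypothetical helpers (not existing tools): sample the NMI row of a
# /proc/interrupts-style file and report whether the total advances.

# Print the sum of the per-CPU NMI counts found in file $1
# (defaults to /proc/interrupts). Non-numeric trailing fields,
# such as a description column on newer kernels, are skipped.
nmi_total() {
    awk '$1 == "NMI:" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^[0-9]+$/) s += $i
         }
         END { print s + 0 }' "${1:-/proc/interrupts}"
}

# Take two samples $2 seconds apart (default 5, an assumed value)
# and say whether the summed NMI count increased in between.
nmi_ticking() {
    before=$(nmi_total "$1")
    sleep "${2:-5}"
    after=$(nmi_total "$1")
    if [ "$after" -gt "$before" ]; then
        echo "NMI ticking: $before -> $after"
    else
        echo "NMI not ticking: stuck at $before"
    fi
}
```

Typical use would be to source the file and run
"nmi_ticking /proc/interrupts 5" while something is keeping the CPUs
busy, so that an idle system is not mistaken for a dead watchdog.
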
Ah, but fortunately a serial console BREAK followed by 'c' (SysRq-c)
does produce a crash dump....

db
