Watchdog not being disabled while dumping core

2010-08-23 Thread Jeremy Chadwick
It was brought to my attention that on FreeBSD with a hardware watchdog
in use (e.g. ichwd(4) + watchdogd(8)), once the kernel panics, it's
quite possible for the watchdog to fire (reboot the system) once the
panic has happened.  This issue basically inhibits the ability for a
system with a hardware watchdog in place to be able to successfully
complete doadump().

There are confirmations of this problem dating all the way back to 2005:

PR kern/82219, opened in 2005:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/82219

PR bin/145183, opened in 2010 (not sure if this is the same):
http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/145183

Confirmation that the problem still exists today (first paragraph):
http://lists.freebsd.org/pipermail/freebsd-stable/2010-August/058350.html

On Linux, it appears that they've worked around this problem by using
what's called a pretimeout (basically a trigger that goes off shortly
before the watchdog resets the system, so the kernel can finish
important tasks such as recording panic information first):
http://www.mjmwired.net/kernel/Documentation/watchdog/watchdog-api.txt

According to watchdog(4), it looks like the kernel setting WD_PASSIVE
immediately upon entering panic would solve the problem, but the BUGS
section indicates WD_PASSIVE hasn't been implemented (returns ENOSYS).

Thoughts on solving this dilemma?

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Watchdog not being disabled while dumping core

2010-08-23 Thread Poul-Henning Kamp
In message 20100823103412.ga21...@icarus.home.lan, Jeremy Chadwick writes:

> It was brought to my attention that on FreeBSD with a hardware watchdog
> in use (e.g. ichwd(4) + watchdogd(8)), once the kernel panics, it's
> quite possible for the watchdog to fire (reboot the system) once the
> panic has happened.  This issue basically inhibits the ability for a
> system with a hardware watchdog in place to be able to successfully
> complete doadump().

The good news is that the watchdog hopefully gets your system back
on the air, even if the dumping hangs.

If it is decided to reset/disarm the watchdog before a dump, please
make that a sysctl tunable.



-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Watchdog not being disabled while dumping core

2010-08-23 Thread Xin LI
On Mon, Aug 23, 2010 at 3:34 AM, Jeremy Chadwick
free...@jdc.parodius.com wrote:
> PR bin/145183, opened in 2010 (not sure if this is the same):
> http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/145183

Speaking for this I think we can do it by issuing an explicit
watchdog(8) command on shutdown (like, set the timeout to several
minutes) in /etc/rc.d/watchdog's shutdown section.  This would be
trivial to implement.  Additionally, I personally think init(8)
should be taught about the watchdog facility.

For panics, I think we should have the disk driver pat the watchdog in
its write-success callback rather than disabling the watchdog.  Another
thing is that ddb should be able to disable the watchdog while it's
waiting for keyboard input (or once it has received the first user
input), I think.

Cheers,
-- 
Xin LI delp...@delphij.net http://www.delphij.net


Re: Watchdog not being disabled while dumping core

2010-08-23 Thread Andriy Gapon
on 23/08/2010 13:53 Poul-Henning Kamp said the following:
> In message 20100823103412.ga21...@icarus.home.lan, Jeremy Chadwick writes:
>
>> It was brought to my attention that on FreeBSD with a hardware watchdog
>> in use (e.g. ichwd(4) + watchdogd(8)), once the kernel panics, it's
>> quite possible for the watchdog to fire (reboot the system) once the
>> panic has happened.  This issue basically inhibits the ability for a
>> system with a hardware watchdog in place to be able to successfully
>> complete doadump().
>
> The good news is that the watchdog hopefully gets your system back
> on the air, even if the dumping hangs.
>
> If it is decided to reset/disarm the watchdog before a dump, please
> make that a sysctl tunable.

I'd rather add code to take over the watchdog from watchdogd and to pat the
dog while dumping, and perhaps in some other crucial places (like right
before calling reset).  This way we could ensure that the system doesn't
hang while dumping, in the reset routine, etc.

Another workaround is to set watchdog timeout large enough for dumping to
complete, but that increases time that system is unavailable during a 'hard'
hang (e.g. caused by hardware).

-- 
Andriy Gapon


Re: Watchdog not being disabled while dumping core

2010-08-23 Thread Poul-Henning Kamp
In message 4c725dfc.8000...@icyb.net.ua, Andriy Gapon writes:
> on 23/08/2010 13:53 Poul-Henning Kamp said the following:
>> In message 20100823103412.ga21...@icarus.home.lan, Jeremy Chadwick writes:
>
> Another workaround is to set watchdog timeout large enough for dumping to
> complete, but that increases time that system is unavailable during a 'hard'
> hang (e.g. caused by hardware).

You cannot trust the hardware to support such long timeouts.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Watchdog not being disabled while dumping core

2010-08-23 Thread Erik Trulsson
On Mon, Aug 23, 2010 at 04:07:47AM -0700, Xin LI wrote:
> On Mon, Aug 23, 2010 at 3:34 AM, Jeremy Chadwick
> free...@jdc.parodius.com wrote:
>> PR bin/145183, opened in 2010 (not sure if this is the same):
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/145183
>
> Speaking for this I think we can do it by issuing an explicit
> watchdog(8) command on shutdown (like, set the timeout to several
> minutes) in /etc/rc.d/watchdog's shutdown section.  This would be
> trivial to implement.

No, it would not be trivial to implement (at least not if you want it
to actually work correctly).  The reason is that at least some
(perhaps even most) hardware watchdog devices do not support such long
timeouts.

I know, for example, that the watchdog in the ixp425 CPU has a maximum
timeout of 65 seconds.  Reading the manpage for ichwd(4), it seems that
it has a maximum timeout of about 37 seconds.

I suspect that other hardware watchdogs have similar limits, which
leads to the conclusion that one should not assume watchdog timeouts
longer than maybe 30 seconds to be supported.

-- 
Insert your favourite quote here.
Erik Trulsson
ertr1...@student.uu.se