Re: How to troubleshoot solid freeze-up?

2005-10-03 Thread Robert Watson


On Sun, 2 Oct 2005, David S. Madole wrote:

> I'm looking for some tips on how to troubleshoot a possible driver
> problem. Here is the scenario:
>
> 1. Using a Pentium II 333 MHz mobile processor, 82443BX motherboard, and
> Intel i82559 NIC (fxp driver).
>
> 2. A combination of heavy disk I/O, high CPU utilization, and high
> network traffic causes a solid machine freeze-up sometime between 10
> minutes and 3 hours of running.
>
> 3. Replacing the NIC with a DP83815-based card (sis driver) seems to
> solve the problem. I have run the problem load for up to 8 hours without
> issue on this NIC.
>
> 4. The problem is reproducible on multiple identical machines with
> multiple identical NICs. It is also reproducible on an i82558 NIC
> integrated on the motherboard.
>
> How can I go about collecting useful information to troubleshoot this
> when the machine locks solid? How can I get a core under this scenario?
>
> Switching to another NIC permanently is not a great solution because
> this is a semi-embedded application and I need to use the NIC on the
> motherboard.


The normal method is to use a break signal to get into the debugger.
Depending on your hardware and software configuration, this may be easier
or harder.  First, you'll need to add options BREAK_TO_DEBUGGER to your
kernel configuration.  You can then break into the debugger in one of
three ways:


(1) Ctrl-Alt-Esc on a syscons console.  Note that because the syscons
console is under the Giant lock, this mechanism is a less reliable way
to get into the debugger on FreeBSD 5.x.  It is quite a bit better
in 6.x, and will continue to improve as the use of Giant is
reduced.  If this doesn't work for you, try (2).

(2) Serial break on the first serial console port.  Because the sio driver
uses a fast interrupt handler, this is quite a reliable way to get
into the debugger unless interrupts are disabled, in which case the
serial port can't interrupt the CPU to drop into the debugger.  If
this doesn't work for you, try (3).

(3) Break to debugger using an NMI.  Some hardware, especially evaluation
hardware, comes with an NMI button, frob, or other way to initiate a
drop to the debugger despite interrupts being disabled.  Hardware
watchdogs are often also able to generate an NMI.

I find that, except in pretty exceptional circumstances, (2) works quite
well.  You can find a section on kernel debugging in the FreeBSD Handbook;
my general advice is to compile in options KDB, DDB, BREAK_TO_DEBUGGER,
WITNESS, and INVARIANTS, and see where that gets you using a serial
console.  DDB is pretty easy to use for basic debugging, i.e., checking
thread state, checking lock state, generating stack traces, etc.
Depending on the bug, you might also want to use kgdb, either over a
serial line or on a core dump.
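As a rough sketch of that advice, the following lines could be added to a custom kernel configuration file. One example would be a copy of GENERIC saved under a hypothetical name such as DEBUGKERN.

Two entries go beyond the list above and should be treated as assumptions:

* options GDB selects the KDB framework's remote-gdb backend (5.3 and later), which serial kgdb needs.
* makeoptions DEBUG=-g builds the kernel with debug symbols so kgdb can make sense of it.

Please check every option against NOTES for your release before building.

```
# Hypothetical debugging additions to a custom kernel config (e.g. DEBUGKERN).
# Verify each line against sys/conf/NOTES and sys/<arch>/conf/NOTES first.
makeoptions     DEBUG=-g            # debug symbols for kgdb (assumption, not in the list above)
options         KDB                 # kernel debugger framework
options         DDB                 # interactive in-kernel debugger
options         GDB                 # remote kgdb backend (assumption, not in the list above)
options         BREAK_TO_DEBUGGER   # a BREAK on the serial console enters the debugger
options         WITNESS             # lock order checking
options         INVARIANTS          # extra run-time consistency checks
options         INVARIANT_SUPPORT   # required when INVARIANTS is enabled
```

The kernel would then be built and installed from /usr/src with make buildkernel KERNCONF=DEBUGKERN and make installkernel KERNCONF=DEBUGKERN.

Once at the db> prompt, these are typical starting points:

* ps shows process and thread state.
* trace gives a stack trace of the current thread.
* show locks lists held locks, with WITNESS compiled in.

ddb(4) for your release lists what is actually available.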


Robert N M Watson
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: How to troubleshoot solid freeze-up?

2005-10-02 Thread Kris Kennaway
On Sun, Oct 02, 2005 at 03:59:50PM -0400, David S. Madole wrote:
> I'm looking for some tips on how to troubleshoot a possible driver 
> problem. Here is the scenario:
> 
> 1. Using a Pentium II 333 MHz mobile processor, 82443BX motherboard, and
> Intel i82559 NIC (fxp driver).
>
> 2. A combination of heavy disk I/O, high CPU utilization, and high
> network traffic causes a solid machine freeze-up sometime between 10
> minutes and 3 hours of running.
>
> 3. Replacing the NIC with a DP83815-based card (sis driver) seems to
> solve the problem. I have run the problem load for up to 8 hours without
> issue on this NIC.
>
> 4. The problem is reproducible on multiple identical machines with
> multiple identical NICs. It is also reproducible on an i82558 NIC
> integrated on the motherboard.
>
> How can I go about collecting useful information to troubleshoot this
> when the machine locks solid? How can I get a core under this scenario?
> 
> Switching to another NIC permanently is not a great solution because this 
> is a semi-embedded application and I need to use the NIC on the 
> motherboard.

You can't break to DDB in the usual way (Ctrl+Alt+Esc), right?  Try
turning on KDB_STOP_NMI instead.  Also try turning on WITNESS in case
you're seeing a lock order reversal.  Actually, you forgot to mention
which version of FreeBSD you're running; those suggestions only apply
to 5.x and above.
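A minimal kernel configuration sketch of those two suggestions might look like the following. It assumes KDB and DDB are also being compiled in.

KDB_STOP_NMI is architecture-specific, and its exact semantics are not spelled out in this thread. Before relying on it, confirm that it appears in sys/<arch>/conf/NOTES for the release in use.

```
# Hypothetical fragment for the suggestions above (FreeBSD 5.x and later only).
options         KDB                 # kernel debugger framework
options         DDB                 # interactive in-kernel debugger
options         KDB_STOP_NMI        # as suggested; check NOTES for its exact behavior
options         WITNESS             # reports lock order reversals on the console
```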

Kris




How to troubleshoot solid freeze-up?

2005-10-02 Thread David S. Madole
I'm looking for some tips on how to troubleshoot a possible driver
problem. Here is the scenario:

1. Using a Pentium II 333 MHz mobile processor, 82443BX motherboard, and
Intel i82559 NIC (fxp driver).

2. A combination of heavy disk I/O, high CPU utilization, and high
network traffic causes a solid machine freeze-up sometime between 10
minutes and 3 hours of running.

3. Replacing the NIC with a DP83815-based card (sis driver) seems to
solve the problem. I have run the problem load for up to 8 hours without
issue on this NIC.

4. The problem is reproducible on multiple identical machines with
multiple identical NICs. It is also reproducible on an i82558 NIC
integrated on the motherboard.

How can I go about collecting useful information to troubleshoot this
when the machine locks solid? How can I get a core under this scenario?

Switching to another NIC permanently is not a great solution because this
is a semi-embedded application and I need to use the NIC on the
motherboard.


Thanks,
David Madole
