Continued instability with 5.3-STABLE

2005-03-09 Thread Tony Arcieri
I have a dual Opteron system which seems to stay up only about two weeks at
a time before it spontaneously reboots.  It's colocated so I can't ever
see panic messages, and I don't have another system colocated at the same
place I can use to gather debugging info.

I've never managed to get the system to generate a crash dump either.  It
has a 1GB swap partition and 2GB of physical RAM but through the last
few reboots I've been setting hw.physmem to 896M as the only custom parameter
in loader.conf.  The swap partition is labeled as follows:

twed0s1b  swap 1024MB SWAP

And dumpdev is set in rc.conf as follows:

dumpdev=/dev/twed0s1b

/var/crash/minfree is set to 2048
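
A note on the sizing above: a full crash dump on this branch needs a dump
device at least as large as the memory being dumped, which is presumably why
hw.physmem is capped below the size of the 1GB swap partition.  A minimal
sketch of that check, assuming sizes in megabytes; dump_fits is a
hypothetical helper for illustration, not a FreeBSD tool:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a full memory dump of PHYS_MB megabytes
# fits on a dump device of SWAP_MB megabytes.  One megabyte is reserved for
# the dump header (an assumption, chosen to be generous).
dump_fits() {
    phys_mb=$1
    swap_mb=$2
    [ $((phys_mb + 1)) -le "$swap_mb" ]
}

# Values from this message: hw.physmem capped at 896M, 1024MB swap partition.
if dump_fits 896 1024; then
    echo "dump fits"
else
    echo "dump device too small"
fi
```

With the full 2GB of RAM (dump_fits 2048 1024) the check fails.  On the
machine itself the inputs could be taken from `sysctl -n hw.physmem` (bytes)
and `swapinfo -k` (1K blocks).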

Lately I built a kernel from GENERIC using the latest RELENG_5 sources and
without SMP support and experienced a reboot after approximately 16 days uptime,
roughly equivalent to how long it took the system to crash with SMP enabled.
No core file was generated.

The kernel was built using source checked out from RELENG_5 on February 18th.
I'm not sure if any Opteron specific fixes have been applied to the branch
since then.

Are there any other means of gathering debugging data that would work in
my situation?  As it is, I'm still unsure whether my problems are hardware or
software related, as I've still never seen a panic message from the system
(hardware is a Tyan K8S motherboard in a Tyan Transport system).

Should I look into using KTR ALQ to log KTR data to the swap partition, and
if it fills up will it wrap over to the beginning?  I've never used that
feature before...

Tony Arcieri
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Continued instability with 5.3-STABLE

2005-03-09 Thread Doug White
On Wed, 9 Mar 2005, Tony Arcieri wrote:

> I have a dual Opteron system which seems to stay up only about two weeks at
> a time before it spontaneously reboots.  It's colocated so I can't ever
> see panic messages, and I don't have another system colocated at the same
> place I can use to gather debugging info.

You may want to consider finding a small system with a free serial port to
serve as a temporary serial console.  Without output from the crash it's
impossible to tell what went wrong.
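
As a rough sketch of what the serial console setup might involve (the port,
speed, and device name here are assumptions, not details from this thread;
check the Handbook for the release in use):

```shell
# /boot/loader.conf on the crashing machine.
# Assumption: the first serial port at the default speed is used.
# Sends boot loader and kernel console output to the serial line.
console="comconsole"

# On the small machine attached by null-modem cable, a terminal program
# such as cu(1) can then watch the line, for example:
#   cu -l /dev/cuaa0 -s 9600
# The device name is an assumption and differs between releases.
```

Leaving that session running (and logged) is what captures the panic
message when the next reboot happens.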

> I've never managed to get the system to generate a crash dump either.  It
> has a 1GB swap partition and 2GB of physical RAM but through the last
> few reboots I've been setting hw.physmem to 896M as the only custom parameter
> in loader.conf.  The swap partition is labeled as follows:
>
> twed0s1b  swap 1024MB SWAP
>
> And dumpdev is set in rc.conf as follows:
>
> dumpdev=/dev/twed0s1b
>
> /var/crash/minfree is set to 2048
>
> Lately I built a kernel from GENERIC using the latest RELENG_5 sources and
> without SMP support and experienced a reboot after approximately 16 days
> uptime, roughly equivalent to how long it took the system to crash with SMP
> enabled.  No core file was generated.
>
> The kernel was built using source checked out from RELENG_5 on February 18th.
> I'm not sure if any Opteron specific fixes have been applied to the branch
> since then.

Make sure you're actually running this kernel since crashdump support for
twe was added 2/12, in rev 1.22.2.1 of src/sys/dev/twe/twe.c.
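
One way this check might be done, sketched under the assumption that the
installed kernel binary still carries its RCS ident strings; twe_rev is a
hypothetical helper, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: read ident(1) output on stdin and print the revision
# recorded for twe.c, if one is present.
twe_rev() {
    sed -n 's/.*twe\.c,v \([0-9.]*\) .*/\1/p'
}

# On the affected machine (not run here), something like:
#   uname -v                              # build date of the running kernel
#   ident /boot/kernel/kernel | twe_rev   # compare against rev 1.22.2.1
```

If uname reports a build date before the checkout, or no revision is
printed, the machine is probably still booting an older kernel.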

> Are there any other means of gathering debugging data that would work in
> my situation?  As it is, I'm still unsure whether my problems are hardware or
> software related, as I've still never seen a panic message from the system
> (hardware is a Tyan K8S motherboard in a Tyan Transport system).

You really, really want a serial console.

> Should I look into using KTR ALQ to log KTR data to the swap partition, and
> if it fills up will it wrap over to the beginning?  I've never used that
> feature before...

If you have neither a serial console from which to drive ddb nor working
crash dumps, then there is no way to retrieve the KTR data.

-- 
Doug White         |  FreeBSD: The Power to Serve
[EMAIL PROTECTED]  |  www.FreeBSD.org