Re: [CentOS] how to debug hardware lockups?

Les Mikesell Tue, 18 Nov 2008 05:03:26 -0800

Rudi Ahlers wrote:

On Sun, Nov 16, 2008 at 1:14 AM, John R Pierce <[EMAIL PROTECTED]> wrote:

Rudi Ahlers wrote:

Well, on a standard CentOS 5.2, /var/log/messages will be the
place to log problems like this, or where else can I get more info?

It's tough to write to the disk when the kernel is crashing. Ditto the
network. That leaves the VGA console and serial ports, which can be written
to by self-contained emergency crash routines...

IIRC, you said this was a Q9something quad core... that's a desktop
processor... does this server have ECC memory? (I ask because few desktop
platforms do, while ECC is fairly standard on servers.) Without ECC, the
system has no way of knowing it read in bad data from the RAM, and if the
bad data happens to be code and that code happens to be in the kernel,
ka-RASH, without any detection or warning, it leaps off into never-land, and
you get a kernel fault, almost always resulting in...

  kernel panic
  system halted

with no additional useful information available. With ECC memory, single-bit
errors get corrected on the fly and log an ECC error event, while
double-bit errors result in a system halt with a message indicating such.
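
To make the single-bit/double-bit distinction above concrete, here is a
minimal sketch of a SECDED (single error correct, double error detect) code
in Python. It is only an illustration: it protects a 4-bit value with a toy
Hamming(8,4) code, whereas real ECC DIMMs typically protect 64 data bits
with 8 check bits in hardware. The function names encode and decode are
made up for this example.

```python
def encode(nibble):
    """Encode a 4-bit value into an 8-bit SECDED codeword (list of bits).

    Index 0 holds the overall parity bit; indexes 1..7 are the classic
    Hamming(7,4) positions (1, 2 and 4 are check bits, the rest are data).
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes total parity even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    # XOR of the positions of all set bits is 0 for a valid codeword,
    # and equals the position of the flipped bit after a single error.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2                 # 0 when total parity is intact

    word = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and repair it.
        # syndrome == 0 here means the overall parity bit itself flipped.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Total parity looks fine but the syndrome is nonzero: two bits
        # flipped. The error is detected but cannot be repaired.
        return None, "uncorrectable"

    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return nibble, status


if __name__ == "__main__":
    good = encode(0b1011)

    one_flip = list(good)
    one_flip[5] ^= 1
    print(decode(one_flip))

    two_flips = list(good)
    two_flips[2] ^= 1
    two_flips[6] ^= 1
    print(decode(two_flips))
```

A non-ECC system has no check bits at all, so in the terms of this sketch
it would simply use whatever data bits came back, flipped or not.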



No, the motherboard doesn't support ECC RAM. The motherboard is an
Intel DG35EC - 
http://www.intel.com/products/desktop/motherboards/DG35EC/DG35EC-overview.htm

I had a machine that would crash about once every week or two in normal
operation. Memtest86+ found an error on the 2nd day of running. The
worst part was that it left the raid mirrors in a strange state that
caused occasional problems for months even after replacing the RAM.


--
  Les Mikesell
    [EMAIL PROTECTED]

