Hi Alan,

Sorry about the late reply. I had been meaning to write this but got delayed.

According to the system's handbook, the SunFire X2250 server uses
ECC memory. More precisely, it uses 'Registered ECC DDR2-800/PC2-6400 DIMM'.

In the case of ECC RAM, memtest86 will most likely not be able to detect any minor faults, since error-correcting memory typically fixes single-bit errors in hardware before the test ever sees them. You have to ask the hardware for its error-correction statistics and see whether anything is happening.

On RHEL5, this is typically done by installing edac-utils and displaying the counters. Example on a Nehalem workstation:

# lsmod |grep -i edac
i7core_edac            46921  0
edac_mc                61217  1 i7core_edac

# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: CPU#0Channel#0_DIMM#0|ch1: 0 Uncorrected Errors
mc0: csrow0: CPU#0Channel#0_DIMM#0: 0 Corrected Errors
mc0: csrow1: CPU#0Channel#0_DIMM#1|ch1: 0 Uncorrected Errors
mc0: csrow1: CPU#0Channel#0_DIMM#1: 0 Corrected Errors
mc0: csrow2: CPU#0Channel#1_DIMM#0|ch1: 0 Uncorrected Errors
mc0: csrow2: CPU#0Channel#1_DIMM#0: 0 Corrected Errors
mc0: csrow3: CPU#0Channel#1_DIMM#1|ch1: 0 Uncorrected Errors
mc0: csrow3: CPU#0Channel#1_DIMM#1: 0 Corrected Errors
mc0: csrow4: CPU#0Channel#2_DIMM#0|ch1: 0 Uncorrected Errors
mc0: csrow4: CPU#0Channel#2_DIMM#0: 0 Corrected Errors
mc0: csrow5: CPU#0Channel#2_DIMM#1|ch1: 0 Uncorrected Errors
mc0: csrow5: CPU#0Channel#2_DIMM#1: 0 Corrected Errors

In the case of defective memory, you will most likely notice that some of the counters aren't zero, which usually helps in identifying the culprit (if your hardware doesn't provide integrated diagnostics).
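
If you'd rather not eyeball the whole list, a rough sketch like the one below prints only the counters that aren't zero. It assumes the EDAC driver exposes its counters under /sys/devices/system/edac/mc (ce_count for corrected errors, ue_count for uncorrected ones), which may differ on your kernel. The function name nonzero_edac_counters is just something I made up for illustration; it is not part of edac-utils.

```shell
#!/bin/sh
# Print every EDAC error counter that is not zero.
# Assumption: counters live under /sys/devices/system/edac/mc, with
# ce_count = corrected errors and ue_count = uncorrected errors.
# nonzero_edac_counters is a hypothetical helper, not an edac-utils command.
nonzero_edac_counters() {
    root="${1:-/sys/devices/system/edac/mc}"
    # Per-controller totals first, then the per-csrow counters.
    for f in "$root"/mc*/[cu]e_count "$root"/mc*/csrow*/[cu]e_count; do
        [ -r "$f" ] || continue      # skip unmatched globs and unreadable files
        n=$(cat "$f")
        if [ "$n" -gt 0 ] 2>/dev/null; then
            echo "$f: $n"
        fi
    done
    return 0
}

nonzero_edac_counters "$@"
```

No output means every counter it could read was zero (or that no EDAC driver is loaded, so check lsmod first). An optional first argument overrides the directory to scan.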

My 2c,

Vincent

On Wed, 14 Mar 2012, Alan McKay wrote:

Well I did exactly what I'd done 3 months ago and found a faulty RAM chip this time.
My guess is that back then the chip was still functioning some of the time, and happened to be fine just when I was doing the tests.

This time I found it fairly easily with a systematic approach.

_______________________________________________
rhelv5-list mailing list
rhelv5-list@redhat.com
https://www.redhat.com/mailman/listinfo/rhelv5-list
