Hi Alan,
Sorry about the late reply. I've been meaning to write this, but I got quite
delayed.
According to the System Handbook, the Sun Fire X2250 server uses ECC
memory, more precisely 'Registered ECC DDR2-800/PC2-6400 DIMM'.
With ECC RAM, memtest86 will most likely not detect minor faults such as
single-bit errors, because error-correcting memory corrects them
transparently before the test ever sees them. You have to ask the hardware
to report its error-correction statistics and check whether anything is
happening.
On RHEL5, this is typically done by installing edac-utils and displaying
the counters. Example on a Nehalem workstation:
# lsmod |grep -i edac
i7core_edac 46921 0
edac_mc 61217 1 i7core_edac
# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: CPU#0Channel#0_DIMM#0|ch1: 0 Uncorrected Errors
mc0: csrow0: CPU#0Channel#0_DIMM#0: 0 Corrected Errors
mc0: csrow1: CPU#0Channel#0_DIMM#1|ch1: 0 Uncorrected Errors
mc0: csrow1: CPU#0Channel#0_DIMM#1: 0 Corrected Errors
mc0: csrow2: CPU#0Channel#1_DIMM#0|ch1: 0 Uncorrected Errors
mc0: csrow2: CPU#0Channel#1_DIMM#0: 0 Corrected Errors
mc0: csrow3: CPU#0Channel#1_DIMM#1|ch1: 0 Uncorrected Errors
mc0: csrow3: CPU#0Channel#1_DIMM#1: 0 Corrected Errors
mc0: csrow4: CPU#0Channel#2_DIMM#0|ch1: 0 Uncorrected Errors
mc0: csrow4: CPU#0Channel#2_DIMM#0: 0 Corrected Errors
mc0: csrow5: CPU#0Channel#2_DIMM#1|ch1: 0 Uncorrected Errors
mc0: csrow5: CPU#0Channel#2_DIMM#1: 0 Corrected Errors
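If edac-utils isn't installed, the same per-csrow counters can usually be
read straight from sysfs once the EDAC driver is loaded. The sketch below
assumes the standard kernel EDAC layout under /sys/devices/system/edac/mc
(mcN/csrowN/ce_count and ue_count); the function name edac_counts and the
optional root-directory argument are my own, added only so it can be
pointed at a copy of the tree.

```shell
# Hypothetical helper: print corrected (ce) and uncorrected (ue) error
# counts for every memory controller / csrow found in the EDAC sysfs tree.
# Assumes the usual kernel layout; pass another directory as $1 to override.
edac_counts() {
    root="${1:-/sys/devices/system/edac/mc}"
    for mc in "$root"/mc[0-9]*; do
        [ -d "$mc" ] || continue              # glob matched nothing
        for row in "$mc"/csrow[0-9]*; do
            [ -d "$row" ] || continue
            # Print "?" if a counter file is missing or unreadable.
            ce=$(cat "$row/ce_count" 2>/dev/null || echo "?")
            ue=$(cat "$row/ue_count" 2>/dev/null || echo "?")
            echo "$(basename "$mc") $(basename "$row") ce=$ce ue=$ue"
        done
    done
}
```

Running edac_counts with no argument on a box with the driver loaded
should give one line per csrow, matching what edac-util -v reports.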
With defective memory, you will most likely notice that some of the
counters aren't zero, which usually helps identify the culprit DIMM (if
your hardware doesn't provide integrated diagnostics).
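On a box with many DIMMs it's easy to miss one non-zero line in that
output. Here is a small filter sketch that keeps only the lines whose
Corrected/Uncorrected counter is above zero. It assumes the
"N Corrected Errors" / "N Uncorrected Errors" wording shown in the example
above; the function name edac_nonzero is hypothetical.

```shell
# Hypothetical filter: read `edac-util -v` output on stdin and print only
# lines with a non-zero Corrected/Uncorrected count. Exit status is 1 if
# any such line was found, 0 if every counter is zero.
edac_nonzero() {
    awk '
        {
            # The count is the field just before "Corrected"/"Uncorrected".
            for (i = 1; i < NF; i++) {
                if ($(i + 1) ~ /^(Corrected|Uncorrected)$/ && $i + 0 > 0) {
                    print
                    found = 1
                    next
                }
            }
        }
        END { exit (found ? 1 : 0) }
    '
}
```

Usage would be something like `edac-util -v | edac_nonzero`, which prints
the offending csrow/DIMM lines and returns non-zero, so it can also be
dropped into a cron job.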
My 2c,
Vincent
On Wed, 14 Mar 2012, Alan McKay wrote:
Well, I did exactly what I'd done 3 months ago, and this time I found a
faulty RAM chip.
My guess is that back then the chip was still working some of the time and
happened to be fine just when I was running the tests.
This time I found it fairly easily with a systematic approach.
_______________________________________________
rhelv5-list mailing list
rhelv5-list@redhat.com
https://www.redhat.com/mailman/listinfo/rhelv5-list