Rob Kampen wrote:
Hi List,
I've been getting the following EDAC memory errors:
EDAC MC0: CE page 0xeb0dd, offset 0x0, grain 4096, syndrome 0x45, row 3, channel 0, label "": i82875p CE
From this I can see that the errors are being corrected (CE).
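For anyone who wants to pull the fields out of a line like the one above, here is a rough Python sketch. The regex is written only against the message format quoted in this mail, so other kernel versions or drivers may log it differently. The function name `parse_edac_line` and the 4 KiB `PAGE_SIZE` are my own assumptions, not anything taken from the EDAC driver.

```python
import re

# Matches only the message layout quoted in this mail; treat it as an
# assumption that other kernels/drivers use the same layout.
EDAC_RE = re.compile(
    r'EDAC MC(?P<mc>\d+): (?P<kind>CE|UE) '
    r'page 0x(?P<page>[0-9a-fA-F]+), offset 0x(?P<offset>[0-9a-fA-F]+), '
    r'grain (?P<grain>\d+), syndrome 0x(?P<syndrome>[0-9a-fA-F]+), '
    r'row (?P<row>\d+), channel (?P<channel>\d+), '
    r'label "(?P<label>[^"]*)": (?P<driver>\S+)'
)

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as is usual on x86


def parse_edac_line(line):
    """Return the fields of an EDAC CE/UE line as a dict, or None."""
    m = EDAC_RE.search(line)
    if m is None:
        return None
    page = int(m.group('page'), 16)
    offset = int(m.group('offset'), 16)
    return {
        'mc': int(m.group('mc')),
        'kind': m.group('kind'),          # 'CE' corrected, 'UE' uncorrected
        'row': int(m.group('row')),       # csrow number
        'channel': int(m.group('channel')),
        'label': m.group('label'),        # empty when no DIMM label is set
        'driver': m.group('driver'),
        # Physical address of the reported location, under the page-size
        # assumption above.
        'phys_addr': page * PAGE_SIZE + offset,
    }


line = ('EDAC MC0: CE page 0xeb0dd, offset 0x0, grain 4096, syndrome 0x45, '
        'row 3, channel 0, label "": i82875p CE')
info = parse_edac_line(line)
print(info['kind'], 'csrow', info['row'], 'channel', info['channel'],
      hex(info['phys_addr']))
```

The empty `label` field is the reason the log line alone does not say which physical DIMM slot is involved.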
Running cat /sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count gives
me a count of 4,
so I now know that csrow3, ch0 is the problem.
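Rather than cat-ing each counter by hand, something like the following could list every per-channel corrected-error count in one go. It is only a sketch: it assumes the sysfs layout is the same as the path shown above (mcN/csrowN/chN_ce_count), and `read_ce_counts` is a name I made up for illustration.

```python
import glob
import os
import re


def read_ce_counts(root='/sys/devices/system/edac/mc'):
    """Map (mc, csrow, channel) -> corrected-error count.

    Assumes the mcN/csrowN/chN_ce_count layout seen in the path above.
    """
    counts = {}
    pattern = os.path.join(root, 'mc*', 'csrow*', 'ch*_ce_count')
    for path in sorted(glob.glob(pattern)):
        m = re.search(r'mc(\d+)/csrow(\d+)/ch(\d+)_ce_count$', path)
        if m is None:
            continue
        with open(path) as f:
            key = tuple(int(g) for g in m.groups())
            counts[key] = int(f.read().strip())
    return counts


if __name__ == '__main__':
    for (mc, csrow, ch), n in sorted(read_ce_counts().items()):
        flag = '  <-- errors' if n else ''
        print(f'mc{mc} csrow{csrow} ch{ch}: {n}{flag}')
```

Running it again after each DIMM swap would show whether the non-zero counter follows the module or stays on the same csrow and channel.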
My question is: how does this map to the on-board labels?
DIMM 1A
DIMM 1B
DIMM 2A
DIMM 2B
Am I correct in assuming csrow 3 is DIMM 2B?
I swapped the memory between DIMM 2A and DIMM 2B and still get the fault
in row 3, channel 0, so it did not move with the RAM??
On the next reboot I'll try swapping 1A and 1B.
Also, I have just discovered that both OS drives, sda and sdb, show a
huge number of errors in their SMART records.
- Can this be related to the memory errors??
- I am really surprised to see two drives show an almost identical
number of errors at the same time, yet no apparent data errors.
The drives are ATA ST380013AS, 74.53 GB.
Just for safety I swapped /dev/sda for a new, slightly larger drive, did
the sfdisk foo and added it to the md RAID arrays.
This brand new drive immediately shows a high raw read error rate and a
hardware ECC recovered count in the tens of millions. I think this is not a
drive issue but is related to the ECC memory errors??
Anyone with experience?
TIA for your insightful comments
------------------------------------------------------------------------
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos