A few weeks ago, the front LED panel on my PE2900-III turned amber. When 
I logged into it via DRAC5, the log showed several entries saying the 
DIMM4 had a lot of errors that exceeded some threshold. I assume this 
meant DIMM4 might be going bad but ECC was handling the errors so far. I 
moved DIMM4 to DIMM1 just to confirm the problem followed the FB-DIMM 
and not the slot or motherboard. Sure enough, after a few days the blue 
light turned amber and this time DIMM1 was reported as having too many 
errors. As a quick remedy, I had on hand another FB-DIMM of the same 
size (8GB), but of a different brand and rank (the original was an 
Elpida 4Rx4 8GB FB-DIMM, the spare is a Crucial 2Rx4 8GB FB-DIMM), so I 
installed the Crucial FB-DIMM in DIMM1. The server booted up fine, and I 
haven't seen an amber light for almost two weeks.

This week, I'm setting up monitoring on this server, and noticed that 
OMSA is still reporting DIMM1 in CRITICAL state:

# omreport chassis memory
Memory Information

Health : Critical

Memory Redundancy

Fail Over State          : Inactive
Redundancy Configuration : Disabled

Attributes of Memory Array(s)

Attributes of Memory Array(s)
Location           : System Board or Motherboard
Use                : System Memory
Installed Capacity : 65536  MB
Maximum Capacity   : 65280  MB
Slots Available    : 12
Slots Used         : 8
Error Correction   : Multibit ECC

Total of Memory Array(s)
Total Installed Capacity                     : 65536  MB
Total Installed Capacity Available to the OS : 63255  MB
Total Maximum Capacity                       : 65280  MB

Details of Memory Array 1
Index          : 0
Status         : Critical
Connector Name : DIMM1
Type           : DDR2 FB-DIMM - Synchronous
Size           : 8192  MB

Is there a state I have to reset? Or is it reporting this because the 
replacement FB-DIMM is of a different rank and/or brand (same speed)? 
The server has been running fine, with no more amber lights, but this 
is concerning and will certainly set off alarms once the monitoring 
goes into effect. I'm just wondering whether there is something I need 
to do to "clear" the "CRITICAL" state, or whether I simply need to get 
an FB-DIMM of the same 4Rx4 rank and brand.
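
For the monitoring side, this is roughly the kind of check I have in 
mind. It is only a sketch: it assumes the "Key : Value" layout of the 
`omreport chassis memory` output shown above, that each DIMM entry 
starts with an "Index" line, and that a healthy module reports "Ok". 
The function names are made up for illustration and are not part of 
OMSA.

```python
import subprocess


def parse_memory_report(text):
    """Split omreport-style 'Key : Value' output into one dict per DIMM.

    Assumption: every DIMM entry begins with an 'Index' line, as in the
    output pasted above. Lines before the first 'Index' are ignored.
    """
    dimms = []
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Index":
            current = {}
            dimms.append(current)
        if current is not None:
            current[key] = value
    return [d for d in dimms if "Connector Name" in d]


def failing_dimms(text):
    """Return (connector, status) pairs for every DIMM whose status is not 'Ok'."""
    return [
        (d["Connector Name"], d.get("Status", "Unknown"))
        for d in parse_memory_report(text)
        if d.get("Status") != "Ok"
    ]


def check_memory():
    """Run omreport and return a Nagios-style exit code (0 = OK, 2 = CRITICAL)."""
    result = subprocess.run(
        ["omreport", "chassis", "memory"],
        capture_output=True, text=True, check=True,
    )
    bad = failing_dimms(result.stdout)
    for connector, status in bad:
        print(f"{connector}: {status}")
    return 2 if bad else 0
```

With the output above, `failing_dimms()` would report DIMM1 as 
Critical, which is exactly the alarm I would like to avoid if the 
state is stale.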

Thanks for any suggestions,
Bond

_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge@dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
