Hey,

I've been getting tons of EDAC error messages from the kernel lately on
an i5000 server with 16GB RAM (4 x 4GB modules). The server has been
running for about three years.

The system is Debian Lenny with a 2.6.26 kernel, self-compiled from
linux-source-2.6.26 2.6.26-26lenny3.

It's not the first time that these EDAC error messages have appeared.
Actually, in the last three years, I got these errors every now and
then. Sometimes only a few errors were logged, sometimes my logs were
spammed with the errors for several days, but then it stopped again.

Now the messages have been spamming my log and console for more than
three weeks already. On some days I get more than 36000 errors.

It's notable that every DRAM-Bank from 0 to 7 is affected.

Now I wonder whether these are false positives (searching the web for
the errors revealed that these are quite common) or whether my RAM
might be damaged.

Unfortunately, running memtest86+ is not an option, as the server in
question is a production server, and I don't have a second server for
redundancy.

Additionally, a slightly related question: How do I turn off the logging
of these messages to console? It's impossible to work in an SSH session
when the console is spammed with these logs. Neither setting
kernel.printk, nor 'setterm -msg 0', 'dmesg -n1' or 'echo 1 >
/proc/sysrq-trigger' stops the logging flood to console. Did I miss
anything, or is it simply impossible to stop console logging for this
kind of kernel error message? That would be very unfortunate.
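
For context on the settings I tried: kernel.printk holds four integers,
and the kernel prints a message on the console only if its level is
numerically lower than the first one (console_loglevel). Below is a
minimal sketch of that rule. The function names are made up for
illustration, and whether the EDAC messages are really emitted at
level 0 is an assumption I have not verified.

```python
# Names of the four values in /proc/sys/kernel/printk, in file order.
PRINTK_FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk(text):
    """Map the whitespace-separated contents of kernel.printk to names."""
    values = [int(v) for v in text.split()]
    return dict(zip(PRINTK_FIELDS, values))


def reaches_console(message_level, console_loglevel):
    """A message is shown on the console if its level is below the limit."""
    return message_level < console_loglevel


# Hypothetical file contents after 'dmesg -n1'.
settings = parse_printk("1\t4\t1\t7")

# Level 3 (KERN_ERR) is filtered, level 0 (KERN_EMERG) still gets through.
err_shown = reaches_console(3, settings["console_loglevel"])
emerg_shown = reaches_console(0, settings["console_loglevel"])
```

If the assumption holds, that would explain why lowering the log level
alone does not silence the console.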

I already considered recompiling the kernel without the EDAC i5000
driver in order to stop this annoyance, but I would prefer to fix the
cause instead of fighting the symptoms.

Here's an example error message:

Aug 16 13:08:20 nibbler kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
Aug 16 13:08:20 nibbler kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Aug 16 13:08:20 nibbler kernel: EDAC MC0: UE row 1, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=6 RDWR=Read RAS=14214 CAS=0 FATAL Err=0x4)
Aug 16 13:08:22 nibbler kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
Aug 16 13:08:22 nibbler kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Aug 16 13:08:22 nibbler kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=20 CAS=0 FATAL Err=0x4)
Aug 16 13:08:24 nibbler kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
Aug 16 13:08:24 nibbler kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Aug 16 13:08:24 nibbler kernel: EDAC MC0: UE row 1, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=1 RDWR=Read RAS=3268 CAS=0 FATAL Err=0x4)
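
To put numbers on the per-day and per-bank distribution, a small script
along these lines could tally the "UE" entries. It is only a sketch: it
assumes every syslog entry occupies a single line in the log file, and
the name tally_edac_errors is made up for illustration.

```python
import re
from collections import Counter

# Matches "EDAC MC<n>: UE ..." entries and captures the date and DRAM bank.
UE_PATTERN = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel: EDAC MC\d+: UE "
    r".*?DRAM-Bank=(?P<bank>\d+)"
)


def tally_edac_errors(lines):
    """Return two Counters: UE entries per day and per DRAM bank."""
    per_day = Counter()
    per_bank = Counter()
    for line in lines:
        match = UE_PATTERN.search(line)
        if match is None:
            continue  # some other message, e.g. "FATAL ERRORS Found!!!"
        # Collapse the padding syslog uses for single-digit days ("Aug  6").
        day = " ".join(match.group("day").split())
        per_day[day] += 1
        per_bank[int(match.group("bank"))] += 1
    return per_day, per_bank


# Two entries in the format shown above; only the second is a UE entry.
sample = [
    "Aug 16 13:08:20 nibbler kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! "
    "1st FATAL Err Reg= 0x4",
    'Aug 16 13:08:20 nibbler kernel: EDAC MC0: UE row 1, channel-a= 0 '
    'channel-b= 1 labels "-": (Branch=0 DRAM-Bank=6 RDWR=Read RAS=14214 '
    "CAS=0 FATAL Err=0x4)",
]
per_day, per_bank = tally_edac_errors(sample)
```

Feeding it an open log file instead of the sample list (for example
/var/log/kern.log, if that is where the kernel messages end up) would
give the totals for the whole period.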

This is what the EDAC module logged at my last reboot:

Jun 27 00:10:29 nibbler kernel: EDAC MC: Ver: 2.1.0 Jun 23 2011
Jun 27 00:10:29 nibbler kernel: EDAC MC0: Giving out device to 'i5000_edac.c' 'I5000': DEV 0000:00:10.0
Jun 27 00:10:29 nibbler kernel: EDAC PCI0: Giving out device to module 'i5000_edac' controller 'EDAC PCI controller': DEV '0000:00:10.0' (POLLED)

And last but not least the output of 'dmidecode -t memory':

# dmidecode 2.9
SMBIOS 2.5 present.

Handle 0x0038, DMI type 16, 15 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: Multi-bit ECC
        Maximum Capacity: 32 GB
        Error Information Handle: Not Provided
        Number Of Devices: 8

Handle 0x003A, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x0038
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 4096 MB
        Form Factor: FB-DIMM
        Set: 1
        Locator: ONBOARD DIMM_A1
        Bank Locator: Channel A
        Type: DDR2 FB-DIMM
        Type Detail: Synchronous
        Speed: 667 MHz (1.5 ns)
        Manufacturer: 8551
        Serial Number: 02028121
        Asset Tag: Not Specified
        Part Number: 72T512920EFA3SC

Handle 0x003C, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x0038
        Error Information Handle: Not Provided
        Total Width: Unknown
        Data Width: Unknown
        Size: No Module Installed
        Form Factor: FB-DIMM
        Set: 2
        Locator: ONBOARD DIMM_A2
        Bank Locator: Channel A
        Type: DDR2 FB-DIMM
        Type Detail: Synchronous
        Speed: Unknown
        Manufacturer: MemUndefined
        Serial Number: MemUndefined
        Asset Tag: Not Specified
        Part Number: MemUndefined

Handle 0x003D, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x0038
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 4096 MB
        Form Factor: FB-DIMM
        Set: 1
        Locator: ONBOARD DIMM_B1
        Bank Locator: Channel B
        Type: DDR2 FB-DIMM
        Type Detail: Synchronous
        Speed: 667 MHz (1.5 ns)
        Manufacturer: 8551
        Serial Number: 02027215
        Asset Tag: Not Specified
        Part Number: 72T512920EFA3SC

Handle 0x003F, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x0038
        Error Information Handle: Not Provided
        Total Width: Unknown
        Data Width: Unknown
        Size: No Module Installed
        Form Factor: FB-DIMM
        Set: 2
        Locator: ONBOARD DIMM_B2
        Bank Locator: Channel B
        Type: DDR2 FB-DIMM
        Type Detail: Synchronous
        Speed: Unknown
        Manufacturer: MemUndefined
        Serial Number: MemUndefined
        Asset Tag: Not Specified
        Part Number: MemUndefined

Handle 0x0040, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x0038
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 4096 MB
        Form Factor: FB-DIMM
        Set: 5
        Locator: ONBOARD DIMM_C1
        Bank Locator: Channel C
        Type: DDR2 FB-DIMM
        Type Detail: Synchronous
        Speed: 667 MHz (1.5 ns)
        Manufacturer: 8551
        Serial Number: 02027112
        Asset Tag: Not Specified
        Part Number: 72T512920EFA3SC

Handle 0x0042, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x0038
        Error Information Handle: Not Provided
        Total Width: Unknown
        Data Width: Unknown
        Size: No Module Installed
        Form Factor: FB-DIMM
        Set: 6
        Locator: ONBOARD DIMM_C2
        Bank Locator: Channel C
        Type: DDR2 FB-DIMM
        Type Detail: Synchronous
        Speed: Unknown
        Manufacturer: MemUndefined
        Serial Number: MemUndefined
        Asset Tag: Not Specified
        Part Number: MemUndefined

Handle 0x0043, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x0038
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 4096 MB
        Form Factor: FB-DIMM
        Set: 5
        Locator: ONBOARD DIMM_D1
        Bank Locator: Channel D
        Type: DDR2 FB-DIMM
        Type Detail: Synchronous
        Speed: 667 MHz (1.5 ns)
        Manufacturer: 8551
        Serial Number: 02028522
        Asset Tag: Not Specified
        Part Number: 72T512920EFA3SC

Handle 0x0045, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x0038
        Error Information Handle: Not Provided
        Total Width: Unknown
        Data Width: Unknown
        Size: No Module Installed
        Form Factor: FB-DIMM
        Set: 6
        Locator: ONBOARD DIMM_D2
        Bank Locator: Channel D
        Type: DDR2 FB-DIMM
        Type Detail: Synchronous
        Speed: Unknown
        Manufacturer: MemUndefined
        Serial Number: MemUndefined
        Asset Tag: Not Specified
        Part Number: MemUndefined
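
Since the output above is long, here is a sketch of how it could be
condensed to just the populated slots. It assumes the "Key: Value"
layout shown above, with blocks separated by blank lines; the function
names are made up for illustration.

```python
import re


def parse_fields(block):
    """Turn the 'Key: Value' lines of one dmidecode block into a dict."""
    fields = {}
    for line in block.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value
    return fields


def populated_dimms(dmidecode_output):
    """Return (locator, size in MB) for every slot that holds a module."""
    dimms = []
    for block in re.split(r"\n\s*\n", dmidecode_output):
        fields = parse_fields(block)
        size = fields.get("Size", "")
        # Empty slots report "No Module Installed" instead of a size.
        if "Locator" in fields and size.endswith(" MB"):
            dimms.append((fields["Locator"], int(size.split()[0])))
    return dimms


# Shortened excerpt of the output above: one populated and one empty slot.
sample = """Handle 0x003A, DMI type 17, 27 bytes
Memory Device
        Size: 4096 MB
        Locator: ONBOARD DIMM_A1
        Bank Locator: Channel A

Handle 0x003C, DMI type 17, 27 bytes
Memory Device
        Size: No Module Installed
        Locator: ONBOARD DIMM_A2
        Bank Locator: Channel A
"""
slots = populated_dimms(sample)
total_mb = sum(size for _, size in slots)
```

Run against the full output, this should list the four populated slots
(A1, B1, C1, D1) at 4096 MB each.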

Greetings,
 jonas

