Hey, I've been getting tons of EDAC error messages from the kernel lately on an i5000 server with 16GB RAM (4 x 4GB modules). The server has been running for about three years.
The system is Debian Lenny with a 2.6.26 kernel, self-compiled from linux-source-2.6.26 2.6.26-26lenny3.

It's not the first time that these EDAC error messages have appeared. In fact, I have seen them every now and then over the last three years. Sometimes only a few errors were logged, sometimes my logs were spammed with them for several days, but then it stopped again. Now the messages have been spamming my log and console for more than three weeks. On some days I get more than 36000 errors a day. It's notable that every DRAM-Bank from 0 to 7 is affected.

Now I wonder whether these are false positives (searching the web for the errors revealed that these are quite common), or whether my RAM might be damaged. Unfortunately, running memtest86+ is not an option, as the server in question is a production server and I don't have a second server for redundancy.

Additionally, a slightly related question: how do I turn off the logging of these messages to the console? It's impossible to work in an SSH session when the console is spammed with these logs. Neither setting kernel.printk, nor 'setterm -msg 0', 'dmesg -n1' or 'echo 1 > /proc/sysrq-trigger' stops the logging flood to the console. Did I miss anything, or is it simply impossible to stop console logging for this kind of kernel error message? That would be very unfortunate.

I have already considered recompiling the kernel without the EDAC i5000 driver in order to stop this annoyance, but I would prefer to fix the cause instead of fighting the symptoms.

Here's an example of the error messages:

Aug 16 13:08:20 nibbler kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
Aug 16 13:08:20 nibbler kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Aug 16 13:08:20 nibbler kernel: EDAC MC0: UE row 1, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=6 RDWR=Read RAS=14214 CAS=0 FATAL Err=0x4)
Aug 16 13:08:22 nibbler kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
Aug 16 13:08:22 nibbler kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Aug 16 13:08:22 nibbler kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=20 CAS=0 FATAL Err=0x4)
Aug 16 13:08:24 nibbler kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
Aug 16 13:08:24 nibbler kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Aug 16 13:08:24 nibbler kernel: EDAC MC0: UE row 1, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=1 RDWR=Read RAS=3268 CAS=0 FATAL Err=0x4)

This is what the EDAC module logged at my last reboot:

Jun 27 00:10:29 nibbler kernel: EDAC MC: Ver: 2.1.0 Jun 23 2011
Jun 27 00:10:29 nibbler kernel: EDAC MC0: Giving out device to 'i5000_edac.c' 'I5000': DEV 0000:00:10.0
Jun 27 00:10:29 nibbler kernel: EDAC PCI0: Giving out device to module 'i5000_edac' controller 'EDAC PCI controller': DEV '0000:00:10.0' (POLLED)

And last but not least, the output of 'dmidecode -t memory':

# dmidecode 2.9
SMBIOS 2.5 present.

Handle 0x0038, DMI type 16, 15 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: Multi-bit ECC
    Maximum Capacity: 32 GB
    Error Information Handle: Not Provided
    Number Of Devices: 8

Handle 0x003A, DMI type 17, 27 bytes
Memory Device
    Array Handle: 0x0038
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 4096 MB
    Form Factor: FB-DIMM
    Set: 1
    Locator: ONBOARD DIMM_A1
    Bank Locator: Channel A
    Type: DDR2 FB-DIMM
    Type Detail: Synchronous
    Speed: 667 MHz (1.5 ns)
    Manufacturer: 8551
    Serial Number: 02028121
    Asset Tag: Not Specified
    Part Number: 72T512920EFA3SC

Handle 0x003C, DMI type 17, 27 bytes
Memory Device
    Array Handle: 0x0038
    Error Information Handle: Not Provided
    Total Width: Unknown
    Data Width: Unknown
    Size: No Module Installed
    Form Factor: FB-DIMM
    Set: 2
    Locator: ONBOARD DIMM_A2
    Bank Locator: Channel A
    Type: DDR2 FB-DIMM
    Type Detail: Synchronous
    Speed: Unknown
    Manufacturer: MemUndefined
    Serial Number: MemUndefined
    Asset Tag: Not Specified
    Part Number: MemUndefined

Handle 0x003D, DMI type 17, 27 bytes
Memory Device
    Array Handle: 0x0038
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 4096 MB
    Form Factor: FB-DIMM
    Set: 1
    Locator: ONBOARD DIMM_B1
    Bank Locator: Channel B
    Type: DDR2 FB-DIMM
    Type Detail: Synchronous
    Speed: 667 MHz (1.5 ns)
    Manufacturer: 8551
    Serial Number: 02027215
    Asset Tag: Not Specified
    Part Number: 72T512920EFA3SC

Handle 0x003F, DMI type 17, 27 bytes
Memory Device
    Array Handle: 0x0038
    Error Information Handle: Not Provided
    Total Width: Unknown
    Data Width: Unknown
    Size: No Module Installed
    Form Factor: FB-DIMM
    Set: 2
    Locator: ONBOARD DIMM_B2
    Bank Locator: Channel B
    Type: DDR2 FB-DIMM
    Type Detail: Synchronous
    Speed: Unknown
    Manufacturer: MemUndefined
    Serial Number: MemUndefined
    Asset Tag: Not Specified
    Part Number: MemUndefined

Handle 0x0040, DMI type 17, 27 bytes
Memory Device
    Array Handle: 0x0038
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 4096 MB
    Form Factor: FB-DIMM
    Set: 5
    Locator: ONBOARD DIMM_C1
    Bank Locator: Channel C
    Type: DDR2 FB-DIMM
    Type Detail: Synchronous
    Speed: 667 MHz (1.5 ns)
    Manufacturer: 8551
    Serial Number: 02027112
    Asset Tag: Not Specified
    Part Number: 72T512920EFA3SC

Handle 0x0042, DMI type 17, 27 bytes
Memory Device
    Array Handle: 0x0038
    Error Information Handle: Not Provided
    Total Width: Unknown
    Data Width: Unknown
    Size: No Module Installed
    Form Factor: FB-DIMM
    Set: 6
    Locator: ONBOARD DIMM_C2
    Bank Locator: Channel C
    Type: DDR2 FB-DIMM
    Type Detail: Synchronous
    Speed: Unknown
    Manufacturer: MemUndefined
    Serial Number: MemUndefined
    Asset Tag: Not Specified
    Part Number: MemUndefined

Handle 0x0043, DMI type 17, 27 bytes
Memory Device
    Array Handle: 0x0038
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 4096 MB
    Form Factor: FB-DIMM
    Set: 5
    Locator: ONBOARD DIMM_D1
    Bank Locator: Channel D
    Type: DDR2 FB-DIMM
    Type Detail: Synchronous
    Speed: 667 MHz (1.5 ns)
    Manufacturer: 8551
    Serial Number: 02028522
    Asset Tag: Not Specified
    Part Number: 72T512920EFA3SC

Handle 0x0045, DMI type 17, 27 bytes
Memory Device
    Array Handle: 0x0038
    Error Information Handle: Not Provided
    Total Width: Unknown
    Data Width: Unknown
    Size: No Module Installed
    Form Factor: FB-DIMM
    Set: 6
    Locator: ONBOARD DIMM_D2
    Bank Locator: Channel D
    Type: DDR2 FB-DIMM
    Type Detail: Synchronous
    Speed: Unknown
    Manufacturer: MemUndefined
    Serial Number: MemUndefined
    Asset Tag: Not Specified
    Part Number: MemUndefined

Greetings,
jonas
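
P.S. For anyone who wants to check similar logs: the per-day and per-bank figures mentioned above can be tallied with a few lines of Python. This is only a minimal sketch, assuming the syslog line layout shown in the example messages. The names `UE_LINE`, `tally_edac_errors` and `SAMPLE` are made up for illustration and are not part of any EDAC tooling.

```python
import re
from collections import Counter

# Matches the "EDAC MC0: UE row ..." lines from the example messages.
# Assumption: classic syslog prefix "Mon DD HH:MM:SS host kernel: ".
UE_LINE = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel: "
    r"EDAC MC\d+: UE row (?P<row>\d+),.*?DRAM-Bank=(?P<bank>\d+)"
)


def tally_edac_errors(lines):
    """Count UE reports per day, per DRAM bank and per row."""
    per_day, per_bank, per_row = Counter(), Counter(), Counter()
    for line in lines:
        m = UE_LINE.search(line)
        if m is None:
            continue  # not a UE report (e.g. the "FATAL ERRORS Found" line)
        # Collapse the double space syslog uses for single-digit days.
        per_day[" ".join(m.group("day").split())] += 1
        per_bank[int(m.group("bank"))] += 1
        per_row[int(m.group("row"))] += 1
    return per_day, per_bank, per_row


# Lines copied from the example messages in this mail.
SAMPLE = [
    'Aug 16 13:08:20 nibbler kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4',
    'Aug 16 13:08:20 nibbler kernel: EDAC MC0: UE row 1, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=6 RDWR=Read RAS=14214 CAS=0 FATAL Err=0x4)',
    'Aug 16 13:08:22 nibbler kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=20 CAS=0 FATAL Err=0x4)',
    'Aug 16 13:08:24 nibbler kernel: EDAC MC0: UE row 1, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=1 RDWR=Read RAS=3268 CAS=0 FATAL Err=0x4)',
]

per_day, per_bank, per_row = tally_edac_errors(SAMPLE)
print("per day: ", dict(per_day))
print("per bank:", dict(sorted(per_bank.items())))
print("per row: ", dict(sorted(per_row.items())))
```

To run it on a real log, pass an open file object instead of `SAMPLE`. Which file holds the kernel messages depends on the local syslog configuration. A spread across all banks and rows, as opposed to a cluster on one row, is the kind of pattern this makes visible.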