Re: ECC status in FreeBSD
On Dec 20, 2004, at 3:55 PM, Brett Glass wrote:
> I'm getting ready to build some (hopefully) high-reliability servers with ECC memory. I'd like to put FreeBSD on them. What facilities (if any) does FreeBSD have for:
>
> 1) Reporting the status of ECC memory (errors corrected, errors uncorrected, etc.)?
>
> 2) Responding to uncorrectable errors?

A quick check of the archives suggests a FreeBSD version of a kernel module which pays attention to the ECC status of various chipsets is available from:

http://docs.freebsd.org/cgi/getmsg.cgi?fetch=113348+0+archive/2001/freebsd-hackers/20010318.freebsd-hackers

...based on the work for Linux at:

http://www.anime.net/~goemon/linux-ecc/

> 3) Mapping out portions of memory that produce repeated errors?

You can set an option in the loader to limit the physical memory available to FreeBSD, which could serve the purpose. However, your RAM isn't a hard drive, so the bad-sector remapping used by hard drives is not fully applicable. Your machine is not expected to have any part of memory fail reproducibly, but if it does, it's time to use the warranty and replace the entire chip.

> It seems to me that, for an operating system that prides itself on server stability and performance, such features are a must.

ECC is a fine idea, but the motherboard chipset does pretty much everything that is required (except for the reporting/syslogging), so the kernel doesn't need to be specially involved for the system to benefit from ECC protection.

-- 
-Chuck

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
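To make the difference between "errors corrected" and "errors uncorrected" concrete, here is a minimal Python sketch of a SEC-DED (single-error-correcting, double-error-detecting) code, the family of codes ECC memory is built on. It uses an extended Hamming(8,4) code purely for readability; real ECC DIMMs protect 64 data bits with 8 check bits, and the exact code is chipset-specific. The function names `encode` and `decode` are hypothetical and are not part of any FreeBSD interface. The point is that the correction happens in the chipset; what the operating system would add is reading the "corrected" and "uncorrectable" outcomes and logging or acting on them.

```python
from typing import List, Tuple

# Extended Hamming(8,4) (illustrative only; real DIMMs use 72-bit words):
# positions 1..7 hold a Hamming(7,4) codeword with parity bits at 1, 2, 4,
# and index 0 holds an overall parity bit that enables double-error detection.

def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit SEC-DED codeword."""
    d1, d2, d3, d4 = data
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d1, d2, d3, d4
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # overall parity over positions 1..7
    return code


def decode(code: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)  # work on a copy; don't mutate the caller's word
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions; 0 for a valid codeword
    overall = sum(code) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Assume exactly one flipped bit and repair it in place.
        # syndrome == 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        # The code can detect this but cannot say which bits to fix.
        return "uncorrectable", []
    return status, [code[3], code[5], code[6], code[7]]


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    one_bad = list(word)
    one_bad[5] ^= 1
    two_bad = list(word)
    two_bad[2] ^= 1
    two_bad[6] ^= 1
    print(decode(word))     # ('ok', [1, 0, 1, 1])
    print(decode(one_bad))  # ('corrected', [1, 0, 1, 1])
    print(decode(two_bad))  # ('uncorrectable', [])
```

A single flipped bit is silently repaired, which is why a system can run for a long time on a slowly failing module without any visible symptom unless something reports the corrections. A double flip can only be flagged; deciding what to do then (panic, kill the affected process, retire the page) is the policy question raised in point 2.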
Re: ECC status in FreeBSD
At 03:25 PM 12/20/2004, Charles Swiger wrote:
> However, your RAM isn't a hard drive, so the bad-sector remapping used by hard drives is not fully applicable. Your machine is not expected to have any part of memory fail reproducibly, but if it does, it's time to use the warranty and replace the entire chip.

It's true that RAM is not a hard drive. However, if the problem is with certain memory cells rather than, say, the row or column drivers, the rest of the chip is usable. And if you did want to scuttle the entire module on which the chip resided, you'd probably want to disable that module in the meantime by telling the system not to use it. Certainly, you'd at least want to know which module was failing. There's nothing to tell you that right now.

> ECC is a fine idea, but the motherboard chipset does pretty much everything that is required (except for the reporting/syslogging), so the kernel doesn't need to be specially involved for the system to benefit from ECC protection.

Alas, right now there's no way to KNOW that you need to deal with a failing RAM module until you start experiencing random and possibly destructive system panics or crashes. It'd be nice, at least, to see something in the logs or be able to collect statistics from the motherboard.

--Brett
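Knowing "which module was failing" is mostly bookkeeping on top of whatever the chipset reports. The Python sketch below assumes a driver could attribute each ECC event to a module; the `EccEvent` record, the slot labels, and the default threshold of 3 corrected errors are all hypothetical choices for illustration, not an existing FreeBSD facility.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EccEvent:
    # Hypothetical event record: which module the chipset blamed and
    # whether it was able to correct the error.
    module: str       # hypothetical slot label, e.g. "DIMM0"
    corrected: bool


def suspect_modules(events: Iterable[EccEvent], threshold: int = 3) -> List[str]:
    """Return modules with any uncorrectable error, or with at least
    `threshold` corrected errors (the threshold is an arbitrary assumption)."""
    corrected: Counter = Counter()
    uncorrectable = set()
    for ev in events:
        if ev.corrected:
            corrected[ev.module] += 1
        else:
            uncorrectable.add(ev.module)
    repeat_offenders = {m for m, n in corrected.items() if n >= threshold}
    return sorted(uncorrectable | repeat_offenders)


if __name__ == "__main__":
    log = [
        EccEvent("DIMM0", True),
        EccEvent("DIMM1", True),
        EccEvent("DIMM0", True),
        EccEvent("DIMM0", True),
        EccEvent("DIMM2", False),
    ]
    print(suspect_modules(log))  # ['DIMM0', 'DIMM2']
```

Counting corrected errors matters because, as noted above, they are otherwise invisible: a module that keeps producing them is the early warning that would let an administrator replace it before an uncorrectable error causes a panic.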