On Tue, Jun 27, 2006 at 01:38:35AM +0200, M.Hirsch wrote:
> I just would like you (not specifically you, Dmitry) to acknowledge that
> broken RAM is worth a "panic" in "standard situations"- if I may call it
> like that.

Well, ideally, if broken RAM could be isolated with something like IBM's
chipkill stuff, then that would be better than panicking.  Sort of like
enabling hot-swap of failing disk drives.

The point that's been made, though, is that "soft" errors aren't
necessarily (or even) hardware failures at all.  Hardware failures can
look like persistent soft errors, but soft errors are real:
radiation-induced bit-flippage happens.  ECC turns what would otherwise
be a panic-inducing error state into a total non-event, improving the
uptime of very large memory systems to useful levels.  It's exactly like
the forward error correction used on disk drives and communications
channels.  In all of these systems, the technology has been pushed so
close to the limits that the difference between "signal" and "noise" can
only be determined by sophisticated statistical analysis and systematic
redundancy.

> If the RAM is broken for some bits, chances are great that there are
> more following soon.
> ... from the replies I got via PM, I feel some people don't agree with
> that....

A single corrected error just isn't an indication that the hardware is
broken.  If the ECC scrubber can't flip the bit to the right state,
*then* the hardware is broken, and you do need to panic.

--
Andrew
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
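
To make the corrected-versus-uncorrectable distinction above concrete,
here is a minimal Python sketch.  It is a toy, not how any real memory
controller or FreeBSD code works: it uses a tiny (8,4) SECDED code
(Hamming(7,4) plus an overall parity bit), where real ECC memory
typically uses a much wider code implemented in hardware.  The names
encode, decode, Cell and scrub, and the stuck-bit model of a hard
fault, are all made up for illustration.

```python
OK, CORRECTED, UNCORRECTABLE = "ok", "corrected", "uncorrectable"

def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED word (toy code, assumption)."""
    bits = [0] * 8                      # bits[0] = overall parity, 1..7 = Hamming positions
    for pos, i in zip((3, 5, 6, 7), range(4)):
        bits[pos] = (nibble >> i) & 1   # data bits live at positions 3, 5, 6, 7
    bits[1] = bits[3] ^ bits[5] ^ bits[7]   # parity over positions 1,3,5,7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]   # parity over positions 2,3,6,7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]   # parity over positions 4,5,6,7
    bits[0] = sum(bits[1:]) & 1             # makes the whole word even parity
    return sum(b << i for i, b in enumerate(bits))

def decode(word):
    """Return (nibble, status); nibble is None if the word is uncorrectable."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid word
    odd = sum(bits) & 1                 # 1 means an odd number of bits flipped
    if syndrome == 0 and not odd:
        status = OK
    elif odd:
        bits[syndrome] ^= 1             # single flip; syndrome 0 is the parity bit itself
        status = CORRECTED
    else:
        return None, UNCORRECTABLE      # even parity but nonzero syndrome: two flips
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return nibble, status

class Cell:
    """Hypothetical one-word memory cell; stuck_mask models a hard fault."""
    def __init__(self, word, stuck_mask=0, stuck_val=0):
        self.stuck_mask, self.stuck_val = stuck_mask, stuck_val
        self.write(word)
    def write(self, word):
        # Stuck bits ignore whatever is written to them.
        self.word = (word & ~self.stuck_mask) | (self.stuck_val & self.stuck_mask)
    def read(self):
        return self.word

def scrub(cell):
    """Correct the cell in place; report 'panic' only if that is impossible."""
    nibble, status = decode(cell.read())
    if status == OK:
        return "clean"
    if status == UNCORRECTABLE:
        return "panic"
    cell.write(encode(nibble))          # write the corrected word back
    _, recheck = decode(cell.read())    # did the fix actually stick?
    return "soft error fixed" if recheck == OK else "panic"

soft = Cell(encode(0xA))
soft.word ^= 1 << 5                     # one transient bit flip
hard = Cell(encode(0xA), stuck_mask=1 << 5, stuck_val=0)  # bit 5 stuck at 0
print(scrub(soft), "/", scrub(hard))
```

In this sketch the transient flip is repaired and the cell reads clean
afterwards, while the stuck bit comes back wrong after the rewrite, which
is the case where panicking makes sense.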