On Fri, 18 Jan 2013, kpn...@pobox.com wrote:

On Fri, Jan 18, 2013 at 09:48:05AM -0700, Ian Lepore wrote:
I tend to agree, a machine that starts rebooting spontaneously when
nothing significant changed and it used to be stable is usually a sign
of a failing power supply or memory.

Agreed.

But I disagree about memtest86.  It's probably not completely without
value, but to me its value is only negative:  if it tells you memory is
bad, it is.  If it tells you it's good, you know nothing.  Over the
years I've had 5 DIMMs fail.  memtest86 found the error in one of them,
but said all the others were fine in continuous 48-hour tests.  I even
tried running the tests on multiple systems.

The thing that always reliably finds bad memory for me
is /usr/ports/math/mprime run in test/benchmark mode.  It often takes 24
or more hours of runtime, but it will find your bad memory.

I've had "good" luck with gcc showing bad memory. If compiling a new kernel
produces segfaults, then I know I have a hardware problem. I've seen
compilers at work failing due to bad memory as well.

Some problems only happen with particular access patterns.  So if a compiler
works fine then, like memtest86, it doesn't say anything about the health
of the hardware.

Most test tools are like that. They might diagnose something as bad, but they often can't prove it is good. SMART has a reputation for not finding any problems on disks that are failing, and capacitors that aren't swollen or leaking still may not be working.

But diagnostic tools can at least give a hint. In my case, memtest indicated a problem--a big problem. I removed one DIMM at random (there were only two) and the problems and memtest errors both went away. I put the DIMM back in, and both came back.
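
For anyone who wants to try the repeated-compile check mentioned earlier in the thread, here is a rough sketch of how it could be scripted. Nothing in it comes from the posts above: the function name first_failing_run and the script name repeat.py are made up for illustration, and it simply reruns whatever command you give it and reports the first run that exits non-zero. As with every tool discussed here, a clean result proves nothing about the hardware.

```python
import subprocess
import sys


def first_failing_run(command, max_runs):
    """Run command up to max_runs times (hypothetical helper).

    Returns the 1-based number of the first run that exits non-zero
    (a process killed by a signal, such as a segfault, is reported by
    subprocess as a negative return code), or None if all runs succeed.
    """
    for run in range(1, max_runs + 1):
        if subprocess.run(command).returncode != 0:
            return run
    return None


# Only act as a script when a run count and a command were supplied.
if __name__ == "__main__" and len(sys.argv) > 2:
    runs = int(sys.argv[1])
    failed = first_failing_run(sys.argv[2:], runs)
    if failed is None:
        print("no failures in", runs, "runs; the hardware may still be bad")
    else:
        print("run", failed, "failed; if nothing else changed, suspect hardware")
```

Assuming the sketch is saved as repeat.py, something like "python3 repeat.py 5 make buildkernel" run from /usr/src would repeat the kernel build five times. A failure that shows up on one run but not on another, with the same source and the same command, is the kind of hint described above.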
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
