We have found that linpack is a far better memory tester than Memtest86+. Memtest86+ does not find all of the bad RAM that linpack exposes; the errors linpack triggers show up in mcelog and in the IPMI BMC logs. The nice thing about the BMC log entries is that they actually tell you which DIMM in which CPU bank caused the ECC error, so you don't need to troubleshoot with a lengthy divide-and-conquer approach.
Michael

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Mathog
Sent: Tuesday, November 27, 2007 9:54 AM
To: Tony Travis
Cc: [email protected]
Subject: Re: [Beowulf] Not quite Walmart, or, living without ECC?

Tony Travis wrote:

> Memtest86+ is fine for 'burn-in' tests, but it does not do a realistic
> memory stress test under the conditions that normal applications run.

Wow, deja vu. I just remembered we had almost exactly this same discussion 2 years ago, in fact I apparently sent you my hacked up version of memtester which has delays in it between the write and read cycles, to allow it to catch bit fade (due to radiation or whatever).

One thing I still don't get though, if memtester is catching memory errors which only appear when _other parts of the system are active_ does replacing the "bad" memory actually cure these problems? That is, if memtest86+ runs cleanly and memtester finds problems, is it really the memory which is the issue?

Regards,

David Mathog
[EMAIL PROTECTED]
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
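
For anyone unfamiliar with the idea of putting a delay between the write and read cycles, here is a rough sketch of the logic in Python. This is not the hacked memtester mentioned above, just an illustration: the function names, the 0xAA pattern and the default delay are all made up for the example, and a user-space Python buffer goes through virtual memory and the CPU caches, so it is no substitute for a real memory tester.

```python
import time


def find_mismatches(buf, pattern):
    """Return the offsets in buf whose byte no longer equals pattern."""
    return [i for i, value in enumerate(buf) if value != pattern]


def delayed_pattern_test(size_bytes, pattern=0xAA, delay_seconds=1.0):
    """Write pattern into a buffer, wait, then read it back.

    All names and defaults here are illustrative assumptions.
    The pause between the write and the read is what gives a
    fading bit time to show up before it is checked.
    """
    buf = bytearray([pattern]) * size_bytes   # write cycle
    time.sleep(delay_seconds)                 # idle period between write and read
    return find_mismatches(buf, pattern)      # read cycle


if __name__ == "__main__":
    bad_offsets = delayed_pattern_test(16 * 1024 * 1024, delay_seconds=5.0)
    print("mismatched offsets:", len(bad_offsets))
```

A real tester would lock the pages in RAM, cycle through several patterns and their complements, and report physical addresses rather than buffer offsets.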
