After running memtest86 (V3.3) for at least 24 hours, I came back and saw that each machine completed 61-63 cycles of tests, with 0 errors...
However, I did look through the BIOS for a way to disable the cache, and it doesn't appear that I can disable the CPU cache. I did turn on chipkill and some other supposed ECC memory "helpers", and the machine instantly crashed twice:

[EMAIL PROTECTED] ~]# mcelog --k8 --ascii <mce2.txt
CPU 0 4 northbridge TSC 2
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 6ca0
       bit32 = err cpu0
       bit45 = uncorrected ecc error
       bit57 = processor context corrupt
       bit61 = error uncorrected
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS b65020016c080813 MCGSTATUS 4
332ff8453 ADDR 7ff5faf0
Kernel panic - not syncing: Machine check

[EMAIL PROTECTED] ~]# mcelog --k8 --ascii <mce.txt
CPU 0 4 northbridge TSC 34096547a5
RIP 10:ffffffff8010c275
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 6ca0
       bit40 = error found by scrub
       bit45 = uncorrected ecc error
       bit61 = error uncorrected
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS f45021006c080a13 MCGSTATUS 7
RIP: default_idle+0x22/0x25}
Kernel panic - not syncing: Uncorrected machine check

I tried running the same workload that makes it crash (just a simple make -j2 on the kernel sources) with only 1 DIMM at a time, to see if I could figure out whether either one was to blame; neither failed after a few minutes. Now I've put them both back (but in opposite slots) and so far it's been running. But that is the nature of this issue - it can happen after 10 minutes or 10 hours... and I can't have that!
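
For anyone who wants to cross-check the two STATUS values by hand, here is a rough Python sketch that pulls out the same flag bits mcelog printed. The bit labels are copied straight from the mcelog output above. The syndrome layout (high byte in bits 31:24, low byte in bits 54:47) is my assumption about the K8 northbridge status register, and the function names are made up for illustration; this is not mcelog's own code.

```python
# Flag labels copied from the mcelog output in this post.
# Any other bits in the STATUS word are deliberately left undecoded.
FLAG_BITS = {
    32: "err cpu0",
    40: "error found by scrub",
    45: "uncorrected ecc error",
    57: "processor context corrupt",
    61: "error uncorrected",
    62: "error overflow (multiple errors)",
}


def chipkill_syndrome(status):
    """Assemble the 16-bit syndrome (assumed K8 layout, see note above)."""
    high = (status >> 24) & 0xFF  # assumed: syndrome bits 15:8
    low = (status >> 47) & 0xFF   # assumed: syndrome bits 7:0
    return (high << 8) | low


def decode_status(status):
    """Return (syndrome, [(bit, label), ...]) for the flags that are set."""
    flags = [
        (bit, label)
        for bit, label in sorted(FLAG_BITS.items())
        if (status >> bit) & 1
    ]
    return chipkill_syndrome(status), flags


if __name__ == "__main__":
    # The two STATUS words from the crashes reported above.
    for status in (0xB65020016C080813, 0xF45021006C080A13):
        syndrome, flags = decode_status(status)
        print(f"STATUS {status:016x}  syndrome = {syndrome:x}")
        for bit, label in flags:
            print(f"    bit{bit} = {label}")
```

Both STATUS words decode to the same syndrome, 6ca0, which matches what mcelog reported for each crash.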