> So when the driver sees uncorrected errors, I'm also seeing them in my
> memory scanning program - so they correspond nicely. I didn't see anything
> logged in /var/log/mcelog, but I will update to the latest when possible.

I wonder if there are some BIOS options to enable reporting via CMCI/MCE?
On the E5 systems the reference BIOS uses phrases like "poison forwarding"
in the option names.

The above behavior sounds less than useful. Scenario: Your mission-critical
app is running (controlling a giant laser cutter). Oops, there is a memory
error, and the bad data arrives at the application, causing it to swing the
laser beam through 180 degrees, destroying half of your lab. A few
seconds/minutes later - your EDAC driver prints a message saying that the
uncorrected error count just got incremented.

-Tony