I upgraded four Dell R815s from wheezy to jessie a few weeks ago. Before the upgrade, they had run reliably for about five years. Since the upgrade, two of the machines have been getting periodic machine checks. The machines boot fine and run for a day or more, and the machine checks then occur sporadically. I haven't found a correlation with anything in particular.
The front panel on the first machine says the machine check was on CPU #4. The front panel on the second machine says the first machine check was on CPU #1 and the second machine check was on CPU #2.

I am skeptical that this is really a hardware problem. Three CPUs began exhibiting machine checks within a few weeks of each other, all immediately after the upgrade from wheezy to jessie and after five years of reliable operation.

Has anybody else encountered this issue? Any suggestions on how to debug and fix it?

Thanks,
Jeff (http://engineering.purdue.edu/~qobi)
-------------------------------------------------------------------------------
root@arivu:~# ipmitool sel elist
   1 | 08/05/2016 | 00:12:47 | Event Logging Disabled SEL | Log area reset/cleared | Asserted
   2 | 08/06/2016 | 11:35:17 | Processor CPU Machine Chk | Transition to Non-recoverable | Asserted
   3 | 08/06/2016 | 11:35:17 | Unknown #0x28 | | Asserted
   4 | 08/06/2016 | 11:35:18 | Unknown #0x28 | | Asserted
   5 | 08/06/2016 | 11:35:18 | Unknown #0x28 | | Asserted
   6 | 08/06/2016 | 11:35:18 | Unknown #0x28 | | Asserted
   7 | 08/06/2016 | 11:35:18 | Unknown #0x28 | | Asserted
   8 | 08/06/2016 | 11:35:19 | Unknown #0x28 | | Asserted
   9 | 08/06/2016 | 11:35:19 | Unknown #0x28 | | Asserted
   a | 08/06/2016 | 11:35:19 | Unknown #0x28 | | Asserted
root@arivu:~#

root@perisikan:~# ipmitool sel elist
[...]
  1c | 08/08/2016 | 12:23:02 | Processor CPU Machine Chk | Transition to Non-recoverable | Asserted
  1d | 08/08/2016 | 12:23:03 | Unknown #0x28 | | Asserted
  1e | 08/08/2016 | 12:23:03 | Unknown #0x28 | | Asserted
  1f | 08/08/2016 | 12:23:03 | Unknown #0x28 | | Asserted
  20 | 08/08/2016 | 12:23:03 | Unknown #0x28 | | Asserted
  21 | 08/08/2016 | 12:23:03 | Unknown #0x28 | | Asserted
  22 | 08/08/2016 | 12:23:04 | Unknown #0x28 | | Asserted
  23 | 08/08/2016 | 12:23:04 | Unknown #0x28 | | Asserted
  24 | 08/08/2016 | 12:23:04 | Unknown #0x28 | | Asserted
  25 | 08/09/2016 | 18:37:46 | Processor CPU Machine Chk | Transition to Non-recoverable | Asserted
  26 | 08/09/2016 | 18:37:46 | Unknown #0x28 | | Asserted
  27 | 08/09/2016 | 18:37:47 | Unknown #0x28 | | Asserted
  28 | 08/09/2016 | 18:37:47 | Unknown #0x28 | | Asserted
  29 | 08/09/2016 | 18:37:47 | Unknown #0x28 | | Asserted
  2a | 08/09/2016 | 18:37:47 | Unknown #0x28 | | Asserted
  2b | 08/09/2016 | 18:37:48 | Unknown #0x28 | | Asserted
  2c | 08/09/2016 | 18:37:48 | Unknown #0x28 | | Asserted
  2d | 08/09/2016 | 18:37:48 | Unknown #0x28 | | Asserted
root@perisikan:~#
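
To look for a correlation, the machine-check timestamps can be pulled out of saved `ipmitool sel elist` output and compared against syslog or cron activity around the same times. The following is only a minimal sketch, assuming the pipe-separated layout shown in the logs above; `machine_check_times` and `SAMPLE` are hypothetical names used for illustration, not part of ipmitool.

```python
from collections import Counter


def machine_check_times(sel_text):
    """Return (date, time) pairs for each machine-check record in SEL text.

    Assumes the layout: id | date | time | sensor | description | state
    """
    events = []
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Skip prompts, "[...]" markers, and anything without enough columns.
        if len(fields) >= 5 and "Machine Chk" in fields[3]:
            events.append((fields[1], fields[2]))
    return events


# A few records copied from the perisikan log above, for illustration.
SAMPLE = """\
  1c | 08/08/2016 | 12:23:02 | Processor CPU Machine Chk | Transition to Non-recoverable | Asserted
  1d | 08/08/2016 | 12:23:03 | Unknown #0x28 | | Asserted
  25 | 08/09/2016 | 18:37:46 | Processor CPU Machine Chk | Transition to Non-recoverable | Asserted
"""

events = machine_check_times(SAMPLE)
per_day = Counter(date for date, _ in events)  # machine checks per calendar day

for date, time in events:
    print(date, time)
```

The `Unknown #0x28` records that follow each machine check are deliberately skipped, since their meaning is not known from the log alone.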