On Mon, Dec 07, 2015 at 10:07:59PM +0000, Luck, Tony wrote:
> > And that is incorrect too, because the MCE (at least the one I'm
> > injecting) gets broadcasted to the CPUs on the *node* and not to the
> > whole system.
>
> Which system? What kind of machine check? On Intel we expect machine checks
> to be broadcast to all logical cpus on all nodes (unless local machine check
> is enabled, in which case SRAR style machine checks go only to the logical
> cpu that hit the error).
>
> The code is written to that expectation ... and we don't report things as
> well if something else happens (like too many or too few cpus showing up).
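A minimal sketch, not part of the original thread, of how one might check a captured log against the expectation quoted above. It assumes the 4-node, 64-CPU block enumeration shown in the boot log of this mail and parses the same "[Hardware Error]: CPU N: Machine Check Exception" lines that the grep/sed pipeline in this mail extracts. The function names and the "offline" parameter (the CPUs offlined before injecting, 5 and 34 here) are hypothetical.

```python
import re

# Assumed topology, taken from the boot log in this mail: 4 nodes, 64 logical
# CPUs, enumerated in blocks of 8 per node (0-7 node 0, 8-15 node 1, ...,
# 24-31 node 3), after which the pattern repeats (32-39 node 0, ..., 56-63 node 3).
NR_CPUS = 64
NR_NODES = 4
BLOCK = 8

# Matches only the "mce: [Hardware Error]: CPU N: ..." lines; the EDAC lines
# repeat the same text without "[Hardware Error]" and are not matched.
MCE_LINE = re.compile(r"\[Hardware Error\]: CPU (\d+): Machine Check Exception")


def cpu_to_node(cpu):
    """Node of a logical CPU under the assumed block enumeration."""
    return (cpu // BLOCK) % NR_NODES


def cpus_reporting(log):
    """Set of CPUs that logged a Machine Check Exception in the given text."""
    return {int(m.group(1)) for m in MCE_LINE.finditer(log)}


def classify(log, offline):
    """Compare the reporting CPUs against a few delivery hypotheses.

    "broadcast" - every online CPU on every node took the #MC
    "local"     - exactly one CPU took it (as with LMCE and an SRAR error)
    "node N"    - exactly the online CPUs of node N took it
    "other"     - none of the above (too many or too few CPUs showed up)
    """
    online = set(range(NR_CPUS)) - set(offline)
    seen = cpus_reporting(log)
    if seen == online:
        return "broadcast"
    if len(seen) == 1:
        return "local"
    for node in range(NR_NODES):
        if seen == {c for c in online if cpu_to_node(c) == node}:
            return f"node {node}"
    return "other"
```

For the CPUs listed in the grep output in this mail (0-4, 6, 7, 32, 33, 35-39) with offline={5, 34}, classify() returns "node 0": that is exactly the online CPU set of node 0, not of the whole box.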
Box logs below. The BIOS enumerates the cores in a funny order:

node #0, CPUs 0-7
node #1, CPUs 8-15
node #2, CPUs 16-23
node #3, CPUs 24-31

and then it starts from node 0 again:

.... node #0, CPUs: #32 #33 #34 #35 #36 #37 #38 #39
.... node #1, CPUs: #40 #41 #42 #43 #44 #45 #46 #47
.... node #2, CPUs: #48 #49 #50 #51 #52 #53 #54 #55
.... node #3, CPUs: #56 #57 #58 #59 #60 #61 #62 #63

So I went and offlined CPUs 5 and 34, which are both on node 0.

Why node 0? Because when I inject error type 0x10, i.e.

  0x00000010	Memory Uncorrectable non-fatal

the MCE gets raised only on the node 0 cores. The full log is at the end of this mail. The gist of it is that #MC gets raised on the node 0 CPUs, i.e., 0-7 and 32-39, minus the offlined CPUs 5 and 34, of course.

So even if the #MC gets raised only on one node, the fix still works.

$ grep -Ei "hardware.*CPU" /tmp/mce | sed 's/^.*CPU//' | sort -n
 0: Machine Check Exception: 5 Bank 5: be00000000010090
 1: Machine Check Exception: 5 Bank 5: be00000000010090
 2: Machine Check Exception: 5 Bank 5: be00000000010090
 3: Machine Check Exception: 5 Bank 5: be00000000010090
 4: Machine Check Exception: 5 Bank 5: be00000000010090
 6: Machine Check Exception: 5 Bank 5: be00000000010090
 7: Machine Check Exception: 5 Bank 5: be00000000010090
 32: Machine Check Exception: 5 Bank 5: be00000000010090
 33: Machine Check Exception: 5 Bank 5: be00000000010090
 35: Machine Check Exception: 5 Bank 5: be00000000010090
 36: Machine Check Exception: 5 Bank 5: be00000000010090
 37: Machine Check Exception: 5 Bank 5: be00000000010090
 38: Machine Check Exception: 5 Bank 5: be00000000010090
 39: Machine Check Exception: 5 Bank 5: be00000000010090

[ 0.859060] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz (family: 0x6, model: 0x2d, stepping: 0x7
...
[ 0.981593] x86: Booting SMP configuration:
[ 0.991092] .... node #0, CPUs: #1
[ 1.013485] microcode: CPU1 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.034219] #2
[ 1.049577] microcode: CPU2 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.070309] #3
[ 1.085865] microcode: CPU3 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.106618] #4
[ 1.121978] microcode: CPU4 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.142720] #5
[ 1.158079] microcode: CPU5 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.178833] #6
[ 1.194191] microcode: CPU6 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.214914] #7
[ 1.230471] microcode: CPU7 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.251309]
[ 1.254854] .... node #1, CPUs: #8
[ 1.275173] microcode: CPU8 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.390509] #9
[ 1.406859] microcode: CPU9 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.427735] #10
[ 1.444303] microcode: CPU10 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.465343] #11
[ 1.481718] microcode: CPU11 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.502779] #12
[ 1.519156] microcode: CPU12 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.540171] #13
[ 1.556536] microcode: CPU13 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.577587] #14
[ 1.594127] microcode: CPU14 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.615131] #15
[ 1.631471] microcode: CPU15 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.652590]
[ 1.656132] .... node #2, CPUs: #16
[ 1.676518] microcode: CPU16 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.791812] #17
[ 1.808189] microcode: CPU17 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.829292] #18
[ 1.845868] microcode: CPU18 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.866925] #19
[ 1.883311] microcode: CPU19 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.904386] #20
[ 1.920765] microcode: CPU20 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.941810] #21
[ 1.958169] microcode: CPU21 microcode updated early to revision 0x710, date = 2013-06-17
[ 1.979242] #22
[ 1.995787] microcode: CPU22 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.016842] #23
[ 2.033182] microcode: CPU23 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.054314]
[ 2.057854] .... node #3, CPUs: #24
[ 2.078330] microcode: CPU24 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.193513] #25
[ 2.209874] microcode: CPU25 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.230996] #26
[ 2.247563] microcode: CPU26 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.268627] #27
[ 2.284998] microcode: CPU27 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.306061] #28
[ 2.322437] microcode: CPU28 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.343433] #29
[ 2.359780] microcode: CPU29 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.380855] #30
[ 2.397397] microcode: CPU30 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.418432] #31
[ 2.434759] microcode: CPU31 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.455792]
[ 2.459336] .... node #0, CPUs: #32 #33 #34 #35 #36 #37 #38 #39
[ 2.583817] .... node #1, CPUs: #40 #41 #42 #43 #44 #45 #46 #47
[ 2.710873] .... node #2, CPUs: #48 #49 #50 #51 #52 #53 #54 #55
[ 2.838069] .... node #3, CPUs: #56 #57 #58 #59 #60 #61 #62 #63
[ 2.964288] x86: Booted up 4 nodes, 64 CPUs
[ 2.974471] smpboot: Total of 64 processors activated (344907.86 BogoMIPS)
[ 5290.635126] Broke affinity for irq 82
[ 5290.643222] Broke affinity for irq 111
[ 5290.651507] Broke affinity for irq 125
[ 5290.664107] smpboot: CPU 5 is now offline
[ 5298.371336] Broke affinity for irq 31
[ 5298.379528] Broke affinity for irq 82
[ 5298.387627] Broke affinity for irq 103
[ 5298.395908] Broke affinity for irq 110
[ 5298.404187] Broke affinity for irq 111
[ 5298.412450] Broke affinity for irq 112
[ 5298.420733] Broke affinity for irq 118
[ 5298.429017] Broke affinity for irq 124
[ 5298.437295] Broke affinity for irq 125
[ 5298.445584] Broke affinity for irq 127
[ 5298.453880] Broke affinity for irq 137
[ 5298.466543] smpboot: CPU 34 is now offline
[ 5302.187338] EINJ: Error INJection is initialized.
[ 5318.897170] Disabling lock debugging due to kernel taint
[ 5318.910775] mce: [Hardware Error]: CPU 37: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5318.931171] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5318.951567] mce: [Hardware Error]: TSC bab9f2d8a4e00 ADDR bb68ec00 MISC 20403ebe86
[ 5318.969835] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC b microcode 710
[ 5318.990959] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5319.003825] EDAC sbridge MC0: CPU 37: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.023215] EDAC sbridge MC0: TSC bab9f2d8a4e00
[ 5319.033036] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5319.050338] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC b
[ 5319.069542] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5319.122943] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.143355] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5319.163846] mce: [Hardware Error]: TSC bab9f2d8a51c1 ADDR bb68ec00 MISC 20403ebe86
[ 5319.182249] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 6 microcode 710
[ 5319.203539] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5319.216586] EDAC sbridge MC0: CPU 3: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.235994] EDAC sbridge MC0: TSC bab9f2d8a51c1
[ 5319.245814] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5319.263348] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 6
[ 5319.283041] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5319.337311] mce: [Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.357960] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8159a4d0> {mutex_lock+0x10/0x27}
[ 5319.378519] mce: [Hardware Error]: TSC bab9f2d8a3feb ADDR bb68ec00 MISC 20403ebe86
[ 5319.397151] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 4 microcode 710
[ 5319.418650] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5319.431902] EDAC sbridge MC0: CPU 2: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.451491] EDAC sbridge MC0: TSC bab9f2d8a3feb
[ 5319.461311] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5319.479022] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 4
[ 5319.499014] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5319.553209] mce: [Hardware Error]: CPU 6: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.574029] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5319.594953] mce: [Hardware Error]: TSC bab9f2d8a87ea ADDR bb68ec00 MISC 20403ebe86
[ 5319.613756] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC c microcode 710
[ 5319.635431] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5319.648873] EDAC sbridge MC0: CPU 6: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.668661] EDAC sbridge MC0: TSC bab9f2d8a87ea
[ 5319.678483] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5319.696422] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC c
[ 5319.716789] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5319.771531] mce: [Hardware Error]: CPU 38: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.792743] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5319.813836] mce: [Hardware Error]: TSC bab9f2d8a87ce ADDR bb68ec00 MISC 20403ebe86
[ 5319.832819] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC d microcode 710
[ 5319.854654] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5319.868243] EDAC sbridge MC0: CPU 38: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5319.888366] EDAC sbridge MC0: TSC bab9f2d8a87ce
[ 5319.898186] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5319.916192] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC d
[ 5319.936752] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5319.991752] mce: [Hardware Error]: CPU 35: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.013034] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5320.034166] mce: [Hardware Error]: TSC bab9f2d8a59dd ADDR bb68ec00 MISC 20403ebe86
[ 5320.053149] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 7 microcode 710
[ 5320.074972] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5320.088567] EDAC sbridge MC0: CPU 35: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.108688] EDAC sbridge MC0: TSC bab9f2d8a59dd
[ 5320.118511] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5320.136527] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 7
[ 5320.157079] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5320.212025] mce: [Hardware Error]: CPU 39: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.233316] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5320.254462] mce: [Hardware Error]: TSC bab9f2d8a4f5c ADDR bb68ec00 MISC 20403ebe86
[ 5320.273455] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC f microcode 710
[ 5320.295303] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5320.308905] EDAC sbridge MC0: CPU 39: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.329026] EDAC sbridge MC0: TSC bab9f2d8a4f5c
[ 5320.338847] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5320.356858] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC f
[ 5320.377433] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5320.432474] mce: [Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.453569] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5320.474703] mce: [Hardware Error]: TSC bab9f2d8a4d60 ADDR bb68ec00 MISC 20403ebe86
[ 5320.493689] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC e microcode 710
[ 5320.515532] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5320.529139] EDAC sbridge MC0: CPU 7: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.549050] EDAC sbridge MC0: TSC bab9f2d8a4d60
[ 5320.558870] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5320.576890] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC e
[ 5320.597478] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5320.652525] mce: [Hardware Error]: CPU 36: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.673804] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5320.694918] mce: [Hardware Error]: TSC bab9f2d8a5823 ADDR bb68ec00 MISC 20403ebe86
[ 5320.713916] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 9 microcode 710
[ 5320.735759] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5320.749347] EDAC sbridge MC0: CPU 36: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.769452] EDAC sbridge MC0: TSC bab9f2d8a5823
[ 5320.779273] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5320.797296] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 9
[ 5320.817877] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5320.872972] mce: [Hardware Error]: CPU 33: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.894249] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5320.915390] mce: [Hardware Error]: TSC bab9f2d8a5326 ADDR bb68ec00 MISC 20403ebe86
[ 5320.934374] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 3 microcode 710
[ 5320.956222] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5320.969807] EDAC sbridge MC0: CPU 33: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5320.989913] EDAC sbridge MC0: TSC bab9f2d8a5326
[ 5320.999734] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5321.017750] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 3
[ 5321.038284] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5321.093686] mce: [Hardware Error]: CPU 1: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.114770] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5321.135925] mce: [Hardware Error]: TSC bab9f2d8a5562 ADDR bb68ec00 MISC 20403ebe86
[ 5321.154918] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 2 microcode 710
[ 5321.176765] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5321.190369] EDAC sbridge MC0: CPU 1: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.210303] EDAC sbridge MC0: TSC bab9f2d8a5562
[ 5321.220123] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5321.238146] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 2
[ 5321.258723] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5321.303358] mce: [Hardware Error]: CPU 4: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.324279] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5321.345397] mce: [Hardware Error]: TSC bab9f2d8a572f ADDR bb68ec00 MISC 20403ebe86
[ 5321.364380] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 8 microcode 710
[ 5321.386184] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5321.399729] EDAC sbridge MC0: CPU 4: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.419624] EDAC sbridge MC0: TSC bab9f2d8a572f
[ 5321.429445] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5321.447454] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 8
[ 5321.467989] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5321.511475] mce: [Hardware Error]: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.532587] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5321.553689] mce: [Hardware Error]: TSC bab9f2d8a50f4 ADDR bb68ec00 MISC 20403ebe86
[ 5321.572681] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 1 microcode 710
[ 5321.594500] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5321.608057] EDAC sbridge MC0: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.628161] EDAC sbridge MC0: TSC bab9f2d8a50f4
[ 5321.637982] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5321.655998] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 1
[ 5321.676524] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5321.720020] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.740939] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[ 5321.762058] mce: [Hardware Error]: TSC bab9f2d8a5034 ADDR bb68ec00 MISC 20403ebe86
[ 5321.781022] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 0 microcode 710
[ 5321.802837] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[ 5321.816395] EDAC sbridge MC0: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010090
[ 5321.836300] EDAC sbridge MC0: TSC bab9f2d8a5034
[ 5321.846121] EDAC sbridge MC0: ADDR bb68ec00 EDAC sbridge MC0: MISC 20403ebe86
[ 5321.864127] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1449517966 SOCKET 0 APIC 0
[ 5321.884647] EDAC MC0: 0 UE memory read error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0xbb68e offset:0xc00 grain:32 - area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0)
[ 5321.928136] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 5321.945589] Kernel panic - not syncing: Fatal machine check
[ 5321.985122] Kernel Offset: disabled
[ 5322.008492] Rebooting in 100 seconds..
[ 5421.226077] ACPI MEMORY or I/O RESET_REG.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.