> On Wed, Jun 17, 2015 at 11:41:56AM +0200, Borislav Petkov wrote:
>> And I was waiting in line to get a chance to do some injection on our
>> EINJ box here too. But it seems you have the required setup already so
>> if you want to give those changes a run, I've uploaded them here:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>>
>> It'll be much appreciated.
>
> and the answer is <drum roll> ....
>
>
>
> no. :-(

I see a different panic with this kernel. Not seen every time. It was after reboot due to injected errors.

[ 0.234672] mce: CPU supports 22 MCE banks
[ 0.239291] CPU0: Thermal monitoring enabled (TM1)
[ 0.244680] process: using mwait in idle threads
[ 0.249844] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
[ 0.256654] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[ 0.264330] Freeing SMP alternatives memory: 20K (ffffffff81d1e000 - ffffffff81d23000)
[ 0.274057] ftrace: allocating 22650 entries in 89 pages
[ 0.289946] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.296505] Switched APIC routing to physical flat.
[ 0.302838] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.349289] smpboot: CPU0: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz (fam: 06, model: 3f, stepping: 03)
[ 0.359844] Performance Events: PEBS fmt2+, 16-deep LBR, Haswell events, full-width counters, Intel PMU driver.
[ 0.371173] ... version: 3
[ 0.375649] ... bit width: 48
[ 0.380222] ... generic registers: 4
[ 0.384698] ... value mask: 0000ffffffffffff
[ 0.390632] ... max period: 0000ffffffffffff
[ 0.396566] ... fixed-purpose events: 3
[ 0.401043] ... event mask: 000000070000000f
[ 0.410260] x86: Booting SMP configuration:
[ 0.414933] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17
[ 0.706763] .... node #1, CPUs: #18
[ 0.822565] mce: [Hardware Error]: Machine check events logged
[ 0.822801] #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33
[ 1.078660] mce: [Hardware Error]: Machine check events logged
[ 1.093416] #34
[ 1.095433] BUG: unable to handle kernel
[ 1.100045] #35
[ 1.102193] NULL pointer dereference at 0000000000000008
[ 1.108126] IP: [<ffffffff8107ed01>] pool_mayday_timeout+0x81/0x150
[ 1.111969]
[ 1.116818] .... node #0, CPUs: #36
[ 1.121101] PGD 0
[ 1.123348] Oops: 0000 [#1] SMP
[ 1.126975] Modules linked in:
[ 1.130402] CPU: 33 PID: 0 Comm: swapper/33 Not tainted 4.1.0-rc3-7-default+ #1
[ 1.138570] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
[ 1.150134] task: ffff88046e86e0d0 ti: ffff88046e874000 task.ti: ffff88046e874000
[ 1.158496] RIP: 0010:[<ffffffff8107ed01>] [<ffffffff8107ed01>] pool_mayday_timeout+0x81/0x150
[ 1.168228] RSP: 0000:ffff88087f5e3e08 EFLAGS: 00010046
[ 1.174164] RAX: 0000000fffffffe0 RBX: 0000000000000000 RCX: 0000000000000000
[ 1.182135] RDX: ffff88087f5f4898 RSI: ffffffff8107ec80 RDI: ffffffff81dd332c
[ 1.190108] RBP: ffff88087f5e3e48 R08: 0000000000000000 R09: ffff88087f5ed8c0
[ 1.198080] R10: 0000000000000004 R11: 0000000000000005 R12: ffffffff81d4d880
[ 1.206052] R13: 0000000000000101 R14: ffffffff8107ec80 R15: ffff88087f5f4880
[ 1.214026] FS: 0000000000000000(0000) GS:ffff88087f5e0000(0000) knlGS:0000000000000000
[ 1.223066] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.229486] #37
[ 1.229486] CR2: 0000000000000008 CR3: 0000000001a0e000 CR4: 00000000001406e0
[ 1.239605] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.247578] #38
[ 1.247578] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.257697] #39
[ 1.257697] Stack:
[ 1.262090] ffff88087f5e3e48 ffffffff810bf45b 0000000000000021 ffff88087f5ed8c0
[ 1.270398] ffff88087f5f4910
[ 1.273400] #40 0000000000000101 ffffffff8107ec80 ffff88087f5f4880
[ 1.280867] ffff88087f5e3e88 ffffffff810cf559 ffff88087f5e3e88 ffff88087f5ed8c0
[ 1.289177] Call Trace:
[ 1.291910] #41
[ 1.294068] <IRQ>
[ 1.294068] [<ffffffff810bf45b>] ? console_unlock+0x1fb/0x460
[ 1.302927] [<ffffffff8107ec80>] ? wq_unbind_fn+0x130/0x130
[ 1.309242] #42
[ 1.309242] [<ffffffff810cf559>] call_timer_fn+0x39/0x130
[ 1.317509] [<ffffffff8107ec80>] ? wq_unbind_fn+0x130/0x130
[ 1.323833] #43
[ 1.323834] [<ffffffff810d1041>] run_timer_softirq+0x211/0x300
[ 1.332598] [<ffffffff8106a874>] __do_softirq+0xe4/0x290
[ 1.338629] [<ffffffff8106ac8d>] irq_exit+0x9d/0xb0
[ 1.344177] #44
[ 1.344177] [<ffffffff8103daba>] smp_apic_timer_interrupt+0x4a/0x60
[ 1.353424] [<ffffffff815b53fe>] apic_timer_interrupt+0x6e/0x80
[ 1.360135] #45
[ 1.362292] <EOI>
[ 1.362292] [<ffffffff8100d7ad>] ? mwait_idle+0x6d/0x90
[ 1.370568] [<ffffffff8100e0cf>] arch_cpu_idle+0xf/0x20
[ 1.376507] #46
[ 1.376507] [<ffffffff810aafe4>] cpu_startup_entry+0x2f4/0x3c0
[ 1.385274] [<ffffffff8103b7e3>] start_secondary+0x143/0x170
[ 1.391694] #47
[ 1.391694] Code: 49 83 ec 08 31 c9 eb 14 66 90 49 8b 44 24 08 48 39 c2 4c 8d 60 f8 0f 84 8e 00 00 00 49 8b 04
[ 1.404957] #48 24 48 89 c3 30 db a8 04 48 0f 44 d9 <4c> 8b 6b 08 49 83 bd 90 00 00 00 00 74 d1 4c 8d b3 80 00 00 00
[ 1.417801] RIP [<ffffffff8107ed01>] pool_mayday_timeout+0x81/0x150
[ 1.424914] #49
[ 1.424914] RSP <ffff88087f5e3e08>
[ 1.430955] CR2: 0000000000000008
[ 1.434665] ---[ end trace 4b134008a4be60b6 ]---
[ 1.439823] #50
[ 1.439824] Kernel panic - not syncing: Fatal exception in interrupt
[ 1.449088] ---[ end Kernel panic - not syncing: Fatal exception in interrupt