hi,
we are getting hard lockups on Core2 CPUs (model 23) just by running 'perf test':

  PID: 10425  TASK: ffff880068562e00  CPU: 3  COMMAND: "perf"
   #0 [ffff88007d985a08] machine_kexec at ffffffff8105521b
   #1 [ffff88007d985a68] crash_kexec at ffffffff810f7412
   #2 [ffff88007d985b38] panic at ffffffff8163c031
   #3 [ffff88007d985bb8] watchdog_overflow_callback at ffffffff81120472
   #4 [ffff88007d985bc8] __perf_event_overflow at ffffffff81164e0e
   #5 [ffff88007d985c00] perf_event_overflow at ffffffff81165a44
   #6 [ffff88007d985c10] intel_pmu_handle_irq at ffffffff81033198
   #7 [ffff88007d985e60] perf_event_nmi_handler at ffffffff8164be8b
   #8 [ffff88007d985e80] nmi_handle at ffffffff8164b5d9
   #9 [ffff88007d985ec8] do_nmi at ffffffff8164b789
  #10 [ffff88007d985ef0] end_repeat_nmi at ffffffff8164aa13
      [exception RIP: intel_pmu_enable_all+17]
      RIP: ffffffff81032301  RSP: ffff88005e917c98  RFLAGS: 00000046
      RAX: ffff88007d98cd20  RBX: ffff88005e991000  RCX: 000000000000038f
      RDX: 0000000000000007  RSI: 0000000000000003  RDI: 0000000000000000
      RBP: ffff88005e917cd8   R8: ffffffffffffff85   R9: 000000ffffffffff
      R10: ffff88007d98c100  R11: ffff88005e9179e0  R12: ffff88007d98bd10
      R13: ffff88007d98b9e0  R14: ffff88007d98bc08  R15: 0000000000000002
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
  #11 [ffff88005e917c98] intel_pmu_enable_all at ffffffff81032301
  #12 [ffff88005e917c98] x86_pmu_enable at ffffffff8102ba24
  #13 [ffff88005e917ce0] perf_pmu_enable at ffffffff81160457
  #14 [ffff88005e917cf0] perf_event_context_sched_in at ffffffff81161930
  #15 [ffff88005e917d20] perf_event_exec at ffffffff811621db
  #16 [ffff88005e917d68] setup_new_exec at ffffffff811edffd
  #17 [ffff88005e917d88] load_elf_binary at ffffffff81240ed9
  #18 [ffff88005e917e58] search_binary_handler at ffffffff811ec89d
  #19 [ffff88005e917ea0] do_execve_common at ffffffff811ede04
  #20 [ffff88005e917f30] sys_execve at ffffffff811ee199
  #21 [ffff88005e917f50] stub_execve at ffffffff816531a9

The reproducer seems to be a hw event with a very small period, like
(thanks Arnaldo ;-):

  perf record -e cycles -c 123 kill

I bisected it down to:

  156174999dd1 perf/intel/x86: Enlarge the PEBS buffer

Looks like the bigger PEBS buffer, together with the event being marked as
PERF_X86_EVENT_FREERUNNING, blocks the CPU right after the event is
enabled, before it can reach local_irq_enable, which then triggers the
NMI watchdog.

I can't find what's special about the Core2 CPU PEBS setup; it seems that
other CPUs are ok (tried on ivb/snb/hsw).

Reverting 156174999dd1 fixed the issue for me.

ideas?

thanks,
jirka