On Fri, Nov 08, 2013 at 03:06:11PM -0500, Vince Weaver wrote: > > and again, this time after 600k successful syscalls or so. > > This is on a core2 machine. > > [ 1020.396002] ------------[ cut here ]------------ > [ 1020.396002] WARNING: CPU: 1 PID: 3036 at kernel/watchdog.c:245 > watchdog_over) > [ 1020.396002] Watchdog detected hard LOCKUP on cpu 1 > [ 1020.396002] Modules linked in: cpufreq_userspace cpufreq_stats > cpufreq_powern > [ 1020.396002] CPU: 1 PID: 3036 Comm: perf_fuzzer Not tainted 3.12.0 #2 > [ 1020.396002] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, > BI2 > [ 1020.396002] 00000000000000f5 ffff88011fc87c48 ffffffff8151aad6 > ffff88011fc88 > [ 1020.396002] ffffffff8103cd68 0000000100000000 ffffffff810a101c > ffff88011b380 > [ 1020.396002] ffff88011fc87ef8 ffff88011fc87d58 0000000000000000 > ffff88011fc88 > [ 1020.396002] Call Trace: > [ 1020.396002] <NMI> [<ffffffff8151aad6>] ? dump_stack+0x41/0x53 > [ 1020.396002] [<ffffffff8103cd68>] ? warn_slowpath_common+0x81/0x9c > [ 1020.396002] [<ffffffff810a101c>] ? watchdog_overflow_callback+0x9b/0xa6 > [ 1020.396002] [<ffffffff8103ce25>] ? warn_slowpath_fmt+0x46/0x48 > [ 1020.396002] [<ffffffff810a101c>] ? watchdog_overflow_callback+0x9b/0xa6 > [ 1020.396002] [<ffffffff810ca4a2>] ? __perf_event_overflow+0x12e/0x1ab > [ 1020.396002] [<ffffffff81018cf5>] ? intel_pmu_handle_irq+0x318/0x34d > [ 1020.396002] [<ffffffff81018cfc>] ? intel_pmu_handle_irq+0x31f/0x34d > [ 1020.396002] [<ffffffff81018c95>] ? intel_pmu_handle_irq+0x2b8/0x34d > [ 1020.396002] [<ffffffff81018671>] ? intel_pmu_enable_all+0x55/0xa7 > [ 1020.396002] [<ffffffff8151edfc>] ? nmi_handle+0xa8/0x13a > [ 1020.396002] [<ffffffff8151f473>] ? perf_event_nmi_handler+0x24/0x3d > [ 1020.396002] [<ffffffff8151edb2>] ? nmi_handle+0x5e/0x13a > [ 1020.396002] [<ffffffff8151ef35>] ? do_nmi+0xa7/0x2bd > [ 1020.396002] [<ffffffff8151e537>] ? end_repeat_nmi+0x1e/0x2e > [ 1020.396002] [<ffffffff8102bcc8>] ? apic_send_IPI_self+0x33/0x35 > [ 1020.396002] [<ffffffff8102bcc8>] ? apic_send_IPI_self+0x33/0x35 > [ 1020.396002] [<ffffffff8102bcc8>] ? apic_send_IPI_self+0x33/0x35 > [ 1020.396002] <<EOE>> <IRQ> [<ffffffff81005ee2>] ? > arch_irq_work_raise+0x27a > [ 1020.396002] [<ffffffff810c4ada>] ? irq_work_queue+0x70/0x72 > [ 1020.396002] [<ffffffff810ca510>] ? __perf_event_overflow+0x19c/0x1ab > [ 1020.396002] [<ffffffff8104ae16>] ? __send_signal+0x244/0x26f > [ 1020.396002] [<ffffffff810ca68b>] ? perf_swevent_overflow+0x3f/0x58 > [ 1020.396002] [<ffffffff810ca8a5>] ? perf_tp_event+0xe9/0x1f6 > [ 1020.396002] [<ffffffff81005f6b>] ? smp_trace_irq_work_interrupt+0x34/0x7e > [ 1020.396002] [<ffffffff81040954>] ? perf_trace_softirq+0x8a/0x96 > [ 1020.396002] [<ffffffff8100366f>] ? perf_trace_x86_irq_vector+0xc0/0xc9 > [ 1020.396002] [<ffffffff81005f6b>] ? smp_trace_irq_work_interrupt+0x34/0x7e > [ 1020.396002] [<ffffffff81005f6b>] ? smp_trace_irq_work_interrupt+0x34/0x7e > [ 1020.396002] [<ffffffff815261ca>] ? trace_irq_work_interrupt+0x6a/0x70 > [ 1020.396002] [<ffffffff81040fba>] ? __do_softirq+0x95/0x208 > [ 1020.396002] [<ffffffff815263dc>] ? call_softirq+0x1c/0x30 > [ 1020.396002] [<ffffffff810041b9>] ? do_softirq+0x38/0x6d > [ 1020.396002] [<ffffffff81040d1c>] ? irq_exit+0x45/0xa1 > [ 1020.396002] [<ffffffff81526a14>] ? > smp_trace_apic_timer_interrupt+0x83/0x91 > [ 1020.396002] [<ffffffff8152597a>] ? trace_apic_timer_interrupt+0x6a/0x70 > [ 1020.396002] <EOI> [<ffffffff810c7c20>] ? __perf_event_enable+0x118/0x173 > [ 1020.396002] [<ffffffff8104c694>] ? get_signal_to_deliver+0x47d/0x48a > [ 1020.396002] [<ffffffff810024da>] ? do_signal+0x46/0x5f5 > [ 1020.396002] [<ffffffff8100b9f5>] ? perf_trace_sys_exit+0x3e/0xa7 > [ 1020.396002] [<ffffffff81002ab5>] ? do_notify_resume+0x2c/0x6a > [ 1020.396002] [<ffffffff81524ea0>] ? int_signal+0x12/0x17 > [ 1020.396002] ---[ end trace 2b385d729ba32e92 ]--- > [ 1020.396002] perf samples too long (8101340 > 10000), lowering > kernel.perf_ev0 > [ 1020.396002] INFO: NMI handler (perf_event_nmi_handler) took too long to > run:s > [ 1033.301811] perf samples too long (8038058 > 20000), lowering > kernel.perf_ev0 > [ 1034.125811] perf samples too long (7975268 > 40000), lowering > kernel.perf_ev0 > [ 1034.973813] perf samples too long (7912968 > 76923), lowering > kernel.perf_ev0 > [ 1035.821813] perf samples too long (7851154 > 142857), lowering > kernel.perf_e0 > [ 1036.669813] perf samples too long (7789824 > 250000), lowering > kernel.perf_e0 > [ 1037.517812] perf samples too long (7728973 > 500000), lowering > kernel.perf_e0 >
There seem to be a loop that takes too long in intel_pmu_handle_irq(). Your two previous reports seemed to suggest that lbr is involved, but not this one. I'm going to look further into this, unfortunately my testbox is an AMD... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/