On Tue, Jul 24, 2012 at 07:10:49PM +0200, Sasha Levin wrote: > Hi all, > > I was fuzzing with trinity inside a KVM tools guest, on the current 3.6, and > stumbled on the following: > > (Note that it also happens on -next). > > [ 215.034674] INFO: rcu_preempt detected stalls on CPUs/tasks: > [ 215.035641] 1: (0 ticks this GP) idle=3f5/140000000000001/0 drain=0 > . timer=18446744073709551615
OK, so this says that the CPU is awake, so let's look at CPU 1's stack. It also says that the CPU has not taken any scheduling-globk interrupts for the past 100K+ jiffies, indicating that the CPU has interrupts disabled. > [ 215.035641] (detected by 3, t=111886 jiffies) > [ 215.035641] sending NMI to all CPUs: > [ 214.959504] NMI backtrace for cpu 2 > [ 214.959504] CPU 2 > [ 214.959504] Pid: 6583, comm: trinity-child6 Tainted: G W > 3.5.0-sasha-01644-g824681b #267 > [ 214.959504] RIP: 0010:[<ffffffff81149aa3>] [<ffffffff81149aa3>] > check_flags+0x93/0x1d0 > [ 214.959504] RSP: 0018:ffff880025175b58 EFLAGS: 00000046 > [ 214.959504] RAX: ffff880027910000 RBX: ffff880027910000 RCX: > 0000000000000000 > [ 214.959504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: > 0000000000000001 > [ 214.959504] RBP: ffff880025175b58 R08: 0000000000000000 R09: > 0000000000000000 > [ 214.959504] R10: 0000000000000001 R11: 0000000000000001 R12: > 0000000000000000 > [ 214.959504] R13: 0000000000000002 R14: 0000000000000000 R15: > 0000000000000000 > [ 214.959504] FS: 00007f754d198700(0000) GS:ffff880029800000(0000) > knlGS:0000000000000000 > [ 214.959504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 214.959504] CR2: 0000000000000008 CR3: 00000000251aa000 CR4: > 00000000000406e0 > [ 214.959504] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 214.959504] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [ 214.959504] Process trinity-child6 (pid: 6583, threadinfo > ffff880025174000, task ffff880027910000) > [ 214.959504] Stack: > [ 214.959504] ffff880025175be8 ffffffff8114e610 ffff880027910000 > 0000000000000000 > [ 214.959504] 0000000000000286 ffffffffffffff10 ffffffff8114e51d > 0000000000000010 > [ 214.959504] 0000000000000286 0000000025175bb8 ffff880019a767c8 > 0000000000000286 > [ 214.959504] Call Trace: > [ 214.959504] [<ffffffff8114e610>] lock_acquire+0x80/0x270 > [ 214.959504] [<ffffffff8114e51d>] ? lock_release+0x1ad/0x220 > [ 214.959504] [<ffffffff83699dcb>] _raw_spin_lock+0x3b/0x70 > [ 214.959504] [<ffffffff8124a055>] ? __d_lookup+0x145/0x2e0 > [ 214.959504] [<ffffffff8124a055>] __d_lookup+0x145/0x2e0 > [ 214.959504] [<ffffffff8123d216>] lookup_fast+0x136/0x240 > [ 214.959504] [<ffffffff8123be8b>] ? __inode_permission+0xbb/0xd0 > [ 214.959504] [<ffffffff8123bea9>] ? inode_permission+0x9/0x50 > [ 214.959504] [<ffffffff8123da1c>] link_path_walk+0x22c/0x8f0 > [ 214.959504] [<ffffffff8123cc43>] ? path_init+0x213/0x550 > [ 214.959504] [<ffffffff81976e38>] ? __raw_spin_lock_init+0x38/0x70 > [ 214.959504] [<ffffffff8123faa8>] path_openat+0x98/0x400 > [ 214.959504] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13 > [ 214.959504] [<ffffffff8123fe4d>] do_filp_open+0x3d/0xa0 > [ 214.959504] [<ffffffff819770c8>] ? do_raw_spin_unlock+0xc8/0xe0 > [ 214.959504] [<ffffffff8124f703>] ? alloc_fd+0x1d3/0x1f0 > [ 214.959504] [<ffffffff8124f531>] ? alloc_fd+0x1/0x1f0 > [ 214.959504] [<ffffffff8122f125>] do_sys_open+0x115/0x1b0 > [ 214.959504] [<ffffffff8122f1dc>] sys_open+0x1c/0x20 > [ 214.959504] [<ffffffff8369b77d>] system_call_fastpath+0x1a/0x1f > [ 214.959504] Code: a4 26 84 e8 70 e2 f8 ff e9 25 01 00 00 0f 1f 00 44 8b 05 > 61 1c 1e 04 45 85 c0 75 4c 65 48 8b 04 25 00 da 00 00 8b b8 b0 08 00 00 <85> > ff 75 39 e8 64 b4 81 00 85 c0 0f 84 09 01 00 00 83 3d 45 bf And this is CPU 1: > [ 215.026612] NMI backtrace for cpu 1 > [ 215.026612] CPU 1 > [ 215.026612] Pid: 2395, comm: pageattr-test Tainted: G W > 3.5.0-sasha-01644-g824681b #267 > [ 215.026612] RIP: 0010:[<ffffffff8109bd5a>] [<ffffffff8109bd5a>] > native_write_msr_safe+0xa/0x10 > [ 215.026612] RSP: 0018:ffff88001b803f70 EFLAGS: 00000046 The interrupt flag is zero, so interrupts are disabled. So my question to you is "Why did do_pageattr_test() or one of the functions it called disable interrupts for more than one hundred thousand jiffies?" I can't see where it is disabling interrupts, but the EFLAGS register above does indicate that they are disabled. Or is this something useless that is just telling us that receiving the NMI disabled interrupts? Thanx, Paul > [ 215.026612] RAX: 0000000000000000 RBX: 000000000000ef80 RCX: > 000000000000080b > [ 215.026612] RDX: 0000000000000000 RSI: 0000000000000000 RDI: > 000000000000080b > [ 215.026612] RBP: ffff88001b803f70 R08: 00003ffffffff000 R09: > ffff880000000000 > [ 215.026612] R10: 0000000000000001 R11: 0000000000000000 R12: > 0000000000000000 > [ 215.026612] R13: ffff880007a98000 R14: ffff88001fc3a4b8 R15: > ffff880000000000 > [ 215.026612] FS: 0000000000000000(0000) GS:ffff88001b800000(0000) > knlGS:0000000000000000 > [ 215.026612] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 215.026612] CR2: 000000000000000c CR3: 0000000019dcf000 CR4: > 00000000000406e0 > [ 215.026612] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 215.026612] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [ 215.026612] Process pageattr-test (pid: 2395, threadinfo ffff88000ced0000, > task ffff88000d773000) > [ 215.026612] Stack: > [ 215.026612] ffff88001b803f88 ffffffff81095d07 000000000000ef80 > ffff88001b803fa8 > [ 215.026612] ffffffff810901c4 ffff88000ced1db0 0000000000007a98 > ffff88000ced1d00 > [ 215.026612] ffffffff8369c4af ffff88000ced1c78 <EOI> ffff88000ced1d00 > ffffffff8369aeb4 > [ 215.026612] Call Trace: > [ 215.026612] <IRQ> > [ 215.026612] [<ffffffff81095d07>] native_apic_msr_eoi_write+0x27/0x30 > [ 215.026612] [<ffffffff810901c4>] smp_apic_timer_interrupt+0x34/0xa0 > [ 215.026612] [<ffffffff8369c4af>] apic_timer_interrupt+0x6f/0x80 > [ 215.026612] <EOI> > [ 215.026612] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13 > [ 215.026612] [<ffffffff810a2cb2>] ? lookup_address+0x112/0x170 > [ 215.026612] [<ffffffff810a32ae>] print_split+0x7e/0x200 > [ 215.026612] [<ffffffff810a4300>] pageattr_test+0x80/0x4a0 > [ 215.026612] [<ffffffff8114a2f8>] ? trace_hardirqs_on_caller+0x118/0x140 > [ 215.026612] [<ffffffff83696cbe>] ? schedule_timeout+0x2ce/0x370 > [ 215.026612] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13 > [ 215.026612] [<ffffffff810a4720>] ? pageattr_test+0x4a0/0x4a0 > [ 215.026612] [<ffffffff810a473f>] do_pageattr_test+0x1f/0x50 > [ 215.026612] [<ffffffff8110607d>] kthread+0xad/0xc0 > [ 215.026612] [<ffffffff8369cdb4>] kernel_thread_helper+0x4/0x10 > [ 215.026612] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13 > [ 215.026612] [<ffffffff81105fd0>] ? kthread_flush_work_fn+0x10/0x10 > [ 215.026612] [<ffffffff8369cdb0>] ? gs_change+0x13/0x13 > [ 215.026612] Code: 00 55 89 f9 48 89 e5 0f 32 45 31 c0 89 c7 48 89 d0 44 89 > 06 48 c1 e0 20 89 f9 5d 48 09 c8 c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> > c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 5d 48 > [ 215.035641] NMI backtrace for cpu 3 > [ 215.035641] CPU 3 > [ 215.035641] Pid: 0, comm: swapper/3 Tainted: G W > 3.5.0-sasha-01644-g824681b #267 > [ 215.035641] RIP: 0010:[<ffffffff8109bd5a>] [<ffffffff8109bd5a>] > native_write_msr_safe+0xa/0x10 > [ 215.035641] RSP: 0018:ffff88000d5b5ce8 EFLAGS: 00000086 > [ 215.035641] RAX: 0000000000000400 RBX: 0000000000000003 RCX: > 0000000000000830 > [ 215.035641] RDX: 0000000000000003 RSI: 0000000000000400 RDI: > 0000000000000830 > [ 215.035641] RBP: ffff88000d5b5ce8 R08: ffffffff84edc9c0 R09: > 0000000000000000 > [ 215.035641] R10: 000000000000739d R11: 0000000000025370 R12: > ffffffff84edc9c0 > [ 215.035641] R13: 0000000000000003 R14: 000000000000f042 R15: > 0000000000000002 > [ 215.035641] FS: 0000000000000000(0000) GS:ffff880035a00000(0000) > knlGS:0000000000000000 > [ 215.035641] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 215.035641] CR2: 0000000000bc5958 CR3: 00000000278b0000 CR4: > 00000000000406e0 > [ 215.035641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 215.035641] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [ 215.035641] Process swapper/3 (pid: 0, threadinfo ffff88000d5b4000, task > ffff88000d5b8000) > [ 215.035641] Stack: > [ 215.035641] ffff88000d5b5d48 ffffffff81095e0d ffff88000d5b5d58 > 0000000000000086 > [ 215.035641] 0000000000000002 0008000000000008 ffff88000d5b5d68 > 0000000000002710 > [ 215.035641] 000000000001b50e 0000000000000001 ffffffff84966008 > 0000000000000086 > [ 215.035641] Call Trace: > [ 215.035641] [<ffffffff81095e0d>] __x2apic_send_IPI_mask+0xcd/0x190 > [ 215.035641] [<ffffffff81095ee7>] x2apic_send_IPI_all+0x17/0x20 > [ 215.035641] [<ffffffff810918c3>] arch_trigger_all_cpu_backtrace+0x63/0xa0 > [ 215.035641] [<ffffffff81194513>] print_other_cpu_stall+0x223/0x290 > [ 215.035641] [<ffffffff81194600>] __rcu_pending+0x80/0x200 > [ 215.035641] [<ffffffff8110e1d8>] ? __atomic_notifier_call_chain+0xe8/0x120 > [ 215.035641] [<ffffffff811947bf>] rcu_pending+0x3f/0x70 > [ 215.035641] [<ffffffff811949c4>] rcu_prepare_for_idle+0x1d4/0x430 > [ 215.035641] [<ffffffff81194e6c>] rcu_idle_enter_common+0x24c/0x400 > [ 215.035641] [<ffffffff8110e1f5>] ? > __atomic_notifier_call_chain+0x105/0x120 > [ 215.035641] [<ffffffff8110e0f0>] ? > atomic_notifier_chain_unregister+0x90/0x90 > [ 215.035641] [<ffffffff8119530b>] rcu_idle_enter+0xdb/0x130 > [ 215.035641] [<ffffffff81074636>] cpu_idle+0x106/0x160 > [ 215.035641] [<ffffffff835d1bb0>] start_secondary+0xd0/0xd7 > [ 215.035641] Code: 00 55 89 f9 48 89 e5 0f 32 45 31 c0 89 c7 48 89 d0 44 89 > 06 48 c1 e0 20 89 f9 5d 48 09 c8 c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> > c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 5d 48 > [ 214.997077] NMI backtrace for cpu 0 > [ 214.997077] CPU 0 > [ 214.997077] Pid: 2277, comm: kworker/0:2 Tainted: G W > 3.5.0-sasha-01644-g824681b #267 > [ 214.997077] RIP: 0010:[<ffffffff8109bd5a>] [<ffffffff8109bd5a>] > native_write_msr_safe+0xa/0x10 > [ 214.997077] RSP: 0018:ffff88000d803f70 EFLAGS: 00000046 > [ 214.997077] RAX: 0000000000000000 RBX: 000000000000ef80 RCX: > 000000000000080b > [ 214.997077] RDX: 0000000000000000 RSI: 0000000000000000 RDI: > 000000000000080b > [ 214.997077] RBP: ffff88000d803f70 R08: 0000000000000000 R09: > 0000000000000000 > [ 214.997077] R10: 0000000000000001 R11: 0000000000000001 R12: > 0000000000000000 > [ 214.997077] R13: ffffffff84854840 R14: 0000000000000286 R15: > 00000000000493e0 > [ 214.997077] FS: 0000000000000000(0000) GS:ffff88000d800000(0000) > knlGS:0000000000000000 > [ 214.997077] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 214.997077] CR2: 0000000000000000 CR3: 0000000012f58000 CR4: > 00000000000406f0 > [ 214.997077] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 214.997077] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [ 214.997077] Process kworker/0:2 (pid: 2277, threadinfo ffff88000d782000, > task ffff88000d653000) > [ 214.997077] Stack: > [ 214.997077] ffff88000d803f88 ffffffff81095d07 000000000000ef80 > ffff88000d803fa8 > [ 214.997077] ffffffff810901c4 0000000000000002 ffff88000d653000 > ffff88000d783cf0 > [ 214.997077] ffffffff8369c4af ffff88000d783c48 <EOI> ffff88000d783cf0 > ffffffff8369aeb4 > [ 214.997077] Call Trace: > [ 214.997077] <IRQ> > [ 214.997077] [<ffffffff81095d07>] native_apic_msr_eoi_write+0x27/0x30 > [ 214.997077] [<ffffffff810901c4>] smp_apic_timer_interrupt+0x34/0xa0 > [ 214.997077] [<ffffffff8369c4af>] apic_timer_interrupt+0x6f/0x80 > [ 214.997077] <EOI> > [ 214.997077] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13 > [ 214.997077] [<ffffffff8114b46d>] ? lock_is_held+0xed/0x110 > [ 214.997077] [<ffffffff8313ed87>] rt_check_expire+0x117/0x300 > [ 214.997077] [<ffffffff8313ef79>] rt_worker_func+0x9/0x20 > [ 214.997077] [<ffffffff810fa936>] process_one_work+0x376/0x700 > [ 214.997077] [<ffffffff810fa7f0>] ? process_one_work+0x230/0x700 > [ 214.997077] [<ffffffff8313ef70>] ? rt_check_expire+0x300/0x300 > [ 214.997077] [<ffffffff810fb508>] ? worker_thread+0x48/0x320 > [ 214.997077] [<ffffffff810fb69c>] worker_thread+0x1dc/0x320 > [ 214.997077] [<ffffffff810fb4c0>] ? manage_workers.isra.11+0x120/0x120 > [ 214.997077] [<ffffffff8110607d>] kthread+0xad/0xc0 > [ 214.997077] [<ffffffff8369cdb4>] kernel_thread_helper+0x4/0x10 > [ 214.997077] [<ffffffff8369aeb4>] ? retint_restore_args+0x13/0x13 > [ 214.997077] [<ffffffff81105fd0>] ? kthread_flush_work_fn+0x10/0x10 > [ 214.997077] [<ffffffff8369cdb0>] ? gs_change+0x13/0x13 > [ 214.997077] Code: 00 55 89 f9 48 89 e5 0f 32 45 31 c0 89 c7 48 89 d0 44 89 > 06 48 c1 e0 20 89 f9 5d 48 09 c8 c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> > c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 5d 48 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/