On Sun, 2007-03-11 at 15:50 +0200, Michael S. Tsirkin wrote: > > Quoting Roland Dreier <[EMAIL PROTECTED]>: > > Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! > > > > >Feb 27 17:47:52 sw169 kernel: [<ffffffff8053aaf1>] > > >_spin_lock_irqsave+0x15/0x24 > > >Feb 27 17:47:52 sw169 kernel: [<ffffffff88067a23>] > > >:ib_ipoib:ipoib_neigh_destructor+0xc2/0x139 > > > > It looks like this is deadlocking trying to take priv->lock in > > ipoib_neigh_destructor(). > > One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING > > turned on, and then rerun this test. There's a good chance that this would > > diagnose the deadlock. (I don't have good access to my test machines right > > now, or > > else I would do it myself) > > OK, I did that. But I get > [13440.761857] INFO: trying to register non-static key. > [13440.766903] the code is fine but needs lockdep annotation. > [13440.772455] turning off the locking correctness validator. > and I am not sure what triggers this, or how to fix it to have the > validator actually do its job.
It usually indicates a spinlock is not properly initialized. Like __SPIN_LOCK_UNLOCKED() used in a non-static context, use spin_lock_init() in these cases. However looking at the code, ipoib_neight_destructor only uses &priv->lock, and that seems to get properly initialized in ipoib_setup() using spin_lock_init(). So either there are other sites that instanciate those objects and forget about the lock init, or the object is corrupted (use after free?) > Ingo, what key does the message refer to? > > The stack dump seems to point to drivers/infiniband/ulp/ipoib/ipoib_main.c > line > 829. > > Full message below: > > [13440.761857] INFO: trying to register non-static key. > [13440.766903] the code is fine but needs lockdep annotation. > [13440.772455] turning off the locking correctness validator. > [13440.778008] [<c023c082>] __lock_acquire+0xae4/0xbb9 > [13440.783078] [<c023c43d>] lock_acquire+0x56/0x71 > [13440.787784] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13440.794412] [<c051ad41>] _spin_lock_irqsave+0x32/0x41 > [13440.799649] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13440.806275] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13440.812897] [<c04a1c1b>] dst_run_gc+0xc/0x118 > [13440.817439] [<c022af6e>] run_timer_softirq+0x37/0x16b > [13440.822673] [<c04a1c0f>] dst_run_gc+0x0/0x118 > [13440.827221] [<c04a3eab>] neigh_destroy+0xbe/0x104 > [13440.832114] [<c04a1bb1>] dst_destroy+0x4d/0xab > [13440.836751] [<c04a1c64>] dst_run_gc+0x55/0x118 > [13440.841384] [<c022b03f>] run_timer_softirq+0x108/0x16b > [13440.846711] [<c0227634>] __do_softirq+0x5a/0xd5 > [13440.851427] [<c023b435>] trace_hardirqs_on+0x106/0x141 > [13440.856754] [<c0227643>] __do_softirq+0x69/0xd5 > [13440.861470] [<c02276e6>] do_softirq+0x37/0x4d > [13440.866016] [<c02167b0>] smp_apic_timer_interrupt+0x6b/0x77 > [13440.871774] [<c02029ef>] default_idle+0x3b/0x54 > [13440.876491] [<c02029ef>] default_idle+0x3b/0x54 > [13440.881211] [<c0204c33>] apic_timer_interrupt+0x33/0x38 > [13440.886624] [<c02029ef>] default_idle+0x3b/0x54 > [13440.891342] [<c02029f1>] default_idle+0x3d/0x54 > [13440.896061] [<c0202aaa>] cpu_idle+0xa2/0xbb > [13440.900436] ======================= > [13768.711447] BUG: spinlock lockup on CPU#1, swapper/0, c0687880 > [13768.717353] [<c031f919>] _raw_spin_lock+0xda/0xfd > [13768.722247] [<c051ad48>] _spin_lock_irqsave+0x39/0x41 > [13768.727486] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13768.734110] [<f899bff2>] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib] > [13768.740735] [<c04a1c1b>] dst_run_gc+0xc/0x118 > [13768.745276] [<c022af6e>] run_timer_softirq+0x37/0x16b > [13768.750517] [<c04a1c0f>] dst_run_gc+0x0/0x118 > [13768.755061] [<c04a3eab>] neigh_destroy+0xbe/0x104 > [13768.759955] [<c04a1bb1>] dst_destroy+0x4d/0xab > [13768.764586] [<c04a1c64>] dst_run_gc+0x55/0x118 > [13768.769218] [<c022b03f>] run_timer_softirq+0x108/0x16b > [13768.774542] [<c0227634>] __do_softirq+0x5a/0xd5 > [13768.779261] [<c023b435>] trace_hardirqs_on+0x106/0x141 > [13768.784588] [<c0227643>] __do_softirq+0x69/0xd5 > [13768.789308] [<c02276e6>] do_softirq+0x37/0x4d > [13768.793851] [<c02167b0>] smp_apic_timer_interrupt+0x6b/0x77 > [13768.799609] [<c02029ef>] default_idle+0x3b/0x54 > [13768.804326] [<c02029ef>] default_idle+0x3b/0x54 > [13768.809054] [<c0204c33>] apic_timer_interrupt+0x33/0x38 > [13768.814471] [<c02029ef>] default_idle+0x3b/0x54 > [13768.819187] [<c02029f1>] default_idle+0x3d/0x54 > [13768.823903] [<c0202aaa>] cpu_idle+0xa2/0xbb > [13768.828279] ======================= > > > -- > MST > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/