Hi- I’ve had CONFIG_PROVE_LOCKING set in my recent kernel builds (3.13, 3.14). With 3.14, it started spitting out the below warning at boot time.
But this week, after 3.14, my system deadlocks on boot. I bisected it today. It appears to start with either commit 6f008e72 or 462bf234. Disabling CONFIG_PROVE_LOCKING allows me to boot 3.15-rc1.

[ cut here ]

Apr 15 14:44:59 manet kernel: =================================
Apr 15 14:44:59 manet kernel: [ INFO: inconsistent lock state ]
Apr 15 14:44:59 manet kernel: 3.14.0-rc3-00053-ge086481 #1 Tainted: GF
Apr 15 14:44:59 manet kernel: ---------------------------------
Apr 15 14:44:59 manet kernel: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
Apr 15 14:44:59 manet kernel: swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
Apr 15 14:44:59 manet kernel: (&(&iboe->lock)->rlock){+.?...}, at: [<ffffffffa0649ece>] mlx4_ib_addr_event+0xde/0x180 [mlx4_ib]
Apr 15 14:44:59 manet kernel: {SOFTIRQ-ON-W} state was registered at:
Apr 15 14:44:59 manet kernel: [<ffffffff810b7430>] mark_irqflags+0x110/0x170
Apr 15 14:44:59 manet kernel: [<ffffffff810b8afc>] __lock_acquire+0x29c/0x570
Apr 15 14:44:59 manet kernel: [<ffffffff810b8f01>] lock_acquire+0x131/0x160
Apr 15 14:44:59 manet kernel: [<ffffffff8161bb19>] _raw_spin_lock+0x39/0x50
Apr 15 14:44:59 manet kernel: [<ffffffffa064bc2c>] mlx4_ib_scan_netdevs+0x2c/0x210 [mlx4_ib]
Apr 15 14:44:59 manet kernel: [<ffffffffa064be35>] mlx4_ib_netdev_event+0x25/0x30 [mlx4_ib]
Apr 15 14:44:59 manet kernel: [<ffffffff81530839>] register_netdevice_notifier+0x99/0x1e0
Apr 15 14:44:59 manet kernel: [<ffffffffa064d21c>] mlx4_ib_add+0x76c/0xbf0 [mlx4_ib]
Apr 15 14:44:59 manet kernel: [<ffffffffa05de228>] mlx4_add_device+0x48/0xa0 [mlx4_core]
Apr 15 14:44:59 manet kernel: [<ffffffffa05de383>] mlx4_register_interface+0x73/0xb0 [mlx4_core]
Apr 15 14:44:59 manet kernel: [<ffffffffa05b305a>] 0xffffffffa05b305a
Apr 15 14:44:59 manet kernel: [<ffffffff8100028a>] do_one_initcall+0xba/0x170
Apr 15 14:44:59 manet kernel: [<ffffffff810eaa94>] do_init_module+0x84/0x1e0
Apr 15 14:44:59 manet kernel: [<ffffffff810ee886>] load_module+0x5d6/0x750
Apr 15 14:44:59 manet kernel: [<ffffffff810eeb99>] SyS_init_module+0x99/0xd0
Apr 15 14:44:59 manet kernel: [<ffffffff81626192>] system_call_fastpath+0x16/0x1b
Apr 15 14:44:59 manet kernel: irq event stamp: 237158
Apr 15 14:44:59 manet kernel: hardirqs last enabled at (237158): [<ffffffff8105f0c5>] __local_bh_enable_ip+0xb5/0xc0
Apr 15 14:44:59 manet kernel: hardirqs last disabled at (237157): [<ffffffff8105f066>] __local_bh_enable_ip+0x56/0xc0
Apr 15 14:44:59 manet kernel: softirqs last enabled at (237022): [<ffffffff8105f00a>] _local_bh_enable+0x4a/0x50
Apr 15 14:44:59 manet kernel: softirqs last disabled at (237023): [<ffffffff8105fd44>] irq_exit+0x44/0xd0
Apr 15 14:44:59 manet kernel:
Apr 15 14:44:59 manet kernel: other info that might help us debug this:
Apr 15 14:44:59 manet kernel: Possible unsafe locking scenario:
Apr 15 14:44:59 manet kernel:
Apr 15 14:44:59 manet kernel: CPU0
Apr 15 14:44:59 manet kernel: ----
Apr 15 14:44:59 manet kernel: lock(&(&iboe->lock)->rlock);
Apr 15 14:44:59 manet kernel: <Interrupt>
Apr 15 14:44:59 manet kernel: lock(&(&iboe->lock)->rlock);
Apr 15 14:44:59 manet kernel:
Apr 15 14:44:59 manet kernel: *** DEADLOCK ***
Apr 15 14:44:59 manet kernel:
Apr 15 14:44:59 manet kernel: 3 locks held by swapper/0/0:
Apr 15 14:44:59 manet kernel: #0: (rcu_read_lock){.+.+..}, at: [<ffffffff81533f00>] __netif_receive_skb_core+0x240/0x960
Apr 15 14:44:59 manet kernel: #1: (rcu_read_lock){.+.+..}, at: [<ffffffffa037054c>] ip6_input_finish+0x7c/0x710 [ipv6]
Apr 15 14:44:59 manet kernel: #2: (rcu_read_lock){.+.+..}, at: [<ffffffff81620960>] __atomic_notifier_call_chain+0x0/0x130
Apr 15 14:44:59 manet kernel:
Apr 15 14:44:59 manet kernel: stack backtrace:
Apr 15 14:44:59 manet kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: GF 3.14.0-rc3-00053-ge086481 #1
Apr 15 14:44:59 manet kernel: Hardware name: Shuttle SZ77/FZ77, BIOS 1.10 07/10/2012
Apr 15 14:44:59 manet kernel: ffffffff81c11118 ffff88022f2035b8 ffffffff816165e3 0000000000000002
Apr 15 14:44:59 manet kernel: ffffffff81c104c0 ffff88022f203608 ffffffff810b636a 0000000000000001
Apr 15 14:44:59 manet kernel: 0000000000000001 0000000b00000000 ffffffff82281948 0000000000000006
Apr 15 14:44:59 manet kernel: Call Trace:
Apr 15 14:44:59 manet kernel: <IRQ> [<ffffffff816165e3>] dump_stack+0x51/0x6e
Apr 15 14:44:59 manet kernel: [<ffffffff810b636a>] print_usage_bug+0x17a/0x1a0
Apr 15 14:44:59 manet kernel: [<ffffffff810b6980>] ? print_circular_bug+0x120/0x120
Apr 15 14:44:59 manet kernel: [<ffffffff810b6f91>] mark_lock_irq+0xe1/0x2b0
Apr 15 14:44:59 manet kernel: [<ffffffff810b7279>] mark_lock+0x119/0x1c0
Apr 15 14:44:59 manet kernel: [<ffffffff810b73b5>] mark_irqflags+0x95/0x170
Apr 15 14:44:59 manet kernel: [<ffffffff810b8afc>] __lock_acquire+0x29c/0x570
Apr 15 14:44:59 manet kernel: [<ffffffff810b6a4c>] ? check_usage_forwards+0xcc/0x120
Apr 15 14:44:59 manet kernel: [<ffffffff810b8f01>] lock_acquire+0x131/0x160
Apr 15 14:44:59 manet kernel: [<ffffffffa0649ece>] ? mlx4_ib_addr_event+0xde/0x180 [mlx4_ib]
Apr 15 14:44:59 manet kernel: [<ffffffff8161bb19>] _raw_spin_lock+0x39/0x50
Apr 15 14:44:59 manet kernel: [<ffffffffa0649ece>] ? mlx4_ib_addr_event+0xde/0x180 [mlx4_ib]
Apr 15 14:44:59 manet kernel: [<ffffffffa0649ece>] mlx4_ib_addr_event+0xde/0x180 [mlx4_ib]
Apr 15 14:44:59 manet kernel: [<ffffffffa0649f97>] mlx4_ib_inet6_event+0x27/0x30 [mlx4_ib]
Apr 15 14:44:59 manet kernel: [<ffffffff81620934>] notifier_call_chain+0xc4/0xf0
Apr 15 14:44:59 manet kernel: [<ffffffff81620a15>] __atomic_notifier_call_chain+0xb5/0x130
Apr 15 14:44:59 manet kernel: [<ffffffff81620960>] ? notifier_call_chain+0xf0/0xf0
Apr 15 14:44:59 manet kernel: [<ffffffff81620aa6>] atomic_notifier_call_chain+0x16/0x20
Apr 15 14:44:59 manet kernel: [<ffffffff815f4ddb>] inet6addr_notifier_call_chain+0x1b/0x20
Apr 15 14:44:59 manet kernel: [<ffffffffa0377f5d>] ipv6_add_addr+0x48d/0x510 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa0377b31>] ? ipv6_add_addr+0x61/0x510 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa0379ae2>] ? addrconf_prefix_rcv+0x522/0x6f0 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa0379b1a>] addrconf_prefix_rcv+0x55a/0x6f0 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa0389180>] ndisc_router_discovery+0x7b0/0x970 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa038aa5a>] ndisc_rcv+0x18a/0x1c0 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa0392b30>] icmpv6_rcv+0x490/0x580 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa0396180>] ? mld_ifc_timer_expire+0x60/0x60 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa0370978>] ip6_input_finish+0x4a8/0x710 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa037054c>] ? ip6_input_finish+0x7c/0x710 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa036ffd8>] ip6_input+0x58/0x60 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa037028d>] ip6_mc_input+0x2ad/0x2e0 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa03704c6>] ip6_rcv_finish+0x206/0x210 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa03711d8>] ipv6_rcv+0x5f8/0x6a0 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffffa0370c28>] ? ipv6_rcv+0x48/0x6a0 [ipv6]
Apr 15 14:44:59 manet kernel: [<ffffffff81534591>] __netif_receive_skb_core+0x8d1/0x960
Apr 15 14:44:59 manet kernel: [<ffffffff81533f00>] ? __netif_receive_skb_core+0x240/0x960
Apr 15 14:44:59 manet kernel: [<ffffffff8152c060>] ? rcu_read_unlock+0x40/0x70
Apr 15 14:44:59 manet kernel: [<ffffffff81534687>] __netif_receive_skb+0x67/0x80
Apr 15 14:44:59 manet kernel: [<ffffffff81534a08>] netif_receive_skb_internal+0x1b8/0x1d0
Apr 15 14:44:59 manet kernel: [<ffffffff81537248>] napi_gro_receive+0xc8/0x120
Apr 15 14:44:59 manet kernel: [<ffffffffa043bb97>] rtl_rx+0x357/0x3c0 [r8169]
Apr 15 14:44:59 manet kernel: [<ffffffffa043bc71>] rtl8169_poll+0x71/0x200 [r8169]
Apr 15 14:44:59 manet kernel: [<ffffffff81536797>] net_rx_action+0xc7/0x2f0
Apr 15 14:44:59 manet kernel: [<ffffffff8105f941>] ? __do_softirq+0xc1/0x420
Apr 15 14:44:59 manet kernel: [<ffffffff8105fa52>] __do_softirq+0x1d2/0x420
Apr 15 14:44:59 manet kernel: [<ffffffff8105fd44>] irq_exit+0x44/0xd0
Apr 15 14:44:59 manet kernel: [<ffffffff816281b5>] do_IRQ+0xd5/0x100
Apr 15 14:44:59 manet kernel: [<ffffffff8161c92f>] common_interrupt+0x6f/0x6f
Apr 15 14:44:59 manet kernel: <EOI> [<ffffffff814d828b>] ? cpuidle_enter_state+0x5b/0xd0
Apr 15 14:44:59 manet kernel: [<ffffffff814d8287>] ? cpuidle_enter_state+0x57/0xd0
Apr 15 14:44:59 manet kernel: [<ffffffff814d8522>] cpuidle_idle_call+0x112/0x170
Apr 15 14:44:59 manet kernel: [<ffffffff8100d2de>] arch_cpu_idle+0xe/0x30
Apr 15 14:44:59 manet kernel: [<ffffffff810c9f38>] cpu_idle_loop+0x2a8/0x360
Apr 15 14:44:59 manet kernel: [<ffffffff810ca013>] cpu_startup_entry+0x23/0x30
Apr 15 14:44:59 manet kernel: [<ffffffff8160e989>] rest_init+0x149/0x150
Apr 15 14:44:59 manet kernel: [<ffffffff8160e840>] ? csum_partial_copy_generic+0x170/0x170
Apr 15 14:44:59 manet kernel: [<ffffffff81d822af>] start_kernel+0x3ad/0x3b4
Apr 15 14:44:59 manet kernel: [<ffffffff81d81d20>] ? repair_env_string+0x5b/0x5b
Apr 15 14:44:59 manet kernel: [<ffffffff81614c8e>] ? memblock_reserve+0x49/0x4e
Apr 15 14:44:59 manet kernel: [<ffffffff81d815a3>] x86_64_start_reservations+0x2a/0x2c
Apr 15 14:44:59 manet kernel: [<ffffffff81d816e6>] x86_64_start_kernel+0x141/0x148

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
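
P.S. For anyone reading the splat cold, here is a rough sketch of the check it reports. It is a toy model in Python, not lockdep's implementation: the LockClass type, the acquire() helper, and its two boolean flags are hypothetical names made up for illustration. The only things taken from the log are the lock name, the two call sites, and the two usage states. The idea is that a lock class first seen taken with softirqs enabled ({SOFTIRQ-ON-W}, here via mlx4_ib_scan_netdevs) and later taken inside softirq context ({IN-SOFTIRQ-W}, here via mlx4_ib_addr_event) could self-deadlock if the softirq interrupts the first holder on the same CPU.

```python
from dataclasses import dataclass, field

# Toy model only: hypothetical names, not lockdep's internal data structures.
SOFTIRQ_ON_W = "SOFTIRQ-ON-W"   # lock taken while softirqs were enabled
IN_SOFTIRQ_W = "IN-SOFTIRQ-W"   # lock taken from inside softirq context


@dataclass
class LockClass:
    name: str
    usage: dict = field(default_factory=dict)  # usage state -> first call site seen

    def acquire(self, site, in_softirq, softirqs_enabled):
        """Record one acquisition; return a report string if usage is inconsistent."""
        if in_softirq:
            new, conflict = IN_SOFTIRQ_W, SOFTIRQ_ON_W
        elif softirqs_enabled:
            new, conflict = SOFTIRQ_ON_W, IN_SOFTIRQ_W
        else:
            # Process context with softirqs masked: neither state is recorded.
            return None
        self.usage.setdefault(new, site)
        if conflict in self.usage:
            return (f"inconsistent {{{conflict}}} -> {{{new}}} usage on {self.name}: "
                    f"{self.usage[conflict]} vs {site}")
        return None


# Replay the two acquisitions named in the log.
iboe = LockClass("&(&iboe->lock)->rlock")
first = iboe.acquire("mlx4_ib_scan_netdevs", in_softirq=False, softirqs_enabled=True)
second = iboe.acquire("mlx4_ib_addr_event", in_softirq=True, softirqs_enabled=False)
print(first)
print(second)
```

In this model the first call records a state and reports nothing; the second call is the one that produces the "inconsistent" report. A process-context caller that masks softirqs before taking the lock records no conflicting state, which is why the usual suggestion for this class of report is a softirq-safe locking variant on the process-context path. Whether that is the right change for mlx4_ib is for the maintainers to say.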