On 05/21/2013 05:19 PM, Jack Wang wrote:
> On 05/21/2013 02:51 PM, Sebastian Riemer wrote:
>> On 17.05.2013 16:16, Jack Wang wrote:
>>> unable to handle kernel paging request
>>
>> Hi Jack,
>>
>> this should be related to the list corruption in IPoIB as list_del()
>> sets the LIST_POISON1 and LIST_POISON2 pointers.
>> Referencing these results in page faults according to the documentation
>> in the code.
>>
>> Cheers,
>> Sebastian
>>
> This bug is easily triggered with the inject_bug patch below, running
> iperf -P 50 while switching the IB mode in sync on both sides.
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -1315,7 +1315,8 @@ static void ipoib_cm_tx_start(struct work_struct *work)
>  		netif_tx_lock_bh(dev);
>  		spin_lock_irqsave(&priv->lock, flags);
>
> -		if (ret) {
> +		if (ret || priv->inject_bug) {
> +			priv->inject_bug = 0;
>  			neigh = p->neigh;
>  			if (neigh) {
>  				neigh->cm = NULL;
>
> After changing list_del to list_del_init it turned into another panic;
> I'm trying to get the backtrace.
>
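To make the list_del() vs. list_del_init() point from the quoted discussion concrete, here is a minimal userspace sketch. It is not the kernel code: the struct and helpers are re-implemented here under the kernel's names purely for illustration, the poison constants are assumed plain values modelled on include/linux/poison.h (the real kernel may add an architecture-specific offset), and demo_double_delete() is a hypothetical helper.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Illustrative poison values modelled on include/linux/poison.h.
 * Assumption: the real kernel may add an architecture-specific offset.
 * They are only compared here, never dereferenced.
 */
#define LIST_POISON1 ((struct list_head *)0x00100100)
#define LIST_POISON2 ((struct list_head *)0x00200200)

/* Make a head (or a detached entry) point to itself, i.e. "empty". */
static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Unlink and poison: any later use of entry->next/prev is a bug. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/* Unlink and re-initialise: the entry stays safe to delete again. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/* Hypothetical demo: shows the state left behind by each variant. */
static void demo_double_delete(void)
{
	struct list_head head, entry;

	INIT_LIST_HEAD(&head);
	list_add(&entry, &head);
	list_del(&entry);
	assert(entry.next == LIST_POISON1);
	assert(entry.prev == LIST_POISON2);
	/* A second list_del(&entry) here would write through the poison
	 * pointers, which is the kind of access that faults in the kernel. */

	list_add(&entry, &head);
	list_del_init(&entry);
	list_del_init(&entry);	/* harmless: the entry only points to itself */
	assert(list_empty(&entry));
	assert(list_empty(&head));
}
```

Note that list_del_init() only makes a repeated delete harmless. It does nothing about the object being freed while another context still holds a pointer to it, which would be consistent with the panic merely changing shape after that patch.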
Here are some traces I got during testing. Dear IPoIB experts, could you give some suggestions? It looks like an object lifetime issue.
May 21 15:12:03 ib2 kernel: [ 415.050021] general protection fault: 0000 [#1] SMP May 21 15:12:03 ib2 kernel: [ 415.050114] CPU 2 May 21 15:12:03 ib2 kernel: [ 415.050142] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod mlx4_core [last unloaded: ib_ipoib] May 21 15:12:03 ib2 kernel: [ 415.051845] May 21 15:12:03 ib2 kernel: [ 415.051886] Pid: 3166, comm: kworker/2:0 Tainted: G O 3.4.23-pserver-hotfix+ #109 System manufacturer System Product Name/M4A89GTD-PRO May 21 15:12:03 ib2 kernel: [ 415.052019] RIP: 0010:[<ffffffffa01c8bf9>] [<ffffffffa01c8bf9>] ib_modify_qp+0x9/0x20 [ib_core] May 21 15:12:03 ib2 kernel: [ 415.052106] RSP: 0018:ffff88020efd3b00 EFLAGS: 00010246 May 21 15:12:03 ib2 kernel: [ 415.052148] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 May 21 15:12:03 ib2 kernel: [ 415.052190] RDX: 0000000000129181 RSI: ffff88020efd3b20 RDI: dead4ead00000000 May 21 15:12:03 ib2 kernel: [ 415.052233] RBP: ffff88020efd3b00 R08: 0000000000000000 R09: 0000000000000001 May 21 15:12:03 ib2 kernel: [ 415.052275] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801fb698c60 May 21 15:12:03 ib2 kernel: [ 415.052317] R13: ffff88020efd3b20 R14: ffff8802101fdc00 R15: ffffffff81e14250 May 21 15:12:03 ib2 kernel: [ 415.052360] FS: 00007f8c38a05700(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000 May 21 15:12:03 ib2 kernel: [ 
415.052415] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b May 21 15:12:03 ib2 kernel: [ 415.052457] CR2: 00007f8c38535d70 CR3: 0000000001c0b000 CR4: 00000000000007e0 May 21 15:12:03 ib2 kernel: [ 415.052500] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 May 21 15:12:03 ib2 kernel: [ 415.052542] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 May 21 15:12:03 ib2 kernel: [ 415.052585] Process kworker/2:0 (pid: 3166, threadinfo ffff88020efd2000, task ffff88021228bf00) May 21 15:12:03 ib2 kernel: [ 415.052640] Stack: May 21 15:12:03 ib2 kernel: [ 415.052678] ffff88020efd3c40 ffffffffa02bfcb9 0000000000000000 001291811228bf00 May 21 15:12:03 ib2 kernel: [ 415.052834] ffffffff00000002 ffff880200000005 000000008173c557 0008005eefed5918 May 21 15:12:03 ib2 kernel: [ 415.052988] ffffffff81e12e00 0000000000000080 ffff88020efd3b70 0000000000000000 May 21 15:12:03 ib2 kernel: [ 415.053143] Call Trace: May 21 15:12:03 ib2 kernel: [ 415.053188] [<ffffffffa02bfcb9>] ipoib_cm_rep_handler+0x99/0x2c0 [ib_ipoib] May 21 15:12:03 ib2 kernel: [ 415.053233] [<ffffffff8109c0bd>] ? trace_hardirqs_off+0xd/0x10 May 21 15:12:03 ib2 kernel: [ 415.053277] [<ffffffff8173c557>] ? _raw_spin_unlock_irqrestore+0x77/0x80 May 21 15:12:03 ib2 kernel: [ 415.053322] [<ffffffff8105c913>] ? __queue_work+0x103/0x4a0 May 21 15:12:03 ib2 kernel: [ 415.053364] [<ffffffff8109c009>] ? trace_hardirqs_off_caller+0x29/0xd0 May 21 15:12:03 ib2 kernel: [ 415.053409] [<ffffffffa02c0373>] ipoib_cm_tx_handler+0x93/0x2b0 [ib_ipoib] May 21 15:12:03 ib2 kernel: [ 415.053452] [<ffffffff8109c0bd>] ? 
trace_hardirqs_off+0xd/0x10 May 21 15:12:03 ib2 kernel: [ 415.053497] [<ffffffffa0141cc5>] cm_process_work+0x25/0x120 [ib_cm] May 21 15:12:03 ib2 kernel: [ 415.053540] [<ffffffffa0142508>] cm_rep_handler+0x308/0x590 [ib_cm] May 21 15:12:03 ib2 kernel: [ 415.053585] [<ffffffffa0143c65>] cm_work_handler+0x145/0x1070 [ib_cm] May 21 15:12:03 ib2 kernel: [ 415.053628] [<ffffffff8105daea>] process_one_work+0x19a/0x5c0 May 21 15:12:03 ib2 kernel: [ 415.053670] [<ffffffff8105da7d>] ? process_one_work+0x12d/0x5c0 May 21 15:12:03 ib2 kernel: [ 415.053713] [<ffffffffa0143b20>] ? cm_req_handler+0xa40/0xa40 [ib_cm] May 21 15:12:03 ib2 kernel: [ 415.053757] [<ffffffff8105f865>] worker_thread+0x175/0x380 May 21 15:12:03 ib2 kernel: [ 415.053799] [<ffffffff8105f6f0>] ? manage_workers+0x210/0x210 May 21 15:12:03 ib2 kernel: [ 415.053841] [<ffffffff81064e0e>] kthread+0xbe/0xd0 May 21 15:12:03 ib2 kernel: [ 415.053884] [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0 May 21 15:12:03 ib2 kernel: [ 415.053928] [<ffffffff817465b4>] kernel_thread_helper+0x4/0x10 May 21 15:12:03 ib2 kernel: [ 415.053972] [<ffffffff8173c4c0>] ? _raw_spin_unlock_irq+0x30/0x50 May 21 15:12:03 ib2 kernel: [ 415.054015] [<ffffffff8109f44d>] ? trace_hardirqs_on+0xd/0x10 May 21 15:12:03 ib2 kernel: [ 415.054058] [<ffffffff8173c8b0>] ? retint_restore_args+0x13/0x13 May 21 15:12:03 ib2 kernel: [ 415.054100] [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70 May 21 15:12:03 ib2 kernel: [ 415.054144] [<ffffffff817465b0>] ? 
gs_change+0x13/0x13 May 21 15:12:03 ib2 kernel: [ 415.054185] Code: ff ff 31 c0 eb d6 0f 1f 40 00 83 ca 01 c9 09 c2 31 c0 f7 d2 85 ca 0f 94 c0 c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 <48> 8b 07 31 c9 48 8b 7f 58 ff 90 30 02 00 00 c9 c3 66 0f 1f 44 May 21 15:12:03 ib2 kernel: [ 415.055875] RIP [<ffffffffa01c8bf9>] ib_modify_qp+0x9/0x20 [ib_core] May 21 15:12:03 ib2 kernel: [ 415.055945] RSP <ffff88020efd3b00> May 21 15:12:03 ib2 kernel: [ 415.056011] ---[ end trace 871425e942ec1142 ]--- (gdb) list *ib_modify_qp+0x9 0xbf9 is in ib_modify_qp (drivers/infiniband/core/verbs.c:807). 802 803 int ib_modify_qp(struct ib_qp *qp, 804 struct ib_qp_attr *qp_attr, 805 int qp_attr_mask) 806 { 807 return qp->device->modify_qp(qp->real_qp, qp_attr, qp_attr_mask, NULL); 808 } 809 EXPORT_SYMBOL(ib_modify_qp); 810 811 int ib_query_qp(struct ib_qp *qp, May 21 15:12:03 ib2 kernel: [ 415.056065] BUG: unable to handle kernel paging request at fffffffffffffff8 May 21 15:12:03 ib2 kernel: [ 415.056164] IP: [<ffffffff81064700>] kthread_data+0x10/0x20 May 21 15:12:03 ib2 kernel: [ 415.056236] PGD 1c0d067 PUD 1c0e067 PMD 0 May 21 15:12:03 ib2 kernel: [ 415.056358] Oops: 0000 [#2] SMP May 21 15:12:03 ib2 kernel: [ 415.056449] CPU 2 May 21 15:12:05 ib2 kernel: [ 415.056477] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod mlx4_core [last unloaded: ib_ipoib] May 21 15:12:05 ib2 kernel: [ 415.058609] May 21 15:12:05 ib2 kernel: [ 415.058648] Pid: 3166, comm: kworker/2:0 Tainted: G D O 
3.4.23-pserver-hotfix+ #109 System manufacturer System Product Name/M4A89GTD-PRO May 21 15:12:05 ib2 kernel: [ 415.058783] RIP: 0010:[<ffffffff81064700>] [<ffffffff81064700>] kthread_data+0x10/0x20 May 21 15:12:05 ib2 kernel: [ 415.058866] RSP: 0018:ffff88020efd3858 EFLAGS: 00010092 May 21 15:12:05 ib2 kernel: [ 415.058909] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000002 May 21 15:12:05 ib2 kernel: [ 415.058954] RDX: ffffffff81e138c0 RSI: 0000000000000002 RDI: ffff88021228bf00 May 21 15:12:05 ib2 kernel: [ 415.058997] RBP: ffff88020efd3858 R08: ffff88021228bf70 R09: 0000000000000001 May 21 15:12:05 ib2 kernel: [ 415.059041] R10: 0000000000000800 R11: 0000000000000000 R12: 0000000000000002 May 21 15:12:05 ib2 kernel: [ 415.059085] R13: ffff88021228c2c8 R14: ffff88020efd3688 R15: ffffffff81e14250 May 21 15:12:05 ib2 kernel: [ 415.059128] FS: 00007f8c38a05700(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000 May 21 15:12:05 ib2 kernel: [ 415.059187] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b May 21 15:12:05 ib2 kernel: [ 415.059230] CR2: fffffffffffffff8 CR3: 0000000001c0b000 CR4: 00000000000007e0 May 21 15:12:05 ib2 kernel: [ 415.059274] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 May 21 15:12:05 ib2 kernel: [ 415.059317] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 May 21 15:12:05 ib2 kernel: [ 415.059362] Process kworker/2:0 (pid: 3166, threadinfo ffff88020efd2000, task ffff88021228bf00) May 21 15:12:05 ib2 kernel: [ 415.059420] Stack: May 21 15:12:05 ib2 kernel: [ 415.059460] ffff88020efd3878 ffffffff8105c735 ffff88020efd3878 ffff88021fc92f40 May 21 15:12:05 ib2 kernel: [ 415.059616] ffff88020efd3908 ffffffff8173a963 ffff880200000000 ffff88020efd2000 May 21 15:12:05 ib2 kernel: [ 415.059771] ffff88020efd3fd8 ffff88020efd2000 ffff88020efd2010 ffff88020efd2000 May 21 15:12:05 ib2 kernel: [ 415.059928] Call Trace: May 21 15:12:05 ib2 kernel: [ 415.059969] [<ffffffff8105c735>] 
wq_worker_sleeping+0x15/0xa0 May 21 15:12:05 ib2 kernel: [ 415.060013] [<ffffffff8173a963>] __schedule+0x6a3/0x940 May 21 15:12:05 ib2 kernel: [ 415.060056] [<ffffffff8173acc9>] schedule+0x29/0x70 May 21 15:12:05 ib2 kernel: [ 415.060098] [<ffffffff81042105>] do_exit+0x615/0xa40 May 21 15:12:05 ib2 kernel: [ 415.060141] [<ffffffff8103e6c1>] ? kmsg_dump+0x81/0x300 May 21 15:12:05 ib2 kernel: [ 415.060184] [<ffffffff8173d6db>] oops_end+0xab/0xf0 May 21 15:12:05 ib2 kernel: [ 415.060228] [<ffffffff8100570b>] die+0x5b/0x90 May 21 15:12:05 ib2 kernel: [ 415.060270] [<ffffffff8173d274>] do_general_protection+0x164/0x170 May 21 15:12:05 ib2 kernel: [ 415.060315] [<ffffffff8173c8e0>] ? restore_args+0x30/0x30 May 21 15:12:05 ib2 kernel: [ 415.060358] [<ffffffff8173ca95>] general_protection+0x25/0x30 May 21 15:12:05 ib2 kernel: [ 415.060404] [<ffffffffa01c8bf9>] ? ib_modify_qp+0x9/0x20 [ib_core] May 21 15:12:05 ib2 kernel: [ 415.060449] [<ffffffffa02bfcb9>] ipoib_cm_rep_handler+0x99/0x2c0 [ib_ipoib] May 21 15:12:05 ib2 kernel: [ 415.060493] [<ffffffff8109c0bd>] ? trace_hardirqs_off+0xd/0x10 May 21 15:12:05 ib2 kernel: [ 415.060536] [<ffffffff8173c557>] ? _raw_spin_unlock_irqrestore+0x77/0x80 May 21 15:12:05 ib2 kernel: [ 415.060579] [<ffffffff8105c913>] ? __queue_work+0x103/0x4a0 May 21 15:12:05 ib2 kernel: [ 415.060625] [<ffffffff8109c009>] ? trace_hardirqs_off_caller+0x29/0xd0 May 21 15:12:05 ib2 kernel: [ 415.060670] [<ffffffffa02c0373>] ipoib_cm_tx_handler+0x93/0x2b0 [ib_ipoib] May 21 15:12:05 ib2 kernel: [ 415.060714] [<ffffffff8109c0bd>] ? 
trace_hardirqs_off+0xd/0x10 May 21 15:12:05 ib2 kernel: [ 415.060757] [<ffffffffa0141cc5>] cm_process_work+0x25/0x120 [ib_cm] May 21 15:12:05 ib2 kernel: [ 415.060801] [<ffffffffa0142508>] cm_rep_handler+0x308/0x590 [ib_cm] May 21 15:12:05 ib2 kernel: [ 415.060844] [<ffffffffa0143c65>] cm_work_handler+0x145/0x1070 [ib_cm] May 21 15:12:05 ib2 kernel: [ 415.060887] [<ffffffff8105daea>] process_one_work+0x19a/0x5c0 May 21 15:12:05 ib2 kernel: [ 415.060930] [<ffffffff8105da7d>] ? process_one_work+0x12d/0x5c0 May 21 15:12:05 ib2 kernel: [ 415.060973] [<ffffffffa0143b20>] ? cm_req_handler+0xa40/0xa40 [ib_cm] May 21 15:12:05 ib2 kernel: [ 415.061016] [<ffffffff8105f865>] worker_thread+0x175/0x380 May 21 15:12:05 ib2 kernel: [ 415.061059] [<ffffffff8105f6f0>] ? manage_workers+0x210/0x210 May 21 15:12:05 ib2 kernel: [ 415.061102] [<ffffffff81064e0e>] kthread+0xbe/0xd0 May 21 15:12:05 ib2 kernel: [ 415.061144] [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0 May 21 15:12:05 ib2 kernel: [ 415.061188] [<ffffffff817465b4>] kernel_thread_helper+0x4/0x10 May 21 15:12:05 ib2 kernel: [ 415.061234] [<ffffffff8173c4c0>] ? _raw_spin_unlock_irq+0x30/0x50 May 21 15:12:05 ib2 kernel: [ 415.061277] [<ffffffff8109f44d>] ? trace_hardirqs_on+0xd/0x10 May 21 15:12:05 ib2 kernel: [ 415.061319] [<ffffffff8173c8b0>] ? retint_restore_args+0x13/0x13 May 21 15:12:05 ib2 kernel: [ 415.061363] [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70 May 21 15:12:05 ib2 kernel: [ 415.061406] [<ffffffff817465b0>] ? 
gs_change+0x13/0x13 May 21 15:12:05 ib2 kernel: [ 415.061447] Code: 66 66 66 90 65 48 8b 04 25 80 b9 00 00 48 8b 80 70 03 00 00 8b 40 f0 c9 c3 66 90 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03 00 00 <48> 8b 40 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 May 21 15:12:05 ib2 kernel: [ 415.063139] RIP [<ffffffff81064700>] kthread_data+0x10/0x20 May 21 15:12:05 ib2 kernel: [ 415.063205] RSP <ffff88020efd3858> May 21 15:12:05 ib2 kernel: [ 415.063245] CR2: fffffffffffffff8 May 21 15:12:05 ib2 kernel: [ 415.063285] ---[ end trace 871425e942ec1143 ]--- May 21 15:12:05 ib2 kernel: [ 415.063326] Fixing recursive fault but reboot is needed! May 21 15:12:05 ib2 kernel: [ 417.441382] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:07 ib2 kernel: [ 419.840353] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:10 ib2 kernel: [ 422.198880] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:12 ib2 kernel: [ 424.597641] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:14 ib2 kernel: [ 426.956288] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:17 ib2 kernel: [ 429.355047] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:19 ib2 kernel: [ 431.753621] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:22 ib2 kernel: [ 434.122390] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:24 ib2 kernel: [ 436.521068] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:26 ib2 kernel: [ 436.660137] ------------[ cut here ]------------ May 21 15:12:26 ib2 kernel: [ 436.660216] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0() May 21 15:12:26 ib2 kernel: [ 436.660272] Hardware name: System Product Name May 21 15:12:26 ib2 kernel: [ 436.660313] Watchdog detected hard LOCKUP on cpu 2 May 21 15:12:26 ib2 kernel: [ 436.660341] Modules 
linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod mlx4_core [last unloaded: ib_ipoib] May 21 15:12:26 ib2 kernel: [ 436.662032] Pid: 3166, comm: kworker/2:0 Tainted: G D O 3.4.23-pserver-hotfix+ #109 May 21 15:12:26 ib2 kernel: [ 436.662088] Call Trace: May 21 15:12:26 ib2 kernel: [ 436.662127] <NMI> [<ffffffff8103c2cf>] warn_slowpath_common+0x7f/0xc0 May 21 15:12:26 ib2 kernel: [ 436.662197] [<ffffffff8103c3c6>] warn_slowpath_fmt+0x46/0x50 May 21 15:12:26 ib2 kernel: [ 436.662239] [<ffffffff8109c009>] ? trace_hardirqs_off_caller+0x29/0xd0 May 21 15:12:26 ib2 kernel: [ 436.662283] [<ffffffff810cd968>] watchdog_overflow_callback+0x98/0xc0 May 21 15:12:26 ib2 kernel: [ 436.662327] [<ffffffff811077dc>] __perf_event_overflow+0x9c/0x320 May 21 15:12:26 ib2 kernel: [ 436.662370] [<ffffffff811087ec>] ? perf_event_update_userpage+0x16c/0x2c0 May 21 15:12:26 ib2 kernel: [ 436.662415] [<ffffffff81108680>] ? perf_event_mmap_ctx+0x170/0x170 May 21 15:12:26 ib2 kernel: [ 436.662458] [<ffffffff81107f74>] perf_event_overflow+0x14/0x20 May 21 15:12:26 ib2 kernel: [ 436.662501] [<ffffffff81013f27>] x86_pmu_handle_irq+0x1b7/0x220 May 21 15:12:26 ib2 kernel: [ 436.662545] [<ffffffff8173e341>] perf_event_nmi_handler+0x21/0x30 May 21 15:12:26 ib2 kernel: [ 436.662588] [<ffffffff8173d8a6>] nmi_handle+0xb6/0x200 May 21 15:12:26 ib2 kernel: [ 436.662631] [<ffffffff8173d7f0>] ? 
oops_begin+0xd0/0xd0 May 21 15:12:26 ib2 kernel: [ 436.662673] [<ffffffff8173db1d>] do_nmi+0x12d/0x350 May 21 15:12:26 ib2 kernel: [ 436.662715] [<ffffffff8173ceac>] end_repeat_nmi+0x1a/0x1e May 21 15:12:26 ib2 kernel: [ 436.662758] [<ffffffff81420d14>] ? delay_tsc+0x34/0xb0 May 21 15:12:26 ib2 kernel: [ 436.662800] [<ffffffff81420d14>] ? delay_tsc+0x34/0xb0 May 21 15:12:26 ib2 kernel: [ 436.662842] [<ffffffff81420d14>] ? delay_tsc+0x34/0xb0 May 21 15:12:26 ib2 kernel: [ 436.662883] <<EOE>> [<ffffffff81420c8f>] __delay+0xf/0x20 May 21 15:12:26 ib2 kernel: [ 436.662952] [<ffffffff814285a3>] do_raw_spin_lock+0xd3/0x140 May 21 15:12:26 ib2 kernel: [ 436.662995] [<ffffffff8173bc74>] _raw_spin_lock_irq+0x54/0x60 May 21 15:12:26 ib2 kernel: [ 436.663037] [<ffffffff8173a3e0>] ? __schedule+0x120/0x940 May 21 15:12:26 ib2 kernel: [ 436.663080] [<ffffffff8173a3e0>] __schedule+0x120/0x940 May 21 15:12:26 ib2 kernel: [ 436.663122] [<ffffffff8173acc9>] schedule+0x29/0x70 May 21 15:12:26 ib2 kernel: [ 436.663164] [<ffffffff81042293>] do_exit+0x7a3/0xa40 May 21 15:12:26 ib2 kernel: [ 436.663206] [<ffffffff8103e7fe>] ? kmsg_dump+0x1be/0x300 May 21 15:12:26 ib2 kernel: [ 436.663248] [<ffffffff8103e6c1>] ? kmsg_dump+0x81/0x300 May 21 15:12:26 ib2 kernel: [ 436.663291] [<ffffffff817387f9>] ? printk+0x41/0x48 May 21 15:12:26 ib2 kernel: [ 436.663333] [<ffffffff8173d6db>] oops_end+0xab/0xf0 May 21 15:12:26 ib2 kernel: [ 436.663376] [<ffffffff8102f6bd>] no_context+0x11d/0x2d0 May 21 15:12:26 ib2 kernel: [ 436.663418] [<ffffffff810afbf0>] ? kallsyms_lookup+0x60/0xe0 May 21 15:12:26 ib2 kernel: [ 436.663462] [<ffffffff8102f9ad>] __bad_area_nosemaphore+0x13d/0x220 May 21 15:12:26 ib2 kernel: [ 436.663505] [<ffffffff8102faa3>] bad_area_nosemaphore+0x13/0x20 May 21 15:12:26 ib2 kernel: [ 436.663548] [<ffffffff81740603>] do_page_fault+0x3a3/0x4e0 May 21 15:12:26 ib2 kernel: [ 436.663590] [<ffffffff8173cd06>] ? error_sti+0x5/0x6 May 21 15:12:26 ib2 kernel: [ 436.663632] [<ffffffff8109c009>] ? 
trace_hardirqs_off_caller+0x29/0xd0 May 21 15:12:26 ib2 kernel: [ 436.663676] [<ffffffff8142211d>] ? trace_hardirqs_off_thunk+0x3a/0x3c May 21 15:12:26 ib2 kernel: [ 436.663719] [<ffffffff8173cac5>] page_fault+0x25/0x30 May 21 15:12:26 ib2 kernel: [ 436.663762] [<ffffffff81064700>] ? kthread_data+0x10/0x20 May 21 15:12:26 ib2 kernel: [ 436.663804] [<ffffffff8105c735>] wq_worker_sleeping+0x15/0xa0 May 21 15:12:26 ib2 kernel: [ 436.663848] [<ffffffff8173a963>] __schedule+0x6a3/0x940 May 21 15:12:26 ib2 kernel: [ 436.663890] [<ffffffff8173acc9>] schedule+0x29/0x70 May 21 15:12:26 ib2 kernel: [ 436.663932] [<ffffffff81042105>] do_exit+0x615/0xa40 May 21 15:12:26 ib2 kernel: [ 436.663974] [<ffffffff8103e6c1>] ? kmsg_dump+0x81/0x300 May 21 15:12:26 ib2 kernel: [ 436.664017] [<ffffffff8173d6db>] oops_end+0xab/0xf0 May 21 15:12:26 ib2 kernel: [ 436.664059] [<ffffffff8100570b>] die+0x5b/0x90 May 21 15:12:26 ib2 kernel: [ 436.664102] [<ffffffff8173d274>] do_general_protection+0x164/0x170 May 21 15:12:26 ib2 kernel: [ 436.664145] [<ffffffff8173c8e0>] ? restore_args+0x30/0x30 May 21 15:12:26 ib2 kernel: [ 436.664188] [<ffffffff8173ca95>] general_protection+0x25/0x30 May 21 15:12:26 ib2 kernel: [ 436.664233] [<ffffffffa01c8bf9>] ? ib_modify_qp+0x9/0x20 [ib_core] May 21 15:12:26 ib2 kernel: [ 436.664277] [<ffffffffa02bfcb9>] ipoib_cm_rep_handler+0x99/0x2c0 [ib_ipoib] May 21 15:12:26 ib2 kernel: [ 436.664321] [<ffffffff8109c0bd>] ? trace_hardirqs_off+0xd/0x10 May 21 15:12:26 ib2 kernel: [ 436.664363] [<ffffffff8173c557>] ? _raw_spin_unlock_irqrestore+0x77/0x80 May 21 15:12:26 ib2 kernel: [ 436.664407] [<ffffffff8105c913>] ? __queue_work+0x103/0x4a0 May 21 15:12:26 ib2 kernel: [ 436.664450] [<ffffffff8109c009>] ? trace_hardirqs_off_caller+0x29/0xd0 May 21 15:12:26 ib2 kernel: [ 436.664495] [<ffffffffa02c0373>] ipoib_cm_tx_handler+0x93/0x2b0 [ib_ipoib] May 21 15:12:26 ib2 kernel: [ 436.664538] [<ffffffff8109c0bd>] ? 
trace_hardirqs_off+0xd/0x10 May 21 15:12:26 ib2 kernel: [ 436.664583] [<ffffffffa0141cc5>] cm_process_work+0x25/0x120 [ib_cm] May 21 15:12:26 ib2 kernel: [ 436.664627] [<ffffffffa0142508>] cm_rep_handler+0x308/0x590 [ib_cm] May 21 15:12:26 ib2 kernel: [ 436.664671] [<ffffffffa0143c65>] cm_work_handler+0x145/0x1070 [ib_cm] May 21 15:12:26 ib2 kernel: [ 436.664714] [<ffffffff8105daea>] process_one_work+0x19a/0x5c0 May 21 15:12:26 ib2 kernel: [ 436.664756] [<ffffffff8105da7d>] ? process_one_work+0x12d/0x5c0 May 21 15:12:26 ib2 kernel: [ 436.664800] [<ffffffffa0143b20>] ? cm_req_handler+0xa40/0xa40 [ib_cm] May 21 15:12:26 ib2 kernel: [ 436.664843] [<ffffffff8105f865>] worker_thread+0x175/0x380 May 21 15:12:26 ib2 kernel: [ 436.664886] [<ffffffff8105f6f0>] ? manage_workers+0x210/0x210 May 21 15:12:26 ib2 kernel: [ 436.664929] [<ffffffff81064e0e>] kthread+0xbe/0xd0 May 21 15:12:26 ib2 kernel: [ 436.664972] [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0 May 21 15:12:26 ib2 kernel: [ 436.665015] [<ffffffff817465b4>] kernel_thread_helper+0x4/0x10 May 21 15:12:26 ib2 kernel: [ 436.665059] [<ffffffff8173c4c0>] ? _raw_spin_unlock_irq+0x30/0x50 May 21 15:12:26 ib2 kernel: [ 436.665102] [<ffffffff8109f44d>] ? trace_hardirqs_on+0xd/0x10 May 21 15:12:26 ib2 kernel: [ 436.665145] [<ffffffff8173c8b0>] ? retint_restore_args+0x13/0x13 May 21 15:12:26 ib2 kernel: [ 436.665187] [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70 May 21 15:12:26 ib2 kernel: [ 436.665231] [<ffffffff817465b0>] ? 
gs_change+0x13/0x13 May 21 15:12:26 ib2 kernel: [ 436.665273] ---[ end trace 871425e942ec1144 ]--- May 21 15:12:26 ib2 kernel: [ 438.919742] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:29 ib2 kernel: [ 441.318429] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:31 ib2 kernel: [ 443.717220] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:34 ib2 kernel: [ 446.115789] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:36 ib2 kernel: [ 448.514602] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:38 ib2 kernel: [ 450.913390] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:41 ib2 kernel: [ 453.271906] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:43 ib2 kernel: [ 455.670796] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:46 ib2 kernel: [ 458.069297] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:48 ib2 kernel: [ 460.438309] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:50 ib2 kernel: [ 462.836738] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:53 ib2 kernel: [ 465.235553] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:55 ib2 kernel: [ 467.634331] ib0: enabling connected mode will cause multicast packet drops May 21 15:12:58 ib2 kernel: [ 468.407807] ------------[ cut here ]------------ May 21 15:12:58 ib2 kernel: [ 468.407897] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0() May 21 15:12:58 ib2 kernel: [ 468.407957] Hardware name: System Product Name May 21 15:12:58 ib2 kernel: [ 468.408001] Watchdog detected hard LOCKUP on cpu 1 May 21 15:12:58 ib2 kernel: [ 468.408032] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core ip6table_filter ip6_tables 
iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod mlx4_core [last unloaded: ib_ipoib] May 21 15:12:58 ib2 kernel: [ 468.409806] Pid: 0, comm: swapper/1 Tainted: G D W O 3.4.23-pserver-hotfix+ #109 May 21 15:12:58 ib2 kernel: [ 468.409866] Call Trace: May 21 15:12:58 ib2 kernel: [ 468.409908] <NMI> [<ffffffff8103c2cf>] warn_slowpath_common+0x7f/0xc0 May 21 15:12:58 ib2 kernel: [ 468.409986] [<ffffffff8103c3c6>] warn_slowpath_fmt+0x46/0x50 May 21 15:12:58 ib2 kernel: [ 468.410033] [<ffffffff8109c009>] ? trace_hardirqs_off_caller+0x29/0xd0 May 21 15:12:58 ib2 kernel: [ 468.410081] [<ffffffff810cd968>] watchdog_overflow_callback+0x98/0xc0 May 21 15:12:58 ib2 kernel: [ 468.410129] [<ffffffff811077dc>] __perf_event_overflow+0x9c/0x320 May 21 15:12:58 ib2 kernel: [ 468.410177] [<ffffffff811087ec>] ? perf_event_update_userpage+0x16c/0x2c0 May 21 15:12:58 ib2 kernel: [ 468.410225] [<ffffffff81108680>] ? perf_event_mmap_ctx+0x170/0x170 May 21 15:12:58 ib2 kernel: [ 468.410272] [<ffffffff81107f74>] perf_event_overflow+0x14/0x20 May 21 15:12:58 ib2 kernel: [ 468.410319] [<ffffffff81013f27>] x86_pmu_handle_irq+0x1b7/0x220 May 21 15:12:58 ib2 kernel: [ 468.410368] [<ffffffff8173e341>] perf_event_nmi_handler+0x21/0x30 May 21 15:12:58 ib2 kernel: [ 468.410416] [<ffffffff8173d8a6>] nmi_handle+0xb6/0x200 May 21 15:12:58 ib2 kernel: [ 468.410462] [<ffffffff8173d7f0>] ? oops_begin+0xd0/0xd0 May 21 15:12:58 ib2 kernel: [ 468.410508] [<ffffffff8173db1d>] do_nmi+0x12d/0x350 May 21 15:12:58 ib2 kernel: [ 468.410554] [<ffffffff8173ceac>] end_repeat_nmi+0x1a/0x1e May 21 15:12:58 ib2 kernel: [ 468.410602] [<ffffffff81420d41>] ? 
delay_tsc+0x61/0xb0 May 21 15:12:58 ib2 kernel: [ 468.410648] [<ffffffff81420d41>] ? delay_tsc+0x61/0xb0 May 21 15:12:58 ib2 kernel: [ 468.410694] [<ffffffff81420d41>] ? delay_tsc+0x61/0xb0 May 21 15:12:58 ib2 kernel: [ 468.410738] <<EOE>> <IRQ> [<ffffffff81420c8f>] __delay+0xf/0x20 May 21 15:12:58 ib2 kernel: [ 468.410839] [<ffffffff814285a3>] do_raw_spin_lock+0xd3/0x140 May 21 15:12:58 ib2 kernel: [ 468.410885] [<ffffffff8173bba8>] _raw_spin_lock+0x48/0x50 May 21 15:12:58 ib2 kernel: [ 468.410932] [<ffffffff810834f2>] ? sched_rt_period_timer+0xf2/0x270 May 21 15:12:58 ib2 kernel: [ 468.410980] [<ffffffff8173c58b>] ? _raw_spin_unlock+0x2b/0x50 May 21 15:12:58 ib2 kernel: [ 468.411027] [<ffffffff810834f2>] sched_rt_period_timer+0xf2/0x270 May 21 15:12:58 ib2 kernel: [ 468.411075] [<ffffffff81069ff6>] __run_hrtimer+0x86/0x2f0 May 21 15:12:58 ib2 kernel: [ 468.411121] [<ffffffff81083400>] ? init_rt_bandwidth+0x60/0x60 May 21 15:12:58 ib2 kernel: [ 468.411168] [<ffffffff8106a50e>] hrtimer_interrupt+0xfe/0x270 May 21 15:12:58 ib2 kernel: [ 468.411215] [<ffffffff81746ea9>] smp_apic_timer_interrupt+0x69/0x99 May 21 15:12:58 ib2 kernel: [ 468.411263] [<ffffffff81745caf>] apic_timer_interrupt+0x6f/0x80 May 21 15:12:58 ib2 kernel: [ 468.411308] <EOI> [<ffffffff8100bab1>] ? default_idle+0x61/0x320 May 21 15:12:58 ib2 kernel: [ 468.411383] [<ffffffff8109f44d>] ? trace_hardirqs_on+0xd/0x10 May 21 15:12:58 ib2 kernel: [ 468.411431] [<ffffffff8102b3d6>] ? native_safe_halt+0x6/0x10 May 21 15:12:58 ib2 kernel: [ 468.411477] [<ffffffff8109f44d>] ? 
trace_hardirqs_on+0xd/0x10 May 21 15:12:58 ib2 kernel: [ 468.411523] [<ffffffff8100bab6>] default_idle+0x66/0x320 May 21 15:12:58 ib2 kernel: [ 468.411569] [<ffffffff8100be02>] amd_e400_idle+0x92/0x130 May 21 15:12:58 ib2 kernel: [ 468.411617] [<ffffffff8100af36>] cpu_idle+0xf6/0x140 May 21 15:12:58 ib2 kernel: [ 468.411664] [<ffffffff81731d77>] start_secondary+0x1ed/0x1f4 May 21 15:12:58 ib2 kernel: [ 468.411709] ---[ end trace 871425e942ec1145 ]--- May 21 15:12:58 ib2 kernel: [ 470.032848] ib0: enabling connected mode will cause multicast packet drops May 21 15:13:00 ib2 kernel: [ 472.431601] ib0: enabling connected mode will cause multicast packet drops May 21 15:13:02 ib2 kernel: [ 474.830297] ib0: enabling connected mode will cause multicast packet drops May 21 15:13:05 ib2 kernel: [ 477.229094] ib0: enabling connected mode will cause multicast packet drops May 21 15:13:07 ib2 kernel: [ 479.627563] ib0: enabling connected mode will cause multicast packet drops May 21 15:13:10 ib2 kernel: [ 482.026253] ib0: enabling connected mode will cause multicast packet drops May 21 15:13:12 ib2 kernel: [ 484.395049] ib0: enabling connected mode will cause multicast packet drops May 21 15:13:14 ib2 kernel: [ 486.793758] ib0: enabling connected mode will cause multicast packet drops May 21 15:13:17 ib2 kernel: [ 489.192468] ib0: enabling connected mode will cause multicast packet drops [ 884.055635] general protection fault: 0000 [#1] SMP [ 884.055780] CPU 0 [ 884.055821] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod 
crc_t10dif mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib] [ 884.058726] [ 884.058788] Pid: 3001, comm: kworker/0:0 Tainted: G O 3.4.23-pserver-hotfix+ #111 System manufacturer System Product Name/M4A89GTD-PRO [ 884.059827] RIP: 0010:[<ffffffffa02dc3e0>] [<ffffffffa02dc3e0>] ipoib_cm_tx_handler+0x30/0x2b0 [ib_ipoib] [ 884.059952] RSP: 0018:ffff8801fad67c50 EFLAGS: 00010293 [ 884.060015] RAX: ffff8801fad67fd8 RBX: ffff880211ed5d88 RCX: 0000000000000006 [ 884.060080] RDX: 0000000000000003 RSI: ffff8801f664c0d8 RDI: ffff880211ed5d88 [ 884.060139] RBP: ffff8801fad67ca0 R08: 0000000000000001 R09: 0000000000000002 [ 884.060198] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801f664c000 [ 884.060257] R13: ffff88020d110b98 R14: 6b6b6b6b6b6b756b R15: ffff8801f664c0d8 [ 884.060316] FS: 00007f11da415700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000 [ 884.060390] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 884.060449] CR2: 00007f11d032c000 CR3: 00000001f16f5000 CR4: 00000000000007f0 [ 884.060512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 884.060579] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 884.060643] Process kworker/0:0 (pid: 3001, threadinfo ffff8801fad66000, task ffff8801fb734180) [ 884.060717] Stack: [ 884.060777] ffff8801fad67ca0 ffffffff8109f019 ffff8801fad67c70 ffffffff8109c0bd [ 884.061014] ffff8801fad67c90 ffff880211ed5d88 ffff8801f664c000 ffff8801f664c000 [ 884.061248] ffff88020c031100 ffff8801fad67dc0 ffff8801fad67cf0 ffffffffa017fcc5 [ 884.061486] Call Trace: [ 884.061544] [<ffffffff8109f019>] ? mark_held_locks+0x79/0x120 [ 884.061610] [<ffffffff8109c0bd>] ? 
trace_hardirqs_off+0xd/0x10 [ 884.061673] [<ffffffffa017fcc5>] cm_process_work+0x25/0x120 [ib_cm] [ 884.061734] [<ffffffffa0180508>] cm_rep_handler+0x308/0x590 [ib_cm] [ 884.061798] [<ffffffffa0181c65>] cm_work_handler+0x145/0x1070 [ib_cm] [ 884.061867] [<ffffffff8105daea>] process_one_work+0x19a/0x5c0 [ 884.061929] [<ffffffff8105da7d>] ? process_one_work+0x12d/0x5c0 [ 884.061990] [<ffffffffa0181b20>] ? cm_req_handler+0xa40/0xa40 [ib_cm] [ 884.062055] [<ffffffff8105f865>] worker_thread+0x175/0x380 [ 884.062116] [<ffffffff8105f6f0>] ? manage_workers+0x210/0x210 [ 884.062176] [<ffffffff81064e0e>] kthread+0xbe/0xd0 [ 884.062239] [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0 [ 884.062302] [<ffffffff81746734>] kernel_thread_helper+0x4/0x10 [ 884.062792] [<ffffffff8173ca30>] ? retint_restore_args+0x13/0x13 [ 884.062853] [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70 [ 884.062914] [<ffffffff81746730>] ? gs_change+0x13/0x13 [ 884.062974] Code: 57 41 56 41 55 41 54 53 48 83 ec 28 66 66 66 66 90 4c 8b 6f 08 8b 16 48 89 fb 49 89 f7 4d 8b 75 20 49 81 c6 00 0a 00 00 83 fa 0b <4d> 8b 66 38 77 2a 89 d0 ff 24 c5 90 08 2e a0 90 44 8b 1d a1 79 [ 884.066632] RIP [<ffffffffa02dc3e0>] ipoib_cm_tx_handler+0x30/0x2b0 [ib_ipoib] [ 884.066770] RSP <ffff8801fad67c50> [ 884.066841] ---[ end trace fa3d54b0aa9bc9ce ]--- (gdb) list *ipoib_cm_tx_handler+0x30 0xa410 is in ipoib_cm_tx_handler (drivers/infiniband/ulp/ipoib/ipoib_cm.c:1208). 
1203 static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
1204                                struct ib_cm_event *event)
1205 {
1206         struct ipoib_cm_tx *tx = cm_id->context;
1207         struct ipoib_dev_priv *priv = netdev_priv(tx->dev);
1208         struct net_device *dev = priv->dev;
1209         struct ipoib_neigh *neigh;
1210         unsigned long flags;
1211         int ret;
1212

[ 884.066926] BUG: unable to handle kernel paging request at fffffffffffffff8
[ 884.067090] IP: [<ffffffff81064700>] kthread_data+0x10/0x20
[ 884.067210] PGD 1c0d067 PUD 1c0e067 PMD 0
[ 884.067412] Oops: 0000 [#2] SMP
[ 884.067565] CPU 0
[ 884.067618] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib]
[ 884.071695]
[ 884.071753] Pid: 3001, comm: kworker/0:0 Tainted: G D O 3.4.23-pserver-hotfix+ #111 System manufacturer System Product Name/M4A89GTD-PRO
[ 884.071972] RIP: 0010:[<ffffffff81064700>] [<ffffffff81064700>] kthread_data+0x10/0x20
[ 884.072099] RSP: 0018:ffff8801fad679a8 EFLAGS: 00010096
[ 884.072168] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 884.072228] RDX: ffffffff81e138c0 RSI: 0000000000000000 RDI: ffff8801fb734180
[ 884.072293] RBP: ffff8801fad679a8 R08: ffff8801fb7341f0 R09: 000000cdd60f50a3
[ 884.072357] R10: 0000000000000c00 R11: 0000000000000000 R12: 0000000000000000
[ 884.072422] R13: ffff8801fb734548 R14: ffff8801fad677d8 R15: ffff8801f664c0d8
[ 884.072485] FS: 00007f11da415700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[ 884.072560] CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b [ 884.072623] CR2: fffffffffffffff8 CR3: 00000001f16f5000 CR4: 00000000000007f0 [ 884.072690] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 884.072762] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 884.072827] Process kworker/0:0 (pid: 3001, threadinfo ffff8801fad66000, task ffff8801fb734180) [ 884.072909] Stack: [ 884.072969] ffff8801fad679c8 ffffffff8105c735 ffff8801fad679c8 ffff88021fc12f40 [ 884.074211] ffff8801fad67a58 ffffffff8173aad3 ffff880100000000 ffff8801fad66000 [ 884.074481] ffff8801fad67fd8 ffff8801fad66000 ffff8801fad66010 ffff8801fad66000 [ 884.074742] Call Trace: [ 884.074801] [<ffffffff8105c735>] wq_worker_sleeping+0x15/0xa0 [ 884.074869] [<ffffffff8173aad3>] __schedule+0x6a3/0x940 [ 884.074934] [<ffffffff8173ae39>] schedule+0x29/0x70 [ 884.074998] [<ffffffff81042105>] do_exit+0x615/0xa40 [ 884.075061] [<ffffffff8103e6c1>] ? kmsg_dump+0x81/0x300 [ 884.075123] [<ffffffff8173d85b>] oops_end+0xab/0xf0 [ 884.075184] [<ffffffff8100570b>] die+0x5b/0x90 [ 884.075245] [<ffffffff8173d3f4>] do_general_protection+0x164/0x170 [ 884.075308] [<ffffffff8173ca60>] ? restore_args+0x30/0x30 [ 884.075370] [<ffffffff8173cc15>] general_protection+0x25/0x30 [ 884.075434] [<ffffffffa02dc3e0>] ? ipoib_cm_tx_handler+0x30/0x2b0 [ib_ipoib] [ 884.075498] [<ffffffff8109f019>] ? mark_held_locks+0x79/0x120 [ 884.075559] [<ffffffff8109c0bd>] ? trace_hardirqs_off+0xd/0x10 [ 884.075622] [<ffffffffa017fcc5>] cm_process_work+0x25/0x120 [ib_cm] [ 884.075686] [<ffffffffa0180508>] cm_rep_handler+0x308/0x590 [ib_cm] [ 884.075750] [<ffffffffa0181c65>] cm_work_handler+0x145/0x1070 [ib_cm] [ 884.075813] [<ffffffff8105daea>] process_one_work+0x19a/0x5c0 [ 884.075875] [<ffffffff8105da7d>] ? process_one_work+0x12d/0x5c0 [ 884.075938] [<ffffffffa0181b20>] ? cm_req_handler+0xa40/0xa40 [ib_cm] [ 884.076001] [<ffffffff8105f865>] worker_thread+0x175/0x380 [ 884.076064] [<ffffffff8105f6f0>] ? 
manage_workers+0x210/0x210 [ 884.076126] [<ffffffff81064e0e>] kthread+0xbe/0xd0 [ 884.076187] [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0 [ 884.076252] [<ffffffff81746734>] kernel_thread_helper+0x4/0x10 [ 884.076313] [<ffffffff8173ca30>] ? retint_restore_args+0x13/0x13 [ 884.076376] [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70 [ 884.076438] [<ffffffff81746730>] ? gs_change+0x13/0x13 [ 884.076499] Code: 66 66 66 90 65 48 8b 04 25 80 b9 00 00 48 8b 80 70 03 00 00 8b 40 f0 c9 c3 66 90 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03 00 00 <48> 8b 40 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 [ 884.081230] RIP [<ffffffff81064700>] kthread_data+0x10/0x20 [ 884.081332] RSP <ffff8801fad679a8> [ 884.081388] CR2: fffffffffffffff8 [ 884.081447] ---[ end trace fa3d54b0aa9bc9cf ]--- [ 884.081504] Fixing recursive fault but reboot is needed! [ 903.845688] ------------[ cut here ]------------ [ 903.845800] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0() [ 903.845878] Hardware name: System Product Name [ 903.845939] Watchdog detected hard LOCKUP on cpu 3 [ 903.845989] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib] [ 903.850712] Pid: 19, comm: ksoftirqd/3 Tainted: G D O 3.4.23-pserver-hotfix+ #111 [ 903.850790] Call Trace: [ 903.850851] <NMI> [<ffffffff8103c2cf>] warn_slowpath_common+0x7f/0xc0 [ 903.850967] [<ffffffff8103c3c6>] warn_slowpath_fmt+0x46/0x50 [ 903.851034] [<ffffffff8109c009>] ? 
trace_hardirqs_off_caller+0x29/0xd0 [ 903.851101] [<ffffffff810cd968>] watchdog_overflow_callback+0x98/0xc0 [ 903.851167] [<ffffffff811077dc>] __perf_event_overflow+0x9c/0x320 [ 903.851233] [<ffffffff811087ec>] ? perf_event_update_userpage+0x16c/0x2c0 [ 903.851299] [<ffffffff81108680>] ? perf_event_mmap_ctx+0x170/0x170 [ 903.852535] [<ffffffff81107f74>] perf_event_overflow+0x14/0x20 [ 903.852601] [<ffffffff81013f27>] x86_pmu_handle_irq+0x1b7/0x220 [ 903.852668] [<ffffffff8173e4c1>] perf_event_nmi_handler+0x21/0x30 [ 903.852733] [<ffffffff8173da26>] nmi_handle+0xb6/0x200 [ 903.852798] [<ffffffff8173d970>] ? oops_begin+0xd0/0xd0 [ 903.852863] [<ffffffff8173dc9d>] do_nmi+0x12d/0x350 [ 903.852928] [<ffffffff8173d02c>] end_repeat_nmi+0x1a/0x1e [ 903.852994] [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0 [ 903.853059] [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0 [ 903.853123] [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0 [ 903.853188] <<EOE>> [<ffffffff81420dff>] __delay+0xf/0x20 [ 903.853302] [<ffffffff81428713>] do_raw_spin_lock+0xd3/0x140 [ 903.853367] [<ffffffff8173bd18>] _raw_spin_lock+0x48/0x50 [ 903.853433] [<ffffffff8107771f>] ? try_to_wake_up+0x20f/0x2f0 [ 903.853498] [<ffffffff8107771f>] try_to_wake_up+0x20f/0x2f0 [ 903.853564] [<ffffffff81077812>] default_wake_function+0x12/0x20 [ 903.853629] [<ffffffff810654cd>] autoremove_wake_function+0x1d/0x50 [ 903.853694] [<ffffffff8106e729>] __wake_up_common+0x59/0x90 [ 903.853759] [<ffffffff81071310>] __wake_up+0x40/0x60 [ 903.853827] [<ffffffff815cc82c>] sk_stream_write_space+0xdc/0x230 [ 903.853892] [<ffffffff815cc794>] ? sk_stream_write_space+0x44/0x230 [ 903.853958] [<ffffffff81629760>] tcp_data_snd_check+0x110/0x120 [ 903.854023] [<ffffffff8162e829>] tcp_rcv_established+0x389/0x870 [ 903.854089] [<ffffffff81639a17>] tcp_v4_do_rcv+0x297/0x5d0 [ 903.854153] [<ffffffff8163a2f1>] tcp_v4_rcv+0x5a1/0x930 [ 903.854217] [<ffffffff81611dfc>] ? 
ip_local_deliver_finish+0x4c/0x4f0 [ 903.854283] [<ffffffff81611ee5>] ip_local_deliver_finish+0x135/0x4f0 [ 903.854348] [<ffffffff81611dfc>] ? ip_local_deliver_finish+0x4c/0x4f0 [ 903.854413] [<ffffffff81611da0>] ip_local_deliver+0x80/0x90 [ 903.854478] [<ffffffff8161244d>] ip_rcv_finish+0x1ad/0x660 [ 903.854544] [<ffffffff81611c58>] ip_rcv+0x228/0x2f0 [ 903.854610] [<ffffffff815d7696>] __netif_receive_skb+0x2c6/0x990 [ 903.854675] [<ffffffff815d74e6>] ? __netif_receive_skb+0x116/0x990 [ 903.854741] [<ffffffff81162487>] ? __kmalloc_node_track_caller+0xf7/0x250 [ 903.854807] [<ffffffff815d89bd>] netif_receive_skb+0x2d/0x210 [ 903.854877] [<ffffffffa02de26a>] ipoib_cm_handle_rx_wc+0x1fa/0x710 [ib_ipoib] [ 903.854958] [<ffffffff8173c6fb>] ? _raw_spin_unlock+0x2b/0x50 [ 903.855026] [<ffffffffa02ded32>] ? ipoib_cm_handle_tx_wc+0x1c2/0x370 [ib_ipoib] [ 903.855108] [<ffffffffa02d7a86>] ipoib_poll+0xd6/0x190 [ib_ipoib] [ 903.855173] [<ffffffff815d97ad>] net_rx_action+0x13d/0x320 [ 903.855239] [<ffffffff81045048>] __do_softirq+0xf8/0x380 [ 903.855304] [<ffffffff810453ed>] run_ksoftirqd+0x11d/0x1e0 [ 903.855368] [<ffffffff810452d0>] ? __do_softirq+0x380/0x380 [ 903.855433] [<ffffffff81064e0e>] kthread+0xbe/0xd0 [ 903.855497] [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0 [ 903.855564] [<ffffffff81746734>] kernel_thread_helper+0x4/0x10 [ 903.856798] [<ffffffff8173ca30>] ? retint_restore_args+0x13/0x13 [ 903.856864] [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70 [ 903.856929] [<ffffffff81746730>] ? 
gs_change+0x13/0x13 [ 903.856993] ---[ end trace fa3d54b0aa9bc9d0 ]--- [ 917.505825] ------------[ cut here ]------------ [ 917.505938] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0() [ 917.506014] Hardware name: System Product Name [ 917.506075] Watchdog detected hard LOCKUP on cpu 2 [ 917.506123] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib] [ 917.510288] Pid: 3337, comm: iperf Tainted: G D W O 3.4.23-pserver-hotfix+ #111 [ 917.510362] Call Trace: [ 917.510421] <NMI> [<ffffffff8103c2cf>] warn_slowpath_common+0x7f/0xc0 [ 917.510534] [<ffffffff8103c3c6>] warn_slowpath_fmt+0x46/0x50 [ 917.510598] [<ffffffff8109c009>] ? trace_hardirqs_off_caller+0x29/0xd0 [ 917.510662] [<ffffffff810cd968>] watchdog_overflow_callback+0x98/0xc0 [ 917.511154] [<ffffffff811077dc>] __perf_event_overflow+0x9c/0x320 [ 917.511218] [<ffffffff811087ec>] ? perf_event_update_userpage+0x16c/0x2c0 [ 917.511283] [<ffffffff81108680>] ? perf_event_mmap_ctx+0x170/0x170 [ 917.511347] [<ffffffff81107f74>] perf_event_overflow+0x14/0x20 [ 917.511411] [<ffffffff81013f27>] x86_pmu_handle_irq+0x1b7/0x220 [ 917.511477] [<ffffffff8173e4c1>] perf_event_nmi_handler+0x21/0x30 [ 917.511541] [<ffffffff8173da26>] nmi_handle+0xb6/0x200 [ 917.511604] [<ffffffff8173d970>] ? oops_begin+0xd0/0xd0 [ 917.511669] [<ffffffff8173dc9d>] do_nmi+0x12d/0x350 [ 917.511732] [<ffffffff8173d02c>] end_repeat_nmi+0x1a/0x1e [ 917.511796] [<ffffffff81420eb1>] ? 
delay_tsc+0x61/0xb0 [ 917.511859] [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0 [ 917.511921] [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0 [ 917.511984] <<EOE>> [<ffffffff81420dff>] __delay+0xf/0x20 [ 917.512093] [<ffffffff81428713>] do_raw_spin_lock+0xd3/0x140 [ 917.512158] [<ffffffff8173bd18>] _raw_spin_lock+0x48/0x50 [ 917.513308] [<ffffffff8107eee0>] ? load_balance+0x540/0x8a0 [ 917.513371] [<ffffffff8107eee0>] load_balance+0x540/0x8a0 [ 917.513435] [<ffffffff8107eefc>] ? load_balance+0x55c/0x8a0 [ 917.513498] [<ffffffff8107fe8d>] idle_balance+0x13d/0x2b0 [ 917.513560] [<ffffffff8107fda0>] ? idle_balance+0x50/0x2b0 [ 917.513623] [<ffffffff8173acc0>] __schedule+0x890/0x940 [ 917.513686] [<ffffffff8173ae39>] schedule+0x29/0x70 [ 917.513749] [<ffffffff81738bd5>] schedule_timeout+0x225/0x3b0 [ 917.513812] [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0 [ 917.513877] [<ffffffff815c26ae>] ? release_sock+0x14e/0x1b0 [ 917.513939] [<ffffffff8109f44d>] ? trace_hardirqs_on+0xd/0x10 [ 917.514003] [<ffffffff81045542>] ? local_bh_enable_ip+0x92/0xf0 [ 917.514067] [<ffffffff8173c5f3>] ? _raw_spin_unlock_bh+0x43/0x50 [ 917.514132] [<ffffffff815ccf98>] sk_stream_wait_memory+0x218/0x300 [ 917.514196] [<ffffffff810654b0>] ? wake_up_bit+0x40/0x40 [ 917.514260] [<ffffffff816247d1>] tcp_sendmsg+0x681/0xc30 [ 917.514324] [<ffffffff8164e0db>] inet_sendmsg+0x12b/0x240 [ 917.514387] [<ffffffff8164dfb0>] ? inet_create+0x5b0/0x5b0 [ 917.514450] [<ffffffff815c27c2>] ? sock_update_classid+0xb2/0x2b0 [ 917.514514] [<ffffffff815c2860>] ? sock_update_classid+0x150/0x2b0 [ 917.514577] [<ffffffff815bdf90>] sock_aio_write+0x190/0x1b0 [ 917.514641] [<ffffffff8113924f>] ? handle_pte_fault+0x50f/0x8e0 [ 917.514706] [<ffffffff8116e11a>] do_sync_write+0xea/0x130 [ 917.514770] [<ffffffff81170cc3>] ? fget_light+0x43/0x490 [ 917.514835] [<ffffffff813b1013>] ? 
security_file_permission+0x23/0x90 [ 917.514900] [<ffffffff8116e772>] vfs_write+0x172/0x190 [ 917.514965] [<ffffffff8116e881>] sys_write+0x51/0x90 [ 917.515028] [<ffffffff817452e9>] system_call_fastpath+0x16/0x1b [ 917.515092] ---[ end trace fa3d54b0aa9bc9d1 ]---