On 05/21/2013 05:19 PM, Jack Wang wrote:
> On 05/21/2013 02:51 PM, Sebastian Riemer wrote:
>> On 17.05.2013 16:16, Jack Wang wrote:
>>> unable to handle kernel paging request
>>
>> Hi Jack,
>>
>> this should be related to the list corruption in IPoIB, as list_del()
>> sets the entry's pointers to LIST_POISON1 and LIST_POISON2.
>> Dereferencing these results in page faults, according to the
>> documentation in the code.
>>
>> Cheers,
>> Sebastian
>>
> This bug is easily triggered with the inject_bug patch below, while
> running iperf -P 50 and switching the IB mode in sync on both sides.
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -1315,7 +1315,8 @@ static void ipoib_cm_tx_start(struct work_struct *work)
>               netif_tx_lock_bh(dev);
>               spin_lock_irqsave(&priv->lock, flags);
> 
> -             if (ret) {
> +             if (ret || priv->inject_bug) {
> +                     priv->inject_bug = 0;
>                       neigh = p->neigh;
>                       if (neigh) {
>                               neigh->cm = NULL;
> 
> After changing list_del to list_del_init it turned into a different
> panic; I'm trying to get the backtrace.
> 

Here are some traces I got during testing. Dear IPoIB experts, could you
give some suggestions? It looks like an object lifetime issue.
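
For context on the list_del() to list_del_init() change mentioned above,
here is a minimal user-space sketch of how the two differ. It is not the
kernel implementation: the struct and the helper names (list_del_poison,
list_del_reinit, demo_list_del) are made up for illustration, and the
poison constants follow 3.x-era include/linux/poison.h with
POISON_POINTER_DELTA left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Poison values (assumption: 3.x-era poison.h, no POISON_POINTER_DELTA).
 * They are only compared against here, never dereferenced. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static bool entry_is_poisoned(const struct list_head *e)
{
	return e->next == LIST_POISON1 && e->prev == LIST_POISON2;
}

/* Insert entry right after head. */
static void list_add_head(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void unlink_entry(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* Like list_del(): unlink, then poison so that stale users fault. */
static void list_del_poison(struct list_head *entry)
{
	/* Roughly the double-delete case CONFIG_DEBUG_LIST warns about. */
	assert(!entry_is_poisoned(entry));
	unlink_entry(entry);
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/* Like list_del_init(): unlink, then turn the entry into an empty list. */
static void list_del_reinit(struct list_head *entry)
{
	unlink_entry(entry);
	list_init(entry);
}

/* Returns true if both variants behave as described. */
bool demo_list_del(void)
{
	struct list_head head, a;
	bool poisoned, reinit_ok;

	list_init(&head);
	list_add_head(&a, &head);
	list_del_poison(&a);
	/* A second delete of 'a' here would go through the poison pointers. */
	poisoned = entry_is_poisoned(&a) && list_is_empty(&head);

	list_add_head(&a, &head);
	list_del_reinit(&a);
	list_del_reinit(&a);	/* second delete only unlinks 'a' from itself */
	reinit_ok = list_is_empty(&a) && list_is_empty(&head);

	return poisoned && reinit_ok;
}
```

With list_del_init() a repeated delete no longer touches poisoned
pointers, but the entry itself can still be freed while someone holds a
reference to it, which would fit the panic moving elsewhere instead of
going away.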



May 21 15:12:03 ib2 kernel: [  415.050021] general protection fault:
0000 [#1] SMP
May 21 15:12:03 ib2 kernel: [  415.050114] CPU 2
May 21 15:12:03 ib2 kernel: [  415.050142] Modules linked in:
ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad
mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter
ip_tables ebtable_nat ebtables x_tables cpufreq_powersave
cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse
loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf
edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev
serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh
mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod
mlx4_core [last unloaded: ib_ipoib]
May 21 15:12:03 ib2 kernel: [  415.051845]
May 21 15:12:03 ib2 kernel: [  415.051886] Pid: 3166, comm: kworker/2:0
Tainted: G           O 3.4.23-pserver-hotfix+ #109 System manufacturer
System Product Name/M4A89GTD-PRO
May 21 15:12:03 ib2 kernel: [  415.052019] RIP:
0010:[<ffffffffa01c8bf9>]  [<ffffffffa01c8bf9>] ib_modify_qp+0x9/0x20
[ib_core]
May 21 15:12:03 ib2 kernel: [  415.052106] RSP: 0018:ffff88020efd3b00
EFLAGS: 00010246
May 21 15:12:03 ib2 kernel: [  415.052148] RAX: 0000000000000000 RBX:
0000000000000000 RCX: 0000000000000000
May 21 15:12:03 ib2 kernel: [  415.052190] RDX: 0000000000129181 RSI:
ffff88020efd3b20 RDI: dead4ead00000000
May 21 15:12:03 ib2 kernel: [  415.052233] RBP: ffff88020efd3b00 R08:
0000000000000000 R09: 0000000000000001
May 21 15:12:03 ib2 kernel: [  415.052275] R10: 0000000000000000 R11:
0000000000000000 R12: ffff8801fb698c60
May 21 15:12:03 ib2 kernel: [  415.052317] R13: ffff88020efd3b20 R14:
ffff8802101fdc00 R15: ffffffff81e14250
May 21 15:12:03 ib2 kernel: [  415.052360] FS:  00007f8c38a05700(0000)
GS:ffff88021fc80000(0000) knlGS:0000000000000000
May 21 15:12:03 ib2 kernel: [  415.052415] CS:  0010 DS: 0000 ES: 0000
CR0: 000000008005003b
May 21 15:12:03 ib2 kernel: [  415.052457] CR2: 00007f8c38535d70 CR3:
0000000001c0b000 CR4: 00000000000007e0
May 21 15:12:03 ib2 kernel: [  415.052500] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
May 21 15:12:03 ib2 kernel: [  415.052542] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
May 21 15:12:03 ib2 kernel: [  415.052585] Process kworker/2:0 (pid:
3166, threadinfo ffff88020efd2000, task ffff88021228bf00)
May 21 15:12:03 ib2 kernel: [  415.052640] Stack:
May 21 15:12:03 ib2 kernel: [  415.052678]  ffff88020efd3c40
ffffffffa02bfcb9 0000000000000000 001291811228bf00
May 21 15:12:03 ib2 kernel: [  415.052834]  ffffffff00000002
ffff880200000005 000000008173c557 0008005eefed5918
May 21 15:12:03 ib2 kernel: [  415.052988]  ffffffff81e12e00
0000000000000080 ffff88020efd3b70 0000000000000000
May 21 15:12:03 ib2 kernel: [  415.053143] Call Trace:
May 21 15:12:03 ib2 kernel: [  415.053188]  [<ffffffffa02bfcb9>]
ipoib_cm_rep_handler+0x99/0x2c0 [ib_ipoib]
May 21 15:12:03 ib2 kernel: [  415.053233]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:03 ib2 kernel: [  415.053277]  [<ffffffff8173c557>] ?
_raw_spin_unlock_irqrestore+0x77/0x80
May 21 15:12:03 ib2 kernel: [  415.053322]  [<ffffffff8105c913>] ?
__queue_work+0x103/0x4a0
May 21 15:12:03 ib2 kernel: [  415.053364]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:03 ib2 kernel: [  415.053409]  [<ffffffffa02c0373>]
ipoib_cm_tx_handler+0x93/0x2b0 [ib_ipoib]
May 21 15:12:03 ib2 kernel: [  415.053452]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:03 ib2 kernel: [  415.053497]  [<ffffffffa0141cc5>]
cm_process_work+0x25/0x120 [ib_cm]
May 21 15:12:03 ib2 kernel: [  415.053540]  [<ffffffffa0142508>]
cm_rep_handler+0x308/0x590 [ib_cm]
May 21 15:12:03 ib2 kernel: [  415.053585]  [<ffffffffa0143c65>]
cm_work_handler+0x145/0x1070 [ib_cm]
May 21 15:12:03 ib2 kernel: [  415.053628]  [<ffffffff8105daea>]
process_one_work+0x19a/0x5c0
May 21 15:12:03 ib2 kernel: [  415.053670]  [<ffffffff8105da7d>] ?
process_one_work+0x12d/0x5c0
May 21 15:12:03 ib2 kernel: [  415.053713]  [<ffffffffa0143b20>] ?
cm_req_handler+0xa40/0xa40 [ib_cm]
May 21 15:12:03 ib2 kernel: [  415.053757]  [<ffffffff8105f865>]
worker_thread+0x175/0x380
May 21 15:12:03 ib2 kernel: [  415.053799]  [<ffffffff8105f6f0>] ?
manage_workers+0x210/0x210
May 21 15:12:03 ib2 kernel: [  415.053841]  [<ffffffff81064e0e>]
kthread+0xbe/0xd0
May 21 15:12:03 ib2 kernel: [  415.053884]  [<ffffffff8109f2b0>] ?
trace_hardirqs_on_caller+0x20/0x1b0
May 21 15:12:03 ib2 kernel: [  415.053928]  [<ffffffff817465b4>]
kernel_thread_helper+0x4/0x10
May 21 15:12:03 ib2 kernel: [  415.053972]  [<ffffffff8173c4c0>] ?
_raw_spin_unlock_irq+0x30/0x50
May 21 15:12:03 ib2 kernel: [  415.054015]  [<ffffffff8109f44d>] ?
trace_hardirqs_on+0xd/0x10
May 21 15:12:03 ib2 kernel: [  415.054058]  [<ffffffff8173c8b0>] ?
retint_restore_args+0x13/0x13
May 21 15:12:03 ib2 kernel: [  415.054100]  [<ffffffff81064d50>] ?
__init_kthread_worker+0x70/0x70
May 21 15:12:03 ib2 kernel: [  415.054144]  [<ffffffff817465b0>] ?
gs_change+0x13/0x13
May 21 15:12:03 ib2 kernel: [  415.054185] Code: ff ff 31 c0 eb d6 0f 1f
40 00 83 ca 01 c9 09 c2 31 c0 f7 d2 85 ca 0f 94 c0 c3 0f 1f 84 00 00 00
00 00 55 48 89 e5 66 66 66 66 90 <48> 8b 07 31 c9 48 8b 7f 58 ff 90 30
02 00 00 c9 c3 66 0f 1f 44
May 21 15:12:03 ib2 kernel: [  415.055875] RIP  [<ffffffffa01c8bf9>]
ib_modify_qp+0x9/0x20 [ib_core]
May 21 15:12:03 ib2 kernel: [  415.055945]  RSP <ffff88020efd3b00>
May 21 15:12:03 ib2 kernel: [  415.056011] ---[ end trace
871425e942ec1142 ]---
(gdb) list *ib_modify_qp+0x9
0xbf9 is in ib_modify_qp (drivers/infiniband/core/verbs.c:807).
802     
803     int ib_modify_qp(struct ib_qp *qp,
804                      struct ib_qp_attr *qp_attr,
805                      int qp_attr_mask)
806     {
807             return qp->device->modify_qp(qp->real_qp, qp_attr, qp_attr_mask, NULL);
808     }
809     EXPORT_SYMBOL(ib_modify_qp);
810     
811     int ib_query_qp(struct ib_qp *qp,
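
One possible reading of the oops above, offered as an assumption rather
than a confirmed diagnosis: the faulting instruction (<48> 8b 07, i.e.
mov (%rdi),%rax) loads through RDI = dead4ead00000000, so qp itself looks
like garbage. With 48-bit virtual addresses that value is non-canonical,
which would explain why this is reported as a general protection fault,
while the kthread_data oops in this log (CR2 = fffffffffffffff8) is an
ordinary paging request failure. The helper names below are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if bits 63..47 of addr are all equal (48-bit canonical form). */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}

/* Addresses copied from the oopses in this mail. */
void check_oops_addresses(void)
{
	/* RDI in oops #1: non-canonical, so the load raises #GP. */
	assert(!is_canonical_48(0xdead4ead00000000ULL));
	/* CR2 in oops #2: canonical but unmapped, so a page fault. */
	assert(is_canonical_48(0xfffffffffffffff8ULL));
	/* RSP in oops #1: an ordinary kernel address. */
	assert(is_canonical_48(0xffff88020efd3b00ULL));
}
```

Calling check_oops_addresses() from a small main() is enough to try it;
it only classifies the addresses and says nothing about who wrote the
bad pointer.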



May 21 15:12:03 ib2 kernel: [  415.056065] BUG: unable to handle kernel
paging request at fffffffffffffff8
May 21 15:12:03 ib2 kernel: [  415.056164] IP: [<ffffffff81064700>]
kthread_data+0x10/0x20
May 21 15:12:03 ib2 kernel: [  415.056236] PGD 1c0d067 PUD 1c0e067 PMD 0
May 21 15:12:03 ib2 kernel: [  415.056358] Oops: 0000 [#2] SMP
May 21 15:12:03 ib2 kernel: [  415.056449] CPU 2
May 21 15:12:05 ib2 kernel: [  415.056477] Modules linked in:
ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad
mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter
ip_tables ebtable_nat ebtables x_tables cpufreq_powersave
cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse
loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf
edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev
serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh
mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod
mlx4_core [last unloaded: ib_ipoib]
May 21 15:12:05 ib2 kernel: [  415.058609]
May 21 15:12:05 ib2 kernel: [  415.058648] Pid: 3166, comm: kworker/2:0
Tainted: G      D    O 3.4.23-pserver-hotfix+ #109 System manufacturer
System Product Name/M4A89GTD-PRO
May 21 15:12:05 ib2 kernel: [  415.058783] RIP:
0010:[<ffffffff81064700>]  [<ffffffff81064700>] kthread_data+0x10/0x20
May 21 15:12:05 ib2 kernel: [  415.058866] RSP: 0018:ffff88020efd3858
EFLAGS: 00010092
May 21 15:12:05 ib2 kernel: [  415.058909] RAX: 0000000000000000 RBX:
0000000000000002 RCX: 0000000000000002
May 21 15:12:05 ib2 kernel: [  415.058954] RDX: ffffffff81e138c0 RSI:
0000000000000002 RDI: ffff88021228bf00
May 21 15:12:05 ib2 kernel: [  415.058997] RBP: ffff88020efd3858 R08:
ffff88021228bf70 R09: 0000000000000001
May 21 15:12:05 ib2 kernel: [  415.059041] R10: 0000000000000800 R11:
0000000000000000 R12: 0000000000000002
May 21 15:12:05 ib2 kernel: [  415.059085] R13: ffff88021228c2c8 R14:
ffff88020efd3688 R15: ffffffff81e14250
May 21 15:12:05 ib2 kernel: [  415.059128] FS:  00007f8c38a05700(0000)
GS:ffff88021fc80000(0000) knlGS:0000000000000000
May 21 15:12:05 ib2 kernel: [  415.059187] CS:  0010 DS: 0000 ES: 0000
CR0: 000000008005003b
May 21 15:12:05 ib2 kernel: [  415.059230] CR2: fffffffffffffff8 CR3:
0000000001c0b000 CR4: 00000000000007e0
May 21 15:12:05 ib2 kernel: [  415.059274] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
May 21 15:12:05 ib2 kernel: [  415.059317] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
May 21 15:12:05 ib2 kernel: [  415.059362] Process kworker/2:0 (pid:
3166, threadinfo ffff88020efd2000, task ffff88021228bf00)
May 21 15:12:05 ib2 kernel: [  415.059420] Stack:
May 21 15:12:05 ib2 kernel: [  415.059460]  ffff88020efd3878
ffffffff8105c735 ffff88020efd3878 ffff88021fc92f40
May 21 15:12:05 ib2 kernel: [  415.059616]  ffff88020efd3908
ffffffff8173a963 ffff880200000000 ffff88020efd2000
May 21 15:12:05 ib2 kernel: [  415.059771]  ffff88020efd3fd8
ffff88020efd2000 ffff88020efd2010 ffff88020efd2000
May 21 15:12:05 ib2 kernel: [  415.059928] Call Trace:
May 21 15:12:05 ib2 kernel: [  415.059969]  [<ffffffff8105c735>]
wq_worker_sleeping+0x15/0xa0
May 21 15:12:05 ib2 kernel: [  415.060013]  [<ffffffff8173a963>]
__schedule+0x6a3/0x940
May 21 15:12:05 ib2 kernel: [  415.060056]  [<ffffffff8173acc9>]
schedule+0x29/0x70
May 21 15:12:05 ib2 kernel: [  415.060098]  [<ffffffff81042105>]
do_exit+0x615/0xa40
May 21 15:12:05 ib2 kernel: [  415.060141]  [<ffffffff8103e6c1>] ?
kmsg_dump+0x81/0x300
May 21 15:12:05 ib2 kernel: [  415.060184]  [<ffffffff8173d6db>]
oops_end+0xab/0xf0
May 21 15:12:05 ib2 kernel: [  415.060228]  [<ffffffff8100570b>]
die+0x5b/0x90
May 21 15:12:05 ib2 kernel: [  415.060270]  [<ffffffff8173d274>]
do_general_protection+0x164/0x170
May 21 15:12:05 ib2 kernel: [  415.060315]  [<ffffffff8173c8e0>] ?
restore_args+0x30/0x30
May 21 15:12:05 ib2 kernel: [  415.060358]  [<ffffffff8173ca95>]
general_protection+0x25/0x30
May 21 15:12:05 ib2 kernel: [  415.060404]  [<ffffffffa01c8bf9>] ?
ib_modify_qp+0x9/0x20 [ib_core]
May 21 15:12:05 ib2 kernel: [  415.060449]  [<ffffffffa02bfcb9>]
ipoib_cm_rep_handler+0x99/0x2c0 [ib_ipoib]
May 21 15:12:05 ib2 kernel: [  415.060493]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:05 ib2 kernel: [  415.060536]  [<ffffffff8173c557>] ?
_raw_spin_unlock_irqrestore+0x77/0x80
May 21 15:12:05 ib2 kernel: [  415.060579]  [<ffffffff8105c913>] ?
__queue_work+0x103/0x4a0
May 21 15:12:05 ib2 kernel: [  415.060625]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:05 ib2 kernel: [  415.060670]  [<ffffffffa02c0373>]
ipoib_cm_tx_handler+0x93/0x2b0 [ib_ipoib]
May 21 15:12:05 ib2 kernel: [  415.060714]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:05 ib2 kernel: [  415.060757]  [<ffffffffa0141cc5>]
cm_process_work+0x25/0x120 [ib_cm]
May 21 15:12:05 ib2 kernel: [  415.060801]  [<ffffffffa0142508>]
cm_rep_handler+0x308/0x590 [ib_cm]
May 21 15:12:05 ib2 kernel: [  415.060844]  [<ffffffffa0143c65>]
cm_work_handler+0x145/0x1070 [ib_cm]
May 21 15:12:05 ib2 kernel: [  415.060887]  [<ffffffff8105daea>]
process_one_work+0x19a/0x5c0
May 21 15:12:05 ib2 kernel: [  415.060930]  [<ffffffff8105da7d>] ?
process_one_work+0x12d/0x5c0
May 21 15:12:05 ib2 kernel: [  415.060973]  [<ffffffffa0143b20>] ?
cm_req_handler+0xa40/0xa40 [ib_cm]
May 21 15:12:05 ib2 kernel: [  415.061016]  [<ffffffff8105f865>]
worker_thread+0x175/0x380
May 21 15:12:05 ib2 kernel: [  415.061059]  [<ffffffff8105f6f0>] ?
manage_workers+0x210/0x210
May 21 15:12:05 ib2 kernel: [  415.061102]  [<ffffffff81064e0e>]
kthread+0xbe/0xd0
May 21 15:12:05 ib2 kernel: [  415.061144]  [<ffffffff8109f2b0>] ?
trace_hardirqs_on_caller+0x20/0x1b0
May 21 15:12:05 ib2 kernel: [  415.061188]  [<ffffffff817465b4>]
kernel_thread_helper+0x4/0x10
May 21 15:12:05 ib2 kernel: [  415.061234]  [<ffffffff8173c4c0>] ?
_raw_spin_unlock_irq+0x30/0x50
May 21 15:12:05 ib2 kernel: [  415.061277]  [<ffffffff8109f44d>] ?
trace_hardirqs_on+0xd/0x10
May 21 15:12:05 ib2 kernel: [  415.061319]  [<ffffffff8173c8b0>] ?
retint_restore_args+0x13/0x13
May 21 15:12:05 ib2 kernel: [  415.061363]  [<ffffffff81064d50>] ?
__init_kthread_worker+0x70/0x70
May 21 15:12:05 ib2 kernel: [  415.061406]  [<ffffffff817465b0>] ?
gs_change+0x13/0x13
May 21 15:12:05 ib2 kernel: [  415.061447] Code: 66 66 66 90 65 48 8b 04
25 80 b9 00 00 48 8b 80 70 03 00 00 8b 40 f0 c9 c3 66 90 55 48 89 e5 66
66 66 66 90 48 8b 87 70 03 00 00 <48> 8b 40 f8 c9 c3 66 2e 0f 1f 84 00
00 00 00 00 55 48 89 e5 66
May 21 15:12:05 ib2 kernel: [  415.063139] RIP  [<ffffffff81064700>]
kthread_data+0x10/0x20
May 21 15:12:05 ib2 kernel: [  415.063205]  RSP <ffff88020efd3858>
May 21 15:12:05 ib2 kernel: [  415.063245] CR2: fffffffffffffff8
May 21 15:12:05 ib2 kernel: [  415.063285] ---[ end trace
871425e942ec1143 ]---
May 21 15:12:05 ib2 kernel: [  415.063326] Fixing recursive fault but
reboot is needed!
May 21 15:12:05 ib2 kernel: [  417.441382] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:07 ib2 kernel: [  419.840353] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:10 ib2 kernel: [  422.198880] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:12 ib2 kernel: [  424.597641] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:14 ib2 kernel: [  426.956288] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:17 ib2 kernel: [  429.355047] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:19 ib2 kernel: [  431.753621] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:22 ib2 kernel: [  434.122390] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:24 ib2 kernel: [  436.521068] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:26 ib2 kernel: [  436.660137] ------------[ cut here
]------------
May 21 15:12:26 ib2 kernel: [  436.660216] WARNING: at
kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0()
May 21 15:12:26 ib2 kernel: [  436.660272] Hardware name: System Product
Name
May 21 15:12:26 ib2 kernel: [  436.660313] Watchdog detected hard LOCKUP
on cpu 2
May 21 15:12:26 ib2 kernel: [  436.660341] Modules linked in:
ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad
mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter
ip_tables ebtable_nat ebtables x_tables cpufreq_powersave
cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse
loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf
edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev
serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh
mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod
mlx4_core [last unloaded: ib_ipoib]
May 21 15:12:26 ib2 kernel: [  436.662032] Pid: 3166, comm: kworker/2:0
Tainted: G      D    O 3.4.23-pserver-hotfix+ #109
May 21 15:12:26 ib2 kernel: [  436.662088] Call Trace:
May 21 15:12:26 ib2 kernel: [  436.662127]  <NMI>  [<ffffffff8103c2cf>]
warn_slowpath_common+0x7f/0xc0
May 21 15:12:26 ib2 kernel: [  436.662197]  [<ffffffff8103c3c6>]
warn_slowpath_fmt+0x46/0x50
May 21 15:12:26 ib2 kernel: [  436.662239]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:26 ib2 kernel: [  436.662283]  [<ffffffff810cd968>]
watchdog_overflow_callback+0x98/0xc0
May 21 15:12:26 ib2 kernel: [  436.662327]  [<ffffffff811077dc>]
__perf_event_overflow+0x9c/0x320
May 21 15:12:26 ib2 kernel: [  436.662370]  [<ffffffff811087ec>] ?
perf_event_update_userpage+0x16c/0x2c0
May 21 15:12:26 ib2 kernel: [  436.662415]  [<ffffffff81108680>] ?
perf_event_mmap_ctx+0x170/0x170
May 21 15:12:26 ib2 kernel: [  436.662458]  [<ffffffff81107f74>]
perf_event_overflow+0x14/0x20
May 21 15:12:26 ib2 kernel: [  436.662501]  [<ffffffff81013f27>]
x86_pmu_handle_irq+0x1b7/0x220
May 21 15:12:26 ib2 kernel: [  436.662545]  [<ffffffff8173e341>]
perf_event_nmi_handler+0x21/0x30
May 21 15:12:26 ib2 kernel: [  436.662588]  [<ffffffff8173d8a6>]
nmi_handle+0xb6/0x200
May 21 15:12:26 ib2 kernel: [  436.662631]  [<ffffffff8173d7f0>] ?
oops_begin+0xd0/0xd0
May 21 15:12:26 ib2 kernel: [  436.662673]  [<ffffffff8173db1d>]
do_nmi+0x12d/0x350
May 21 15:12:26 ib2 kernel: [  436.662715]  [<ffffffff8173ceac>]
end_repeat_nmi+0x1a/0x1e
May 21 15:12:26 ib2 kernel: [  436.662758]  [<ffffffff81420d14>] ?
delay_tsc+0x34/0xb0
May 21 15:12:26 ib2 kernel: [  436.662800]  [<ffffffff81420d14>] ?
delay_tsc+0x34/0xb0
May 21 15:12:26 ib2 kernel: [  436.662842]  [<ffffffff81420d14>] ?
delay_tsc+0x34/0xb0
May 21 15:12:26 ib2 kernel: [  436.662883]  <<EOE>>
[<ffffffff81420c8f>] __delay+0xf/0x20
May 21 15:12:26 ib2 kernel: [  436.662952]  [<ffffffff814285a3>]
do_raw_spin_lock+0xd3/0x140
May 21 15:12:26 ib2 kernel: [  436.662995]  [<ffffffff8173bc74>]
_raw_spin_lock_irq+0x54/0x60
May 21 15:12:26 ib2 kernel: [  436.663037]  [<ffffffff8173a3e0>] ?
__schedule+0x120/0x940
May 21 15:12:26 ib2 kernel: [  436.663080]  [<ffffffff8173a3e0>]
__schedule+0x120/0x940
May 21 15:12:26 ib2 kernel: [  436.663122]  [<ffffffff8173acc9>]
schedule+0x29/0x70
May 21 15:12:26 ib2 kernel: [  436.663164]  [<ffffffff81042293>]
do_exit+0x7a3/0xa40
May 21 15:12:26 ib2 kernel: [  436.663206]  [<ffffffff8103e7fe>] ?
kmsg_dump+0x1be/0x300
May 21 15:12:26 ib2 kernel: [  436.663248]  [<ffffffff8103e6c1>] ?
kmsg_dump+0x81/0x300
May 21 15:12:26 ib2 kernel: [  436.663291]  [<ffffffff817387f9>] ?
printk+0x41/0x48
May 21 15:12:26 ib2 kernel: [  436.663333]  [<ffffffff8173d6db>]
oops_end+0xab/0xf0
May 21 15:12:26 ib2 kernel: [  436.663376]  [<ffffffff8102f6bd>]
no_context+0x11d/0x2d0
May 21 15:12:26 ib2 kernel: [  436.663418]  [<ffffffff810afbf0>] ?
kallsyms_lookup+0x60/0xe0
May 21 15:12:26 ib2 kernel: [  436.663462]  [<ffffffff8102f9ad>]
__bad_area_nosemaphore+0x13d/0x220
May 21 15:12:26 ib2 kernel: [  436.663505]  [<ffffffff8102faa3>]
bad_area_nosemaphore+0x13/0x20
May 21 15:12:26 ib2 kernel: [  436.663548]  [<ffffffff81740603>]
do_page_fault+0x3a3/0x4e0
May 21 15:12:26 ib2 kernel: [  436.663590]  [<ffffffff8173cd06>] ?
error_sti+0x5/0x6
May 21 15:12:26 ib2 kernel: [  436.663632]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:26 ib2 kernel: [  436.663676]  [<ffffffff8142211d>] ?
trace_hardirqs_off_thunk+0x3a/0x3c
May 21 15:12:26 ib2 kernel: [  436.663719]  [<ffffffff8173cac5>]
page_fault+0x25/0x30
May 21 15:12:26 ib2 kernel: [  436.663762]  [<ffffffff81064700>] ?
kthread_data+0x10/0x20
May 21 15:12:26 ib2 kernel: [  436.663804]  [<ffffffff8105c735>]
wq_worker_sleeping+0x15/0xa0
May 21 15:12:26 ib2 kernel: [  436.663848]  [<ffffffff8173a963>]
__schedule+0x6a3/0x940
May 21 15:12:26 ib2 kernel: [  436.663890]  [<ffffffff8173acc9>]
schedule+0x29/0x70
May 21 15:12:26 ib2 kernel: [  436.663932]  [<ffffffff81042105>]
do_exit+0x615/0xa40
May 21 15:12:26 ib2 kernel: [  436.663974]  [<ffffffff8103e6c1>] ?
kmsg_dump+0x81/0x300
May 21 15:12:26 ib2 kernel: [  436.664017]  [<ffffffff8173d6db>]
oops_end+0xab/0xf0
May 21 15:12:26 ib2 kernel: [  436.664059]  [<ffffffff8100570b>]
die+0x5b/0x90
May 21 15:12:26 ib2 kernel: [  436.664102]  [<ffffffff8173d274>]
do_general_protection+0x164/0x170
May 21 15:12:26 ib2 kernel: [  436.664145]  [<ffffffff8173c8e0>] ?
restore_args+0x30/0x30
May 21 15:12:26 ib2 kernel: [  436.664188]  [<ffffffff8173ca95>]
general_protection+0x25/0x30
May 21 15:12:26 ib2 kernel: [  436.664233]  [<ffffffffa01c8bf9>] ?
ib_modify_qp+0x9/0x20 [ib_core]
May 21 15:12:26 ib2 kernel: [  436.664277]  [<ffffffffa02bfcb9>]
ipoib_cm_rep_handler+0x99/0x2c0 [ib_ipoib]
May 21 15:12:26 ib2 kernel: [  436.664321]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:26 ib2 kernel: [  436.664363]  [<ffffffff8173c557>] ?
_raw_spin_unlock_irqrestore+0x77/0x80
May 21 15:12:26 ib2 kernel: [  436.664407]  [<ffffffff8105c913>] ?
__queue_work+0x103/0x4a0
May 21 15:12:26 ib2 kernel: [  436.664450]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:26 ib2 kernel: [  436.664495]  [<ffffffffa02c0373>]
ipoib_cm_tx_handler+0x93/0x2b0 [ib_ipoib]
May 21 15:12:26 ib2 kernel: [  436.664538]  [<ffffffff8109c0bd>] ?
trace_hardirqs_off+0xd/0x10
May 21 15:12:26 ib2 kernel: [  436.664583]  [<ffffffffa0141cc5>]
cm_process_work+0x25/0x120 [ib_cm]
May 21 15:12:26 ib2 kernel: [  436.664627]  [<ffffffffa0142508>]
cm_rep_handler+0x308/0x590 [ib_cm]
May 21 15:12:26 ib2 kernel: [  436.664671]  [<ffffffffa0143c65>]
cm_work_handler+0x145/0x1070 [ib_cm]
May 21 15:12:26 ib2 kernel: [  436.664714]  [<ffffffff8105daea>]
process_one_work+0x19a/0x5c0
May 21 15:12:26 ib2 kernel: [  436.664756]  [<ffffffff8105da7d>] ?
process_one_work+0x12d/0x5c0
May 21 15:12:26 ib2 kernel: [  436.664800]  [<ffffffffa0143b20>] ?
cm_req_handler+0xa40/0xa40 [ib_cm]
May 21 15:12:26 ib2 kernel: [  436.664843]  [<ffffffff8105f865>]
worker_thread+0x175/0x380
May 21 15:12:26 ib2 kernel: [  436.664886]  [<ffffffff8105f6f0>] ?
manage_workers+0x210/0x210
May 21 15:12:26 ib2 kernel: [  436.664929]  [<ffffffff81064e0e>]
kthread+0xbe/0xd0
May 21 15:12:26 ib2 kernel: [  436.664972]  [<ffffffff8109f2b0>] ?
trace_hardirqs_on_caller+0x20/0x1b0
May 21 15:12:26 ib2 kernel: [  436.665015]  [<ffffffff817465b4>]
kernel_thread_helper+0x4/0x10
May 21 15:12:26 ib2 kernel: [  436.665059]  [<ffffffff8173c4c0>] ?
_raw_spin_unlock_irq+0x30/0x50
May 21 15:12:26 ib2 kernel: [  436.665102]  [<ffffffff8109f44d>] ?
trace_hardirqs_on+0xd/0x10
May 21 15:12:26 ib2 kernel: [  436.665145]  [<ffffffff8173c8b0>] ?
retint_restore_args+0x13/0x13
May 21 15:12:26 ib2 kernel: [  436.665187]  [<ffffffff81064d50>] ?
__init_kthread_worker+0x70/0x70
May 21 15:12:26 ib2 kernel: [  436.665231]  [<ffffffff817465b0>] ?
gs_change+0x13/0x13
May 21 15:12:26 ib2 kernel: [  436.665273] ---[ end trace
871425e942ec1144 ]---
May 21 15:12:26 ib2 kernel: [  438.919742] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:29 ib2 kernel: [  441.318429] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:31 ib2 kernel: [  443.717220] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:34 ib2 kernel: [  446.115789] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:36 ib2 kernel: [  448.514602] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:38 ib2 kernel: [  450.913390] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:41 ib2 kernel: [  453.271906] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:43 ib2 kernel: [  455.670796] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:46 ib2 kernel: [  458.069297] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:48 ib2 kernel: [  460.438309] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:50 ib2 kernel: [  462.836738] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:53 ib2 kernel: [  465.235553] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:55 ib2 kernel: [  467.634331] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:12:58 ib2 kernel: [  468.407807] ------------[ cut here
]------------
May 21 15:12:58 ib2 kernel: [  468.407897] WARNING: at
kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0()
May 21 15:12:58 ib2 kernel: [  468.407957] Hardware name: System Product
Name
May 21 15:12:58 ib2 kernel: [  468.408001] Watchdog detected hard LOCKUP
on cpu 1
May 21 15:12:58 ib2 kernel: [  468.408032] Modules linked in:
ib_ipoib(O) rdma_ucm rdma_cm iw_cm ib_addr ib_cm ib_sa ib_uverbs ib_umad
mlx4_ib ib_mad ib_core ip6table_filter ip6_tables iptable_filter
ip_tables ebtable_nat ebtables x_tables cpufreq_powersave
cpufreq_conservative cpufreq_stats cpufreq_userspace binfmt_misc fuse
loop kvm_amd kvm tpm_tis powernow_k8 shpchp tpm processor mperf
edac_core tpm_bios psmouse edac_mce_amd pci_hotplug microcode evdev
serio_raw i2c_piix4 asus_atk0110 thermal_sys button dm_multipath scsi_dh
mlx4_en sg sd_mod crc_t10dif r8169 ahci libahci libata scsi_mod
mlx4_core [last unloaded: ib_ipoib]
May 21 15:12:58 ib2 kernel: [  468.409806] Pid: 0, comm: swapper/1
Tainted: G      D W  O 3.4.23-pserver-hotfix+ #109
May 21 15:12:58 ib2 kernel: [  468.409866] Call Trace:
May 21 15:12:58 ib2 kernel: [  468.409908]  <NMI>  [<ffffffff8103c2cf>]
warn_slowpath_common+0x7f/0xc0
May 21 15:12:58 ib2 kernel: [  468.409986]  [<ffffffff8103c3c6>]
warn_slowpath_fmt+0x46/0x50
May 21 15:12:58 ib2 kernel: [  468.410033]  [<ffffffff8109c009>] ?
trace_hardirqs_off_caller+0x29/0xd0
May 21 15:12:58 ib2 kernel: [  468.410081]  [<ffffffff810cd968>]
watchdog_overflow_callback+0x98/0xc0
May 21 15:12:58 ib2 kernel: [  468.410129]  [<ffffffff811077dc>]
__perf_event_overflow+0x9c/0x320
May 21 15:12:58 ib2 kernel: [  468.410177]  [<ffffffff811087ec>] ?
perf_event_update_userpage+0x16c/0x2c0
May 21 15:12:58 ib2 kernel: [  468.410225]  [<ffffffff81108680>] ?
perf_event_mmap_ctx+0x170/0x170
May 21 15:12:58 ib2 kernel: [  468.410272]  [<ffffffff81107f74>]
perf_event_overflow+0x14/0x20
May 21 15:12:58 ib2 kernel: [  468.410319]  [<ffffffff81013f27>]
x86_pmu_handle_irq+0x1b7/0x220
May 21 15:12:58 ib2 kernel: [  468.410368]  [<ffffffff8173e341>]
perf_event_nmi_handler+0x21/0x30
May 21 15:12:58 ib2 kernel: [  468.410416]  [<ffffffff8173d8a6>]
nmi_handle+0xb6/0x200
May 21 15:12:58 ib2 kernel: [  468.410462]  [<ffffffff8173d7f0>] ?
oops_begin+0xd0/0xd0
May 21 15:12:58 ib2 kernel: [  468.410508]  [<ffffffff8173db1d>]
do_nmi+0x12d/0x350
May 21 15:12:58 ib2 kernel: [  468.410554]  [<ffffffff8173ceac>]
end_repeat_nmi+0x1a/0x1e
May 21 15:12:58 ib2 kernel: [  468.410602]  [<ffffffff81420d41>] ?
delay_tsc+0x61/0xb0
May 21 15:12:58 ib2 kernel: [  468.410648]  [<ffffffff81420d41>] ?
delay_tsc+0x61/0xb0
May 21 15:12:58 ib2 kernel: [  468.410694]  [<ffffffff81420d41>] ?
delay_tsc+0x61/0xb0
May 21 15:12:58 ib2 kernel: [  468.410738]  <<EOE>>  <IRQ>
[<ffffffff81420c8f>] __delay+0xf/0x20
May 21 15:12:58 ib2 kernel: [  468.410839]  [<ffffffff814285a3>]
do_raw_spin_lock+0xd3/0x140
May 21 15:12:58 ib2 kernel: [  468.410885]  [<ffffffff8173bba8>]
_raw_spin_lock+0x48/0x50
May 21 15:12:58 ib2 kernel: [  468.410932]  [<ffffffff810834f2>] ?
sched_rt_period_timer+0xf2/0x270
May 21 15:12:58 ib2 kernel: [  468.410980]  [<ffffffff8173c58b>] ?
_raw_spin_unlock+0x2b/0x50
May 21 15:12:58 ib2 kernel: [  468.411027]  [<ffffffff810834f2>]
sched_rt_period_timer+0xf2/0x270
May 21 15:12:58 ib2 kernel: [  468.411075]  [<ffffffff81069ff6>]
__run_hrtimer+0x86/0x2f0
May 21 15:12:58 ib2 kernel: [  468.411121]  [<ffffffff81083400>] ?
init_rt_bandwidth+0x60/0x60
May 21 15:12:58 ib2 kernel: [  468.411168]  [<ffffffff8106a50e>]
hrtimer_interrupt+0xfe/0x270
May 21 15:12:58 ib2 kernel: [  468.411215]  [<ffffffff81746ea9>]
smp_apic_timer_interrupt+0x69/0x99
May 21 15:12:58 ib2 kernel: [  468.411263]  [<ffffffff81745caf>]
apic_timer_interrupt+0x6f/0x80
May 21 15:12:58 ib2 kernel: [  468.411308]  <EOI>  [<ffffffff8100bab1>]
? default_idle+0x61/0x320
May 21 15:12:58 ib2 kernel: [  468.411383]  [<ffffffff8109f44d>] ?
trace_hardirqs_on+0xd/0x10
May 21 15:12:58 ib2 kernel: [  468.411431]  [<ffffffff8102b3d6>] ?
native_safe_halt+0x6/0x10
May 21 15:12:58 ib2 kernel: [  468.411477]  [<ffffffff8109f44d>] ?
trace_hardirqs_on+0xd/0x10
May 21 15:12:58 ib2 kernel: [  468.411523]  [<ffffffff8100bab6>]
default_idle+0x66/0x320
May 21 15:12:58 ib2 kernel: [  468.411569]  [<ffffffff8100be02>]
amd_e400_idle+0x92/0x130
May 21 15:12:58 ib2 kernel: [  468.411617]  [<ffffffff8100af36>]
cpu_idle+0xf6/0x140
May 21 15:12:58 ib2 kernel: [  468.411664]  [<ffffffff81731d77>]
start_secondary+0x1ed/0x1f4
May 21 15:12:58 ib2 kernel: [  468.411709] ---[ end trace
871425e942ec1145 ]---
May 21 15:12:58 ib2 kernel: [  470.032848] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:00 ib2 kernel: [  472.431601] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:02 ib2 kernel: [  474.830297] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:05 ib2 kernel: [  477.229094] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:07 ib2 kernel: [  479.627563] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:10 ib2 kernel: [  482.026253] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:12 ib2 kernel: [  484.395049] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:14 ib2 kernel: [  486.793758] ib0: enabling connected mode
will cause multicast packet drops
May 21 15:13:17 ib2 kernel: [  489.192468] ib0: enabling connected mode
will cause multicast packet drops


[  884.055635] general protection fault: 0000 [#1] SMP
[  884.055780] CPU 0
[  884.055821] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm
ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables
x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats
cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse
serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios
edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor
thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif
mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib]
[  884.058726]
[  884.058788] Pid: 3001, comm: kworker/0:0 Tainted: G           O
3.4.23-pserver-hotfix+ #111 System manufacturer System Product
Name/M4A89GTD-PRO
[  884.059827] RIP: 0010:[<ffffffffa02dc3e0>]  [<ffffffffa02dc3e0>]
ipoib_cm_tx_handler+0x30/0x2b0 [ib_ipoib]
[  884.059952] RSP: 0018:ffff8801fad67c50  EFLAGS: 00010293
[  884.060015] RAX: ffff8801fad67fd8 RBX: ffff880211ed5d88 RCX:
0000000000000006
[  884.060080] RDX: 0000000000000003 RSI: ffff8801f664c0d8 RDI:
ffff880211ed5d88
[  884.060139] RBP: ffff8801fad67ca0 R08: 0000000000000001 R09:
0000000000000002
[  884.060198] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff8801f664c000
[  884.060257] R13: ffff88020d110b98 R14: 6b6b6b6b6b6b756b R15:
ffff8801f664c0d8
[  884.060316] FS:  00007f11da415700(0000) GS:ffff88021fc00000(0000)
knlGS:0000000000000000
[  884.060390] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  884.060449] CR2: 00007f11d032c000 CR3: 00000001f16f5000 CR4:
00000000000007f0
[  884.060512] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  884.060579] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  884.060643] Process kworker/0:0 (pid: 3001, threadinfo
ffff8801fad66000, task ffff8801fb734180)
[  884.060717] Stack:
[  884.060777]  ffff8801fad67ca0 ffffffff8109f019 ffff8801fad67c70
ffffffff8109c0bd
[  884.061014]  ffff8801fad67c90 ffff880211ed5d88 ffff8801f664c000
ffff8801f664c000
[  884.061248]  ffff88020c031100 ffff8801fad67dc0 ffff8801fad67cf0
ffffffffa017fcc5
[  884.061486] Call Trace:
[  884.061544]  [<ffffffff8109f019>] ? mark_held_locks+0x79/0x120
[  884.061610]  [<ffffffff8109c0bd>] ? trace_hardirqs_off+0xd/0x10
[  884.061673]  [<ffffffffa017fcc5>] cm_process_work+0x25/0x120 [ib_cm]
[  884.061734]  [<ffffffffa0180508>] cm_rep_handler+0x308/0x590 [ib_cm]
[  884.061798]  [<ffffffffa0181c65>] cm_work_handler+0x145/0x1070 [ib_cm]
[  884.061867]  [<ffffffff8105daea>] process_one_work+0x19a/0x5c0
[  884.061929]  [<ffffffff8105da7d>] ? process_one_work+0x12d/0x5c0
[  884.061990]  [<ffffffffa0181b20>] ? cm_req_handler+0xa40/0xa40 [ib_cm]
[  884.062055]  [<ffffffff8105f865>] worker_thread+0x175/0x380
[  884.062116]  [<ffffffff8105f6f0>] ? manage_workers+0x210/0x210
[  884.062176]  [<ffffffff81064e0e>] kthread+0xbe/0xd0
[  884.062239]  [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0
[  884.062302]  [<ffffffff81746734>] kernel_thread_helper+0x4/0x10
[  884.062792]  [<ffffffff8173ca30>] ? retint_restore_args+0x13/0x13
[  884.062853]  [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70
[  884.062914]  [<ffffffff81746730>] ? gs_change+0x13/0x13
[  884.062974] Code: 57 41 56 41 55 41 54 53 48 83 ec 28 66 66 66 66 90
4c 8b 6f 08 8b 16 48 89 fb 49 89 f7 4d 8b 75 20 49 81 c6 00 0a 00 00 83
fa 0b <4d> 8b 66 38 77 2a 89 d0 ff 24 c5 90 08 2e a0 90 44 8b 1d a1 79
[  884.066632] RIP  [<ffffffffa02dc3e0>] ipoib_cm_tx_handler+0x30/0x2b0
[ib_ipoib]
[  884.066770]  RSP <ffff8801fad67c50>
[  884.066841] ---[ end trace fa3d54b0aa9bc9ce ]---
(gdb) list *ipoib_cm_tx_handler+0x30
0xa410 is in ipoib_cm_tx_handler
(drivers/infiniband/ulp/ipoib/ipoib_cm.c:1208).
1203    static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
1204                                   struct ib_cm_event *event)
1205    {
1206            struct ipoib_cm_tx *tx = cm_id->context;
1207            struct ipoib_dev_priv *priv = netdev_priv(tx->dev);
1208            struct net_device *dev = priv->dev;
1209            struct ipoib_neigh *neigh;
1210            unsigned long flags;
1211            int ret;
1212    



[  884.066926] BUG: unable to handle kernel paging request at
fffffffffffffff8
[  884.067090] IP: [<ffffffff81064700>] kthread_data+0x10/0x20
[  884.067210] PGD 1c0d067 PUD 1c0e067 PMD 0
[  884.067412] Oops: 0000 [#2] SMP
[  884.067565] CPU 0
[  884.067618] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm
ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables
x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats
cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse
serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios
edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor
thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif
mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib]
[  884.071695]
[  884.071753] Pid: 3001, comm: kworker/0:0 Tainted: G      D    O
3.4.23-pserver-hotfix+ #111 System manufacturer System Product
Name/M4A89GTD-PRO
[  884.071972] RIP: 0010:[<ffffffff81064700>]  [<ffffffff81064700>]
kthread_data+0x10/0x20
[  884.072099] RSP: 0018:ffff8801fad679a8  EFLAGS: 00010096
[  884.072168] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
0000000000000000
[  884.072228] RDX: ffffffff81e138c0 RSI: 0000000000000000 RDI:
ffff8801fb734180
[  884.072293] RBP: ffff8801fad679a8 R08: ffff8801fb7341f0 R09:
000000cdd60f50a3
[  884.072357] R10: 0000000000000c00 R11: 0000000000000000 R12:
0000000000000000
[  884.072422] R13: ffff8801fb734548 R14: ffff8801fad677d8 R15:
ffff8801f664c0d8
[  884.072485] FS:  00007f11da415700(0000) GS:ffff88021fc00000(0000)
knlGS:0000000000000000
[  884.072560] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  884.072623] CR2: fffffffffffffff8 CR3: 00000001f16f5000 CR4:
00000000000007f0
[  884.072690] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  884.072762] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  884.072827] Process kworker/0:0 (pid: 3001, threadinfo
ffff8801fad66000, task ffff8801fb734180)
[  884.072909] Stack:
[  884.072969]  ffff8801fad679c8 ffffffff8105c735 ffff8801fad679c8
ffff88021fc12f40
[  884.074211]  ffff8801fad67a58 ffffffff8173aad3 ffff880100000000
ffff8801fad66000
[  884.074481]  ffff8801fad67fd8 ffff8801fad66000 ffff8801fad66010
ffff8801fad66000
[  884.074742] Call Trace:
[  884.074801]  [<ffffffff8105c735>] wq_worker_sleeping+0x15/0xa0
[  884.074869]  [<ffffffff8173aad3>] __schedule+0x6a3/0x940
[  884.074934]  [<ffffffff8173ae39>] schedule+0x29/0x70
[  884.074998]  [<ffffffff81042105>] do_exit+0x615/0xa40
[  884.075061]  [<ffffffff8103e6c1>] ? kmsg_dump+0x81/0x300
[  884.075123]  [<ffffffff8173d85b>] oops_end+0xab/0xf0
[  884.075184]  [<ffffffff8100570b>] die+0x5b/0x90
[  884.075245]  [<ffffffff8173d3f4>] do_general_protection+0x164/0x170
[  884.075308]  [<ffffffff8173ca60>] ? restore_args+0x30/0x30
[  884.075370]  [<ffffffff8173cc15>] general_protection+0x25/0x30
[  884.075434]  [<ffffffffa02dc3e0>] ? ipoib_cm_tx_handler+0x30/0x2b0
[ib_ipoib]
[  884.075498]  [<ffffffff8109f019>] ? mark_held_locks+0x79/0x120
[  884.075559]  [<ffffffff8109c0bd>] ? trace_hardirqs_off+0xd/0x10
[  884.075622]  [<ffffffffa017fcc5>] cm_process_work+0x25/0x120 [ib_cm]
[  884.075686]  [<ffffffffa0180508>] cm_rep_handler+0x308/0x590 [ib_cm]
[  884.075750]  [<ffffffffa0181c65>] cm_work_handler+0x145/0x1070 [ib_cm]
[  884.075813]  [<ffffffff8105daea>] process_one_work+0x19a/0x5c0
[  884.075875]  [<ffffffff8105da7d>] ? process_one_work+0x12d/0x5c0
[  884.075938]  [<ffffffffa0181b20>] ? cm_req_handler+0xa40/0xa40 [ib_cm]
[  884.076001]  [<ffffffff8105f865>] worker_thread+0x175/0x380
[  884.076064]  [<ffffffff8105f6f0>] ? manage_workers+0x210/0x210
[  884.076126]  [<ffffffff81064e0e>] kthread+0xbe/0xd0
[  884.076187]  [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0
[  884.076252]  [<ffffffff81746734>] kernel_thread_helper+0x4/0x10
[  884.076313]  [<ffffffff8173ca30>] ? retint_restore_args+0x13/0x13
[  884.076376]  [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70
[  884.076438]  [<ffffffff81746730>] ? gs_change+0x13/0x13
[  884.076499] Code: 66 66 66 90 65 48 8b 04 25 80 b9 00 00 48 8b 80 70
03 00 00 8b 40 f0 c9 c3 66 90 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03
00 00 <48> 8b 40 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66
[  884.081230] RIP  [<ffffffff81064700>] kthread_data+0x10/0x20
[  884.081332]  RSP <ffff8801fad679a8>
[  884.081388] CR2: fffffffffffffff8
[  884.081447] ---[ end trace fa3d54b0aa9bc9cf ]---
[  884.081504] Fixing recursive fault but reboot is needed!
[  903.845688] ------------[ cut here ]------------
[  903.845800] WARNING: at kernel/watchdog.c:241
watchdog_overflow_callback+0x98/0xc0()
[  903.845878] Hardware name: System Product Name
[  903.845939] Watchdog detected hard LOCKUP on cpu 3
[  903.845989] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm
ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables
x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats
cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse
serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios
edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor
thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif
mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib]
[  903.850712] Pid: 19, comm: ksoftirqd/3 Tainted: G      D    O
3.4.23-pserver-hotfix+ #111
[  903.850790] Call Trace:
[  903.850851]  <NMI>  [<ffffffff8103c2cf>] warn_slowpath_common+0x7f/0xc0
[  903.850967]  [<ffffffff8103c3c6>] warn_slowpath_fmt+0x46/0x50
[  903.851034]  [<ffffffff8109c009>] ? trace_hardirqs_off_caller+0x29/0xd0
[  903.851101]  [<ffffffff810cd968>] watchdog_overflow_callback+0x98/0xc0
[  903.851167]  [<ffffffff811077dc>] __perf_event_overflow+0x9c/0x320
[  903.851233]  [<ffffffff811087ec>] ?
perf_event_update_userpage+0x16c/0x2c0
[  903.851299]  [<ffffffff81108680>] ? perf_event_mmap_ctx+0x170/0x170
[  903.852535]  [<ffffffff81107f74>] perf_event_overflow+0x14/0x20
[  903.852601]  [<ffffffff81013f27>] x86_pmu_handle_irq+0x1b7/0x220
[  903.852668]  [<ffffffff8173e4c1>] perf_event_nmi_handler+0x21/0x30
[  903.852733]  [<ffffffff8173da26>] nmi_handle+0xb6/0x200
[  903.852798]  [<ffffffff8173d970>] ? oops_begin+0xd0/0xd0
[  903.852863]  [<ffffffff8173dc9d>] do_nmi+0x12d/0x350
[  903.852928]  [<ffffffff8173d02c>] end_repeat_nmi+0x1a/0x1e
[  903.852994]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  903.853059]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  903.853123]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  903.853188]  <<EOE>>  [<ffffffff81420dff>] __delay+0xf/0x20
[  903.853302]  [<ffffffff81428713>] do_raw_spin_lock+0xd3/0x140
[  903.853367]  [<ffffffff8173bd18>] _raw_spin_lock+0x48/0x50
[  903.853433]  [<ffffffff8107771f>] ? try_to_wake_up+0x20f/0x2f0
[  903.853498]  [<ffffffff8107771f>] try_to_wake_up+0x20f/0x2f0
[  903.853564]  [<ffffffff81077812>] default_wake_function+0x12/0x20
[  903.853629]  [<ffffffff810654cd>] autoremove_wake_function+0x1d/0x50
[  903.853694]  [<ffffffff8106e729>] __wake_up_common+0x59/0x90
[  903.853759]  [<ffffffff81071310>] __wake_up+0x40/0x60
[  903.853827]  [<ffffffff815cc82c>] sk_stream_write_space+0xdc/0x230
[  903.853892]  [<ffffffff815cc794>] ? sk_stream_write_space+0x44/0x230
[  903.853958]  [<ffffffff81629760>] tcp_data_snd_check+0x110/0x120
[  903.854023]  [<ffffffff8162e829>] tcp_rcv_established+0x389/0x870
[  903.854089]  [<ffffffff81639a17>] tcp_v4_do_rcv+0x297/0x5d0
[  903.854153]  [<ffffffff8163a2f1>] tcp_v4_rcv+0x5a1/0x930
[  903.854217]  [<ffffffff81611dfc>] ? ip_local_deliver_finish+0x4c/0x4f0
[  903.854283]  [<ffffffff81611ee5>] ip_local_deliver_finish+0x135/0x4f0
[  903.854348]  [<ffffffff81611dfc>] ? ip_local_deliver_finish+0x4c/0x4f0
[  903.854413]  [<ffffffff81611da0>] ip_local_deliver+0x80/0x90
[  903.854478]  [<ffffffff8161244d>] ip_rcv_finish+0x1ad/0x660
[  903.854544]  [<ffffffff81611c58>] ip_rcv+0x228/0x2f0
[  903.854610]  [<ffffffff815d7696>] __netif_receive_skb+0x2c6/0x990
[  903.854675]  [<ffffffff815d74e6>] ? __netif_receive_skb+0x116/0x990
[  903.854741]  [<ffffffff81162487>] ?
__kmalloc_node_track_caller+0xf7/0x250
[  903.854807]  [<ffffffff815d89bd>] netif_receive_skb+0x2d/0x210
[  903.854877]  [<ffffffffa02de26a>] ipoib_cm_handle_rx_wc+0x1fa/0x710
[ib_ipoib]
[  903.854958]  [<ffffffff8173c6fb>] ? _raw_spin_unlock+0x2b/0x50
[  903.855026]  [<ffffffffa02ded32>] ? ipoib_cm_handle_tx_wc+0x1c2/0x370
[ib_ipoib]
[  903.855108]  [<ffffffffa02d7a86>] ipoib_poll+0xd6/0x190 [ib_ipoib]
[  903.855173]  [<ffffffff815d97ad>] net_rx_action+0x13d/0x320
[  903.855239]  [<ffffffff81045048>] __do_softirq+0xf8/0x380
[  903.855304]  [<ffffffff810453ed>] run_ksoftirqd+0x11d/0x1e0
[  903.855368]  [<ffffffff810452d0>] ? __do_softirq+0x380/0x380
[  903.855433]  [<ffffffff81064e0e>] kthread+0xbe/0xd0
[  903.855497]  [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0
[  903.855564]  [<ffffffff81746734>] kernel_thread_helper+0x4/0x10
[  903.856798]  [<ffffffff8173ca30>] ? retint_restore_args+0x13/0x13
[  903.856864]  [<ffffffff81064d50>] ? __init_kthread_worker+0x70/0x70
[  903.856929]  [<ffffffff81746730>] ? gs_change+0x13/0x13
[  903.856993] ---[ end trace fa3d54b0aa9bc9d0 ]---
[  917.505825] ------------[ cut here ]------------
[  917.505938] WARNING: at kernel/watchdog.c:241
watchdog_overflow_callback+0x98/0xc0()
[  917.506014] Hardware name: System Product Name
[  917.506075] Watchdog detected hard LOCKUP on cpu 2
[  917.506123] Modules linked in: ib_ipoib(O) rdma_ucm rdma_cm iw_cm
ib_addr ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables
x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats
cpufreq_userspace binfmt_misc fuse loop kvm_amd kvm powernow_k8 psmouse
serio_raw mperf microcode tpm_tis tpm evdev edac_core tpm_bios
edac_mce_amd asus_atk0110 shpchp i2c_piix4 pci_hotplug processor
thermal_sys button dm_multipath scsi_dh mlx4_en sg sd_mod crc_t10dif
mlx4_core r8169 ahci libahci libata scsi_mod [last unloaded: ib_ipoib]
[  917.510288] Pid: 3337, comm: iperf Tainted: G      D W  O
3.4.23-pserver-hotfix+ #111
[  917.510362] Call Trace:
[  917.510421]  <NMI>  [<ffffffff8103c2cf>] warn_slowpath_common+0x7f/0xc0
[  917.510534]  [<ffffffff8103c3c6>] warn_slowpath_fmt+0x46/0x50
[  917.510598]  [<ffffffff8109c009>] ? trace_hardirqs_off_caller+0x29/0xd0
[  917.510662]  [<ffffffff810cd968>] watchdog_overflow_callback+0x98/0xc0
[  917.511154]  [<ffffffff811077dc>] __perf_event_overflow+0x9c/0x320
[  917.511218]  [<ffffffff811087ec>] ?
perf_event_update_userpage+0x16c/0x2c0
[  917.511283]  [<ffffffff81108680>] ? perf_event_mmap_ctx+0x170/0x170
[  917.511347]  [<ffffffff81107f74>] perf_event_overflow+0x14/0x20
[  917.511411]  [<ffffffff81013f27>] x86_pmu_handle_irq+0x1b7/0x220
[  917.511477]  [<ffffffff8173e4c1>] perf_event_nmi_handler+0x21/0x30
[  917.511541]  [<ffffffff8173da26>] nmi_handle+0xb6/0x200
[  917.511604]  [<ffffffff8173d970>] ? oops_begin+0xd0/0xd0
[  917.511669]  [<ffffffff8173dc9d>] do_nmi+0x12d/0x350
[  917.511732]  [<ffffffff8173d02c>] end_repeat_nmi+0x1a/0x1e
[  917.511796]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  917.511859]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  917.511921]  [<ffffffff81420eb1>] ? delay_tsc+0x61/0xb0
[  917.511984]  <<EOE>>  [<ffffffff81420dff>] __delay+0xf/0x20
[  917.512093]  [<ffffffff81428713>] do_raw_spin_lock+0xd3/0x140
[  917.512158]  [<ffffffff8173bd18>] _raw_spin_lock+0x48/0x50
[  917.513308]  [<ffffffff8107eee0>] ? load_balance+0x540/0x8a0
[  917.513371]  [<ffffffff8107eee0>] load_balance+0x540/0x8a0
[  917.513435]  [<ffffffff8107eefc>] ? load_balance+0x55c/0x8a0
[  917.513498]  [<ffffffff8107fe8d>] idle_balance+0x13d/0x2b0
[  917.513560]  [<ffffffff8107fda0>] ? idle_balance+0x50/0x2b0
[  917.513623]  [<ffffffff8173acc0>] __schedule+0x890/0x940
[  917.513686]  [<ffffffff8173ae39>] schedule+0x29/0x70
[  917.513749]  [<ffffffff81738bd5>] schedule_timeout+0x225/0x3b0
[  917.513812]  [<ffffffff8109f2b0>] ? trace_hardirqs_on_caller+0x20/0x1b0
[  917.513877]  [<ffffffff815c26ae>] ? release_sock+0x14e/0x1b0
[  917.513939]  [<ffffffff8109f44d>] ? trace_hardirqs_on+0xd/0x10
[  917.514003]  [<ffffffff81045542>] ? local_bh_enable_ip+0x92/0xf0
[  917.514067]  [<ffffffff8173c5f3>] ? _raw_spin_unlock_bh+0x43/0x50
[  917.514132]  [<ffffffff815ccf98>] sk_stream_wait_memory+0x218/0x300
[  917.514196]  [<ffffffff810654b0>] ? wake_up_bit+0x40/0x40
[  917.514260]  [<ffffffff816247d1>] tcp_sendmsg+0x681/0xc30
[  917.514324]  [<ffffffff8164e0db>] inet_sendmsg+0x12b/0x240
[  917.514387]  [<ffffffff8164dfb0>] ? inet_create+0x5b0/0x5b0
[  917.514450]  [<ffffffff815c27c2>] ? sock_update_classid+0xb2/0x2b0
[  917.514514]  [<ffffffff815c2860>] ? sock_update_classid+0x150/0x2b0
[  917.514577]  [<ffffffff815bdf90>] sock_aio_write+0x190/0x1b0
[  917.514641]  [<ffffffff8113924f>] ? handle_pte_fault+0x50f/0x8e0
[  917.514706]  [<ffffffff8116e11a>] do_sync_write+0xea/0x130
[  917.514770]  [<ffffffff81170cc3>] ? fget_light+0x43/0x490
[  917.514835]  [<ffffffff813b1013>] ? security_file_permission+0x23/0x90
[  917.514900]  [<ffffffff8116e772>] vfs_write+0x172/0x190
[  917.514965]  [<ffffffff8116e881>] sys_write+0x51/0x90
[  917.515028]  [<ffffffff817452e9>] system_call_fastpath+0x16/0x1b
[  917.515092] ---[ end trace fa3d54b0aa9bc9d1 ]---
