Hi,
Has anyone seen this oops, or does anyone have a patch for it? We
just switched to the SUSE 2.6.16 kernel with Lustre 1.4.9 and OFED-1.1,
and we're using the mthca device. Part of the lnet debug log is below (note
that the line numbers are 3 greater than those in the Lustre 1.4.9 source).
Thanks,
pau
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
<0000000000000000>{stext+2130702568}
PGD 23568b067 PUD 2359d6067 PMD 0
Oops: 0010 [1] SMP
last sysfs file: /devices/pci0000:80/0000:80:0e.0/0000:81:00.0/modalias
CPU 0
Modules linked in: fsfilt_ldiskfs ldiskfs mds osc ksocklnd llite lov
lquota mdc ptlrpc ib_srp ib_ipoib ib_uverbs ib_umad ko2iblnd rdma_cm
ib_addr ib_cm ib_sa obdclass lnet lvfs libcfs i2c_nforce2 ib_mthca
ib_mad ib_core e1000 qla2322 qla2xxx sata_nv libata
Pid: 3066, comm: rdma_cm_wq Tainted: G U
2.6.16-27-0.6_lustre.1.4.9custom-default #14
RIP: 0010:[<0000000000000000>] <0000000000000000>{stext+2130702568}
RSP: 0000:ffff8102384abd00 EFLAGS: 00010293
RAX: ffff81023e92d000 RBX: ffff8102388f3618 RCX: ffff8102384abb00
RDX: ffff810234a0ce80 RSI: 0000000000000001 RDI: ffff810234a0ce80
RBP: ffff810234a0ce80 R08: 0000000000000001 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff810237dcf480 R15: 0000000000000010
FS: 00002b3288172280(0000) GS:ffffffff8153d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000002359f3000 CR4: 00000000000006e0
Process rdma_cm_wq (pid: 3066, threadinfo ffff8102384aa000, task
ffff81023f09e100)
Stack: ffffffff881ed319 0000000000000000 0000000000000000 ffffffff881fb6c4
ffff81020000000f ffff810234a0a000 0000000234a0a000 0000000234a0a000
0000000000000680 ffff810234d512c0
Call Trace: <ffffffff881ed319>{:ko2iblnd:kiblnd_create_conn+2889}
<ffffffff881e40a0>{:rdma_cm:cma_work_handler+0}
<ffffffff881f6364>{:ko2iblnd:kiblnd_active_connect+52}
<ffffffff881e40a0>{:rdma_cm:cma_work_handler+0}
<ffffffff881f6c0b>{:ko2iblnd:kiblnd_cm_callback+1387}
<ffffffff881e40df>{:rdma_cm:cma_work_handler+63}
<ffffffff8103f010>{run_workqueue+176}
<ffffffff81042c60>{keventd_create_kthread+0}
<ffffffff8103f1aa>{worker_thread+330}
<ffffffff81027a00>{default_wake_function+0}
<ffffffff81027a00>{default_wake_function+0}
<ffffffff8103f060>{worker_thread+0} <ffffffff81042c19>{kthread+217}
<ffffffff8100bba6>{child_rip+8}
<ffffffff81042c60>{keventd_create_kthread+0}
<ffffffff81042b40>{kthread+0} <ffffffff8100bb9e>{child_rip+0}
Code: Bad RIP value.
RIP <0000000000000000>{stext+2130702568} RSP <ffff8102384abd00>
CR2: 0000000000000000
----------
Debug Log
----------
0800:00000200:0:1175715859.672445:0:3066:0:(o2iblnd_cb.c:2622:kiblnd_cm_callback())
[EMAIL PROTECTED] Route resolved: 0
0800:00000200:0:1175715859.672448:0:3066:0:(o2iblnd.c:620:kiblnd_create_conn())
params ffff81023495b9c0 peer, ffff81023495d200 cmid
0800:00000010:0:1175715859.672452:0:3066:0:(o2iblnd.c:622:kiblnd_create_conn())
kmalloced 'init_qp_attr': 72 at ffff810234d512c0 (tot
0800:00000010:0:1175715859.672457:0:3066:0:(o2iblnd.c:629:kiblnd_create_conn())
kmalloced 'conn': 216 at ffff810237dcf480 (tot 9066422
0800:00000010:0:1175715859.672461:0:3066:0:(o2iblnd.c:650:kiblnd_create_conn())
kmalloced 'conn->ibc_connvars': 136 at ffff8102349f5b4
0800:00000010:0:1175715859.672466:0:3066:0:(o2iblnd.c:657:kiblnd_create_conn())
kmalloced 'conn->ibc_rxs': 1664 at ffff8102388f3000 (t
0800:00000010:0:1175715859.672470:0:3066:0:(o2iblnd.c:1050:kiblnd_alloc_pages())
kmalloced 'p': 136 at ffff8102349f5a80 (tot 9068358).
0800:00000200:0:1175715859.672476:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 0: ffff8102349fb000 0x2349fb000(0x2349fb000)
0800:00000200:0:1175715859.672480:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 1: ffff8102349fc000 0x2349fc000(0x2349fc000)
0800:00000200:0:1175715859.672485:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 2: ffff8102349fd000 0x2349fd000(0x2349fd000)
0800:00000200:0:1175715859.672489:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 3: ffff8102349fe000 0x2349fe000(0x2349fe000)
0800:00000200:0:1175715859.672494:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 4: ffff8102349ff000 0x2349ff000(0x2349ff000)
0800:00000200:0:1175715859.672497:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 5: ffff810234a00000 0x234a00000(0x234a00000)
0800:00000200:0:1175715859.672502:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 6: ffff810234a01000 0x234a01000(0x234a01000)
0800:00000200:0:1175715859.672506:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 7: ffff810234a02000 0x234a02000(0x234a02000)
0800:00000200:0:1175715859.672510:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 8: ffff810234a03000 0x234a03000(0x234a03000)
0800:00000200:0:1175715859.672514:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 9: ffff810234a04000 0x234a04000(0x234a04000)
0800:00000200:0:1175715859.672517:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 10: ffff810234a05000 0x234a05000(0x234a05000)
0800:00000200:0:1175715859.672521:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 11: ffff810234a06000 0x234a06000(0x234a06000)
0800:00000200:0:1175715859.672524:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 12: ffff810234a07000 0x234a07000(0x234a07000)
0800:00000200:0:1175715859.672529:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 13: ffff810234a08000 0x234a08000(0x234a08000)
0800:00000200:0:1175715859.672534:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 14: ffff810234a09000 0x234a09000(0x234a09000)
0800:00000200:0:1175715859.672537:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
rx 15: ffff810234a0a000 0x234a0a000(0x234a0a000)
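
For anyone matching these entries against the stock source, here is a rough
sketch that pulls the "(file:line:function())" part out of each log header
and subtracts the 3-line offset mentioned above. The names LOCATION,
LINE_OFFSET and stock_locations are made up for this example, and it assumes
the same offset applies to every file, which may not hold for o2iblnd_cb.c.

```python
import re

# Matches the trailing "(file:line:function())" part of a log header line.
LOCATION = re.compile(r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)")

# Assumption: the local tree is 3 lines longer than stock 1.4.9 in every file.
LINE_OFFSET = 3


def stock_locations(log_text, offset=LINE_OFFSET):
    """Return unique (file, stock_line, function) tuples in first-seen order."""
    seen = []
    for m in LOCATION.finditer(log_text):
        loc = (m["file"], int(m["line"]) - offset, m["func"])
        if loc not in seen:
            seen.append(loc)
    return seen


# A few header lines copied from the debug log above.
sample = """\
0800:00000200:0:1175715859.672445:0:3066:0:(o2iblnd_cb.c:2622:kiblnd_cm_callback())
0800:00000200:0:1175715859.672448:0:3066:0:(o2iblnd.c:620:kiblnd_create_conn())
0800:00000200:0:1175715859.672476:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
0800:00000200:0:1175715859.672480:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn())
"""

for name, line, func in stock_locations(sample):
    print(f"{name}:{line} {func}()")
```

Feeding it the full log instead of the sample should list each distinct
call site once, with line numbers adjusted for the unmodified source.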
_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel