Hi,
Has anyone seen this oops, or does anyone have a patch for it? We just switched to the SUSE 2.6.16 kernel with Lustre 1.4.9 and OFED-1.1, and we're using the mthca device. Part of the lnet debug log is below (note that its line numbers are 3 greater than those in the Lustre 1.4.9 source).
Thanks,
pau

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
<0000000000000000>{stext+2130702568}
PGD 23568b067 PUD 2359d6067 PMD 0
Oops: 0010 [1] SMP
last sysfs file: /devices/pci0000:80/0000:80:0e.0/0000:81:00.0/modalias
CPU 0
Modules linked in: fsfilt_ldiskfs ldiskfs mds osc ksocklnd llite lov lquota mdc ptlrpc ib_srp ib_ipoib ib_uverbs ib_umad ko2iblnd rdma_cm ib_addr ib_cm ib_sa obdclass lnet lvfs libcfs i2c_nforce2 ib_mthca ib_mad ib_core e1000 qla2322 qla2xxx sata_nv libata
Pid: 3066, comm: rdma_cm_wq Tainted: G U 2.6.16-27-0.6_lustre.1.4.9custom-default #14
RIP: 0010:[<0000000000000000>] <0000000000000000>{stext+2130702568}
RSP: 0000:ffff8102384abd00  EFLAGS: 00010293
RAX: ffff81023e92d000 RBX: ffff8102388f3618 RCX: ffff8102384abb00
RDX: ffff810234a0ce80 RSI: 0000000000000001 RDI: ffff810234a0ce80
RBP: ffff810234a0ce80 R08: 0000000000000001 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff810237dcf480 R15: 0000000000000010
FS:  00002b3288172280(0000) GS:ffffffff8153d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000002359f3000 CR4: 00000000000006e0
Process rdma_cm_wq (pid: 3066, threadinfo ffff8102384aa000, task ffff81023f09e100)
Stack: ffffffff881ed319 0000000000000000 0000000000000000 ffffffff881fb6c4
      ffff81020000000f ffff810234a0a000 0000000234a0a000 0000000234a0a000
      0000000000000680 ffff810234d512c0
Call Trace: <ffffffff881ed319>{:ko2iblnd:kiblnd_create_conn+2889}
<ffffffff881e40a0>{:rdma_cm:cma_work_handler+0} <ffffffff881f6364>{:ko2iblnd:kiblnd_active_connect+52} <ffffffff881e40a0>{:rdma_cm:cma_work_handler+0} <ffffffff881f6c0b>{:ko2iblnd:kiblnd_cm_callback+1387} <ffffffff881e40df>{:rdma_cm:cma_work_handler+63} <ffffffff8103f010>{run_workqueue+176} <ffffffff81042c60>{keventd_create_kthread+0} <ffffffff8103f1aa>{worker_thread+330} <ffffffff81027a00>{default_wake_function+0} <ffffffff81027a00>{default_wake_function+0}
      <ffffffff8103f060>{worker_thread+0} <ffffffff81042c19>{kthread+217}
<ffffffff8100bba6>{child_rip+8} <ffffffff81042c60>{keventd_create_kthread+0}
      <ffffffff81042b40>{kthread+0} <ffffffff8100bb9e>{child_rip+0}

Code:  Bad RIP value.
RIP <0000000000000000>{stext+2130702568} RSP <ffff8102384abd00>
CR2: 0000000000000000

----------
Debug Log
----------
0800:00000200:0:1175715859.672445:0:3066:0:(o2iblnd_cb.c:2622:kiblnd_cm_callback()) [EMAIL PROTECTED] Route resolved: 0
0800:00000200:0:1175715859.672448:0:3066:0:(o2iblnd.c:620:kiblnd_create_conn()) params ffff81023495b9c0 peer, ffff81023495d200 cmid
0800:00000010:0:1175715859.672452:0:3066:0:(o2iblnd.c:622:kiblnd_create_conn()) kmalloced 'init_qp_attr': 72 at ffff810234d512c0 (tot
0800:00000010:0:1175715859.672457:0:3066:0:(o2iblnd.c:629:kiblnd_create_conn()) kmalloced 'conn': 216 at ffff810237dcf480 (tot 9066422
0800:00000010:0:1175715859.672461:0:3066:0:(o2iblnd.c:650:kiblnd_create_conn()) kmalloced 'conn->ibc_connvars': 136 at ffff8102349f5b4
0800:00000010:0:1175715859.672466:0:3066:0:(o2iblnd.c:657:kiblnd_create_conn()) kmalloced 'conn->ibc_rxs': 1664 at ffff8102388f3000 (t
0800:00000010:0:1175715859.672470:0:3066:0:(o2iblnd.c:1050:kiblnd_alloc_pages()) kmalloced 'p': 136 at ffff8102349f5a80 (tot 9068358).
0800:00000200:0:1175715859.672476:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 0: ffff8102349fb000 0x2349fb000(0x2349fb000)
0800:00000200:0:1175715859.672480:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 1: ffff8102349fc000 0x2349fc000(0x2349fc000)
0800:00000200:0:1175715859.672485:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 2: ffff8102349fd000 0x2349fd000(0x2349fd000)
0800:00000200:0:1175715859.672489:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 3: ffff8102349fe000 0x2349fe000(0x2349fe000)
0800:00000200:0:1175715859.672494:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 4: ffff8102349ff000 0x2349ff000(0x2349ff000)
0800:00000200:0:1175715859.672497:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 5: ffff810234a00000 0x234a00000(0x234a00000)
0800:00000200:0:1175715859.672502:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 6: ffff810234a01000 0x234a01000(0x234a01000)
0800:00000200:0:1175715859.672506:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 7: ffff810234a02000 0x234a02000(0x234a02000)
0800:00000200:0:1175715859.672510:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 8: ffff810234a03000 0x234a03000(0x234a03000)
0800:00000200:0:1175715859.672514:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 9: ffff810234a04000 0x234a04000(0x234a04000)
0800:00000200:0:1175715859.672517:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 10: ffff810234a05000 0x234a05000(0x234a05000)
0800:00000200:0:1175715859.672521:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 11: ffff810234a06000 0x234a06000(0x234a06000)
0800:00000200:0:1175715859.672524:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 12: ffff810234a07000 0x234a07000(0x234a07000)
0800:00000200:0:1175715859.672529:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 13: ffff810234a08000 0x234a08000(0x234a08000)
0800:00000200:0:1175715859.672534:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 14: ffff810234a09000 0x234a09000(0x234a09000)
0800:00000200:0:1175715859.672537:0:3066:0:(o2iblnd.c:683:kiblnd_create_conn()) rx 15: ffff810234a0a000 0x234a0a000(0x234a0a000)
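
Because the line numbers in this log are 3 greater than those in the stock Lustre 1.4.9 source, it may help to map each entry back to the stock line before reading the code. The following is only a minimal sketch: it assumes every entry carries a "(file:line:function())" field as in the log above, and that the offset of 3 applies uniformly to every file, which may not hold. The names ENTRY_RE, LINE_OFFSET and stock_locations are made up for this illustration and are not part of Lustre.

```python
import re

# Assumption: each debug entry contains "(file.c:line:function())",
# matching the entries shown in the log above.
ENTRY_RE = re.compile(r"\((\w+\.c):(\d+):(\w+)\(\)\)")

# Offset taken from the message: local line numbers are 3 greater than
# stock 1.4.9. Treating it as constant for every file is an assumption.
LINE_OFFSET = 3


def stock_locations(log_text, offset=LINE_OFFSET):
    """Return unique (file, stock_line, function) tuples, in order of first appearance."""
    seen = []
    for fname, line, func in ENTRY_RE.findall(log_text):
        loc = (fname, int(line) - offset, func)
        if loc not in seen:
            seen.append(loc)
    return seen


# Two entries copied from the debug log above.
sample = (
    "0800:00000200:0:1175715859.672445:0:3066:0:"
    "(o2iblnd_cb.c:2622:kiblnd_cm_callback()) [EMAIL PROTECTED] Route resolved: 0 "
    "0800:00000200:0:1175715859.672537:0:3066:0:"
    "(o2iblnd.c:683:kiblnd_create_conn()) rx 15: ffff810234a0a000 0x234a0a000(0x234a0a000)"
)

for fname, line, func in stock_locations(sample):
    print(f"{fname}:{line} {func}")
```

The regular expression does not depend on line breaks, so the same function should work whether the log is pasted as one long line or with one entry per line.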


