Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the 
following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11176



Hit this again after the update to:
2007-01-12 16:35:12 Starting lustre: Lustre: OBD class driver Build Version:
1.4.8.0_-19691231160000-PRISTINE-.lib.modules.2.6.9-58chaos.build-2.6.9-58chaos,
[EMAIL PROTECTED]

Here's the console log (igs1):
2007-01-13 03:05:03 NMI Watchdog detected LOCKUP, CPU=1, registers:
2007-01-13 03:05:03 CPU 1 
2007-01-13 03:05:03 Modules linked in: lquota(U) obdfilter(U) fsfilt_ldiskfs(U)
ldiskfs(U) jbd(U) ost(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U)
libcfs(U) perfctr(U) i2c_i801(U) netdump(U) e752x_edac(U) edac_mc(U) i2c_dev(U)
i2c_core(U) dm_mod(U) rtc(U) md(U) sd_mod(U) uhci_hcd(U) ehci_hcd(U) floppy(U)
qla2300(U) qla2xxx(U) scsi_transport_fc(U) scsi_mod(U) unionfs(U) nfs(U)
lockd(U) sunrpc(U) e1000(U)
2007-01-13 03:05:03 Pid: 4528, comm: socknal_sd01 Not tainted 2.6.9-58chaos
2007-01-13 03:05:03 RIP: 0010:[<ffffffff802edd14>]
<ffffffff802edd14>{.text.lock.spinlock+17}
2007-01-13 03:05:03 RSP: 0018:0000010072575968  EFLAGS: 00000086
2007-01-13 03:05:03 RAX: 000001000000d8f8 RBX: 000001000000d780 RCX:
0000000000000002
2007-01-13 03:05:03 RDX: 0000000000000002 RSI: 0000000000000001 RDI:
000001000000d780
2007-01-13 03:05:03 RBP: 00000100025e9d80 R08: 0000000000004000 R09:
0000010068fc1030
2007-01-13 03:05:03 R10: 0000000000000138 R11: 0000000000001d18 R12:
000001000000d780
2007-01-13 03:05:03 R13: 0000000000000002 R14: 0000000000000000 R15:
000001000bcd6e00
2007-01-13 03:05:03 FS:  0000002a958a0b00(0000) GS:ffffffff804e3180(0000)
knlGS:0000000000000000
2007-01-13 03:05:03 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2007-01-13 03:05:03 CR2: 0000002a955ad000 CR3: 000000007f80a000 CR4:
00000000000006e0
2007-01-13 03:05:03 Process socknal_sd01 (pid: 4528, threadinfo
0000010072574000, task 000001006b8854d0)
2007-01-13 03:05:03 Stack: 0000000000000001 0000000000000006 00000100025e9e28
ffffffff80159a9d 
2007-01-13 03:05:03        00000000ffffffff 00000000ffffffff 000001007de9c000
0000000000000002 
2007-01-13 03:05:03        0000000071c35b00 0000010001038000 
2007-01-13 03:05:03 Call Trace:<ffffffff80159a9d>{free_pages_bulk+79}
<ffffffff8029326c>{dev_queue_xmit+530} 
2007-01-13 03:05:03        <ffffffff80159e14>{__free_pages_ok+255}
<ffffffff8015db9f>{kmem_freepages+282} 
2007-01-13 03:05:03        <ffffffff8015e5f4>{slab_destroy+124}
<ffffffff8015e73a>{free_block+285} 
2007-01-13 03:05:03        <ffffffff8015e86c>{cache_flusharray+94}
<ffffffff8015e519>{kfree+195} 
2007-01-13 03:05:03        <ffffffff8028dea8>{kfree_skbmem+9}
<ffffffff802b4e08>{tcp_recvmsg+1589} 
2007-01-13 03:05:03        <ffffffff8028d786>{sock_common_recvmsg+48}
<ffffffff8028a37c>{sock_recvmsg+284} 
2007-01-13 03:05:03        <ffffffff8012dcc5>{recalc_task_prio+337}
<ffffffff801313a2>{autoremove_wake_function+0} 
2007-01-13 03:05:03        
<ffffffffa02388b3>{:ksocklnd:ksocknal_lib_recv_kiov+343} 
2007-01-13 03:05:03        <ffffffffa01fcdfd>{:lnet:lnet_md_unlink+381}
<ffffffffa0233bdc>{:ksocklnd:ksocknal_recv_kiov+62} 
2007-01-13 03:05:03        <ffffffffa0233daa>{:ksocklnd:ksocknal_receive+270} 
2007-01-13 03:05:03       
<ffffffffa023540d>{:ksocklnd:ksocknal_process_receive+126} 
2007-01-13 03:05:03        <ffffffffa0235aa6>{:ksocklnd:ksocknal_scheduler+375} 
2007-01-13 03:05:03        <ffffffff801313a2>{autoremove_wake_function+0}
<ffffffff801313a2>{autoremove_wake_function+0} 
2007-01-13 03:05:03        <ffffffff8010ff3f>{child_rip+8}
<ffffffffa023592f>{:ksocklnd:ksocknal_scheduler+0} 
2007-01-13 03:05:03        <ffffffff8010ff37>{child_rip+0} 
2007-01-13 03:05:03 
2007-01-13 03:05:03 Code: 7e f9 e9 ce fc ff ff e8 c8 69 ef ff e9 4a fd ff ff e8
de 69 
2007-01-13 03:05:03 Kernel panic - not syncing: nmi watchdog
2007-01-13 03:05:03 ----------- [cut here ] --------- [please bite here ] 
---------
2007-01-13 03:05:03 Kernel BUG at panic:74
2007-01-13 03:05:03 invalid operand: 0000 [1] SMP 
2007-01-13 03:05:03 CPU 1 
2007-01-13 03:05:03 Modules linked in: lquota(U) obdfilter(U) fsfilt_ldiskfs(U)
ldiskfs(U) jbd(U) ost(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U)
libcfs(U) perfctr(U) i2c_i801(U) netdump(U) e752x_edac(U) edac_mc(U) i2c_dev(U)
i2c_core(U) dm_mod(U) rtc(U) md(U) sd_mod(U) uhci_hcd(U) ehci_hcd(U) floppy(U)
qla2300(U) qla2xxx(U) scsi_transport_fc(U) scsi_mod(U) unionfs(U) nfs(U)
lockd(U) sunrpc(U) e1000(U)
2007-01-13 03:05:03 Pid: 4528, comm: socknal_sd01 Not tainted 2.6.9-58chaos
2007-01-13 03:05:03 RIP: 0010:[<ffffffff8013370e>] <ffffffff8013370e>{panic+211}
2007-01-13 03:05:03 RSP: 0018:000001007f80dda8  EFLAGS: 00010082
2007-01-13 03:05:03 RAX: 000000000000002c RBX: ffffffff80301d02 RCX:
0000000000000046
2007-01-13 03:05:03 RDX: 0000000000006bf3 RSI: 0000000000000046 RDI:
ffffffff803ba8e0
2007-01-13 03:05:03 RBP: 000001007f80df58 R08: 0000000000000007 R09:
ffffffff80301d02
2007-01-13 03:05:03 R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000021
2007-01-13 03:05:03 R13: 0000000000000002 R14: 0000000000000000 R15:
000001000bcd6e00
2007-01-13 03:05:03 FS:  0000002a958a0b00(0000) GS:ffffffff804e3180(0000)
knlGS:0000000000000000
2007-01-13 03:05:03 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2007-01-13 03:05:03 CR2: 0000002a955ad000 CR3: 000000007f80a000 CR4:
00000000000006e0
2007-01-13 03:05:03 Process socknal_sd01 (pid: 4528, threadinfo
0000010072574000, task 000001006b8854d0)
2007-01-13 03:05:03 Stack: 0000003000000008 000001007f80de88 000001007f80ddc8
0000000000000013 
2007-01-13 03:05:03        0000000000000000 0000000000000046 0000000000006bc7
0000000000000046 
2007-01-13 03:05:03        0000000000000007 ffffffff80304257 
2007-01-13 03:05:03 Call Trace:<ffffffff80110858>{show_stack+241}
<ffffffff80110982>{show_registers+277} 
2007-01-13 03:05:03 <ffffffff80110c89>{die_nmi+130}
<ffffffff8011a597>{nmi_watchdog_tick+216} 
2007-01-13 03:05:03 <ffffffff80111560>{default_do_nmi+122}
<ffffffff8011a64d>{do_nmi+115} 
2007-01-13 03:05:03 <ffffffff8011016b>{paranoid_exit+0}
<ffffffff802edd14>{.text.lock.spinlock+17} 
2007-01-13 03:05:03  <EOE> <ffffffff80159a9d>{free_pages_bulk+79} 
2007-01-13 03:05:03        <ffffffff8029326c>{dev_queue_xmit+530}
<ffffffff80159e14>{__free_pages_ok+255} 
2007-01-13 03:05:03        <ffffffff8015db9f>{kmem_freepages+282}
<ffffffff8015e5f4>{slab_destroy+124} 
2007-01-13 03:05:03        <ffffffff8015e73a>{free_block+285}
<ffffffff8015e86c>{cache_flusharray+94} 
2007-01-13 03:05:03        <ffffffff8015e519>{kfree+195}
<ffffffff8028dea8>{kfree_skbmem+9} 
2007-01-13 03:05:03        <ffffffff802b4e08>{tcp_recvmsg+1589}
<ffffffff8028d786>{sock_common_recvmsg+48} 
2007-01-13 03:05:03        <ffffffff8028a37c>{sock_recvmsg+284}
<ffffffff8012dcc5>{recalc_task_prio+337} 
2007-01-13 03:05:03        <ffffffff801313a2>{autoremove_wake_function+0}
<ffffffffa02388b3>{:ksocklnd:ksocknal_lib_recv_kiov+343} 
2007-01-13 03:05:03        <ffffffffa01fcdfd>{:lnet:lnet_md_unlink+381}
<ffffffffa0233bdc>{:ksocklnd:ksocknal_recv_kiov+62} 
2007-01-13 03:05:03        <ffffffffa0233daa>{:ksocklnd:ksocknal_receive+270} 
2007-01-13 03:05:03       
<ffffffffa023540d>{:ksocklnd:ksocknal_process_receive+126} 
2007-01-13 03:05:03        <ffffffffa0235aa6>{:ksocklnd:ksocknal_scheduler+375} 
2007-01-13 03:05:03        <ffffffff801313a2>{autoremove_wake_function+0}
<ffffffff801313a2>{autoremove_wake_function+0} 
2007-01-13 03:05:03        <ffffffff8010ff3f>{child_rip+8}
<ffffffffa023592f>{:ksocklnd:ksocknal_scheduler+0} 
2007-01-13 03:05:03        <ffffffff8010ff37>{child_rip+0} 
2007-01-13 03:05:03 
2007-01-13 03:05:03 Code: 0f 0b 5e 22 30 80 ff ff ff ff 4a 00 31 ff e8 47 bf fe
ff e8 
2007-01-13 03:05:03 RIP <ffffffff8013370e>{panic+211} RSP <000001007f80dda8>
2007-01-13 03:05:06 CPU#1 is executing netdump.
2007-01-13 03:05:06 < netdump activated - performing handshake with the server. 
>
2007-01-13 03:05:07 NETDUMP START!
2007-01-13 03:05:07 < handshake completed - listening for dump requests. >


I wasn't able to get a crash dump.  It does look like an OOM-related event.  We
have made some progress in that investigation.  I'll summarize ofter the (long)
weekend on Tuesday.

_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel

Reply via email to