Hi,
I have been playing around with Lustre 1.6.5.1 as part of some testing
that we are doing for an upcoming cluster. I installed it on 2
test machines, Dell 2950s with 8 GB of RAM to be exact, and fired up a
test file system.
The test file system was very simple:
/dev/sdb - ~400GB for the MDT/MGS
/dev/sdc - ~4TB for the OST
Both the MDT/MGS and the OST mounted fine, as did the actual Lustre
filesystem when I mounted it with the mount command:
mount -t lustre [EMAIL PROTECTED]:/testfs /mnt/testfs
Then, as a preliminary test, I ran:
dd if=/dev/zero of=/mnt/testfs/testfile bs=1M count=1024
It writes ~700MB, then I get a kernel oops and the machine locks up.
The whole setup is pretty stock: RHEL4 with the pre-built Lustre RHEL4
kernel RPMs and the pre-built lustre-1.6.5.1 for RHEL4. So I would be
relatively surprised if this is a bug that has just slipped under the
radar. I would be less surprised if it is something I am doing wrong,
but I followed the quick-start for a simple Lustre config and this is
where I am stuck.
The kernel oops is attached, if anyone is interested. I looked at the
function it oopsed in and nothing jumped out as out of the ordinary at
first glance, so I am stumped. The last line of the oops is cut off
because it was captured via a serial-over-LAN console. I did find a
reference to a kernel oops in mballoc on Google, but the resulting
Bugzilla bug is not readable by the public, so I am unsure whether it
has anything to do with this one.
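One way to narrow down where the write fails would be to write the file
in smaller chunks and sync after each one, so that the last chunk
reported before the oops gives a rough offset. A minimal sketch, assuming
only plain dd and sync; the function name write_chunks and the chunk
sizes are illustrative and not taken from the Lustre quick-start:

```shell
#!/bin/sh
# Hypothetical helper: write a file in fixed-size chunks, syncing after
# each one, so the last line printed before a crash shows roughly how
# far the write got.
# usage: write_chunks FILE CHUNK_MB NCHUNKS
write_chunks() {
    file=$1
    chunk_mb=$2
    nchunks=$3
    i=0
    while [ "$i" -lt "$nchunks" ]; do
        # seek is counted in units of bs (1M); conv=notrunc keeps the
        # chunks that were already written.
        dd if=/dev/zero of="$file" bs=1M count="$chunk_mb" \
           seek=$((i * chunk_mb)) conv=notrunc 2>/dev/null || return 1
        sync
        i=$((i + 1))
        echo "wrote $((i * chunk_mb)) MB"
    done
}
```

For example, `write_chunks /mnt/testfs/testfile 64 16` would write the
same 1024 MB as the dd command above, in 64 MB steps.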
Any thoughts, anyone?
--
Jason Williams
Linux Systems Administrator
Johns Hopkins University
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mballoc:1334
invalid operand: 0000 [1] SMP
CPU 4
Modules linked in: obdfilter(U) ost(U) mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U)
ldiskfs(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U)
obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs(U) nfsd(U) exportfs(U) lockd(U)
nfs_acl(U) parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U)
sunrpc(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) iw_cxgb3(U)
cxgb3(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ds(U) yenta_socket(U)
pcmcia_core(U) dm_mirror(U) dm_mod(U) button(U) battery(U) ac(U) uhci_hcd(U)
ehci_hcd(U) hw_random(U) ib_ipath(U) ib_ipoib(U) md5(U) ipv6(U) ib_umad(U)
ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) bnx2(U) ext3(U)
jbd(U) ata_piix(U) libata(U) megaraid_sas(U) sd_mod(U) scsi_mod(U)
Pid: 7605, comm: ll_ost_io_04 Not tainted 2.6.9-67.0.7.EL_lustre.1.6.5.1smp
RIP: 0010:[<ffffffffa06ded79>]
<ffffffffa06ded79>{:ldiskfs:ldiskfs_mb_use_best_found+265}
RSP: 0018:000001020fd8cf30 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000001020fd8d058 RCX: 0000000000000037
RDX: 000000000000001c RSI: 0000010211cea000 RDI: 0000000000001000
RBP: 0000000000000800 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000800 R11: 0000000000000800 R12: 000001020fd8d0e8
R13: 0000000000000800 R14: 0000000000000000 R15: 000001020fd8d138
FS: 0000002a95582b00(0000) GS:ffffffff8048e900(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000007fbffff856 CR3: 000000022854c000 CR4: 00000000000006e0
Process ll_ost_io_04 (pid: 7605, threadinfo 000001020fd8c000, task
000001020fc59030)
Stack: 0000000012d58000 000000001152bd20 0000000000000020 000001020fd8d0e8
0000000000011b9c 0000000000011b9c 00000000000034f3 0000000000000000
ffffffffa06dfa65 0000000000000001
Call Trace:<ffffffffa06dfa65>{:ldiskfs:ldiskfs_mb_regular_allocator+1717}
<ffffffff8017d14c>{__getblk+42}
<ffffffffa06e40fd>{:ldiskfs:ldiskfs_mb_new_blocks+333}
<ffffffff80132180>{try_to_wake_up+876}
<ffffffffa0730eb4>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+884}
<ffffffff80133855>{__wake_up_common+67}
<ffffffffa06d9ece>{:ldiskfs:ldiskfs_ext_find_extent+590}
<ffffffffa06db972>{:ldiskfs:ldiskfs_ext_walk_space+482}
<ffffffffa0730b40>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0}
<ffffffffa0731343>{:fsfilt_ldiskfs:fsfilt_map_nblocks+307}
<ffffffffa0425704>{:lnet:lnet_ni_recv+532}
<ffffffffa0424747>{:lnet:lnet_try_match_md+823}
<ffffffffa07315bb>{:fsfilt_ldiskfs:fsfilt_ldiskfs_map_ext_inode_pages+539}
<ffffffffa07e80b4>{:obdfilter:filter_direct_io+1108}
<ffffffffa005e524>{:jbd:journal_start+223}
<ffffffffa072f84b>{:fsfilt_ldiskfs:fsfilt_ldiskfs_brw_start+763}
<ffffffffa04291aa>{:lnet:lnet_parse+4650}
<ffffffffa07e9c1d>{:obdfilter:filter_commitrw_write+4957}
<ffffffffa0406539>{:lvfs:pop_ctxt+505}
<ffffffffa0426b32>{:lnet:lnet_send+1154}
<ffffffffa042ab99>{:lnet:LNetGet+1609}
<ffffffffa0423a02>{:lnet:LNetMDBind+690}
<ffffffffa07e201e>{:obdfilter:filter_commitrw+126}
<ffffffffa07ad688>{:ost:ost_checksum_bulk+200}
<ffffffffa07b2dd1>{:ost:ost_brw_write+9505}
<ffffffffa07ad060>{:ost:ost_bulk_timeout+0}
<ffffffffa05125ef>{:ptlrpc:lustre_msg_get_version+95}
<ffffffffa07ad060>{:ost:ost_bulk_timeout+0}
<ffffffffa05126e5>{:ptlrpc:lustre_msg_check_version+69}
<ffffffffa07ad060>{:ost:ost_bulk_timeout+0}
<ffffffffa07b864d>{:ost:ost_handle+11661}
<ffffffff8015c830>{__rmqueue+218}
<ffffffffa0427b48>{:lnet:lnet_match_blocked_msg+920}
<ffffffff801612ed>{cache_flusharray+107}
<ffffffffa051b451>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa051d629>{:ptlrpc:ptlrpc_server_handle_request+2457}
<ffffffffa03f045e>{:libcfs:lcw_update_time+30}
<ffffffff80133855>{__wake_up_common+67}
<ffffffffa051fd05>{:ptlrpc:ptlrpc_main+3989}
<ffffffffa051e270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa051e270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa051e270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110de3>{child_rip+8} <ffffffffa051ed70>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110ddb>{child_rip+0}
Code: 0f 0b b3 6c 6e a0 ff ff ff ff 36 05 48 8b 43 20 66 44 29 58
RIP <ffffffffa06ded79>{:ldiskfs:ldiskfs_mb_use_b
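The call trace above can be condensed into a per-module frame count to
see at a glance which layers were on the stack. A minimal sketch,
assuming the oops has been saved to a text file and that module frames
have the {:module:function+offset} form seen in this trace; the function
name frames_per_module is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: count call-trace frames per kernel module.
# Frames in core kernel text, e.g. {__getblk+42}, carry no module prefix
# and are skipped, as is any frame truncated before its closing brace.
# usage: frames_per_module OOPS_FILE
frames_per_module() {
    grep -oE '\{:[A-Za-z0-9_]+:[A-Za-z0-9_]+\+[0-9]+\}' "$1" |
        cut -d: -f2 |
        sort | uniq -c | sort -rn
}
```

Running it as `frames_per_module oops.txt` (oops.txt being whatever file
the console capture was saved to) prints one line per module, most
frequent first.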
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss