Hi
I have been playing around with Lustre 1.6.5.1 as part of some testing that we are doing for an upcoming cluster. I installed it on 2 test machines (Dell 2950s with 8 GB of RAM, to be exact) and fired up a test file system.
The test file system was very simple:

/dev/sdb -  ~400GB for the MDT/MGS
/dev/sdc - ~4TB for the OST

Both the MDT/MGS and the OST mounted fine, as did the actual Lustre file system when I mounted it with the mount command:

mount -t lustre [EMAIL PROTECTED]:/testfs /mnt/testfs

Then I tried to do:

dd if=/dev/zero of=/mnt/testfs/testfile bs=1M count=1024

This was just a preliminary test. What happens is that it writes ~700MB, then I get a horrible kernel oops and the machine locks up.

The whole thing is pretty stock: it's running on RHEL4 with the pre-built Lustre RHEL4 kernel RPMs and the pre-built lustre-1.6.5.1 for RHEL4. So I would be relatively surprised if this is a bug that has just slipped under the radar. I would be less surprised if it is something I am doing wrong, but I followed the quick-start for a simple Lustre config, and this is where I am now stuck.

Attached is the kernel oops, if anyone is interested. I looked at the function it oopsed in and nothing jumped out as out of the ordinary at first glance, so I am stumped. The last line of the oops is cut off because it was captured via a serial-over-LAN console.

I did find a reference to a kernel oops in mballoc on Google, but the resulting Bugzilla bug is not readable by the public, so I am unsure whether this has anything to do with that bug.
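For anyone trying to reproduce this, a chunked variant of the dd test above might help narrow down how far the write gets before the oops. This is only a sketch and not what I actually ran: the function name chunked_write, the 64 MB chunk size and the sync after every chunk are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): write a file in fixed-size
# chunks, syncing after each one, and print progress as it goes. The last
# line printed before a hang shows roughly how much data made it out.
#
#   $1 = target file, $2 = number of chunks, $3 = chunk size in MB
chunked_write() {
    target=$1
    chunks=$2
    chunk_mb=$3
    i=0
    while [ "$i" -lt "$chunks" ]; do
        # seek is counted in bs-sized (1M) blocks, so every chunk is
        # appended after the previous one; notrunc keeps the earlier
        # chunks in place.
        dd if=/dev/zero of="$target" bs=1M count="$chunk_mb" \
            seek=$((i * chunk_mb)) conv=notrunc 2>/dev/null || return 1
        sync
        i=$((i + 1))
        echo "wrote $((i * chunk_mb)) MB"
    done
}

# Same total as the original test (16 x 64 MB = 1024 MB):
#   chunked_write /mnt/testfs/testfile 16 64
```

Running it from a serial console or a separate ssh session should leave the last "wrote N MB" line visible even after the machine locks up.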

Any thoughts anyone?

--
Jason Williams
Linux Systems Administrator
Johns Hopkins University
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mballoc:1334
invalid operand: 0000 [1] SMP
CPU 4
Modules linked in: obdfilter(U) ost(U) mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) 
ldiskfs(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ptlrpc(U) 
obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs(U) nfsd(U) exportfs(U) lockd(U) 
nfs_acl(U) parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U) 
sunrpc(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) iw_cxgb3(U) 
cxgb3(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ds(U) yenta_socket(U) 
pcmcia_core(U) dm_mirror(U) dm_mod(U) button(U) battery(U) ac(U) uhci_hcd(U) 
ehci_hcd(U) hw_random(U) ib_ipath(U) ib_ipoib(U) md5(U) ipv6(U) ib_umad(U) 
ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) bnx2(U) ext3(U) 
jbd(U) ata_piix(U) libata(U) megaraid_sas(U) sd_mod(U) scsi_mod(U)
Pid: 7605, comm: ll_ost_io_04 Not tainted 2.6.9-67.0.7.EL_lustre.1.6.5.1smp
RIP: 0010:[<ffffffffa06ded79>] 
<ffffffffa06ded79>{:ldiskfs:ldiskfs_mb_use_best_found+265}
RSP: 0018:000001020fd8cf30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000001020fd8d058 RCX: 0000000000000037
RDX: 000000000000001c RSI: 0000010211cea000 RDI: 0000000000001000
RBP: 0000000000000800 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000800 R11: 0000000000000800 R12: 000001020fd8d0e8
R13: 0000000000000800 R14: 0000000000000000 R15: 000001020fd8d138
FS:  0000002a95582b00(0000) GS:ffffffff8048e900(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000007fbffff856 CR3: 000000022854c000 CR4: 00000000000006e0
Process ll_ost_io_04 (pid: 7605, threadinfo 000001020fd8c000, task 
000001020fc59030)
Stack: 0000000012d58000 000000001152bd20 0000000000000020 000001020fd8d0e8
      0000000000011b9c 0000000000011b9c 00000000000034f3 0000000000000000
      ffffffffa06dfa65 0000000000000001
Call Trace:<ffffffffa06dfa65>{:ldiskfs:ldiskfs_mb_regular_allocator+1717}
      <ffffffff8017d14c>{__getblk+42} 
<ffffffffa06e40fd>{:ldiskfs:ldiskfs_mb_new_blocks+333}
      <ffffffff80132180>{try_to_wake_up+876} 
<ffffffffa0730eb4>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+884}
      <ffffffff80133855>{__wake_up_common+67} 
<ffffffffa06d9ece>{:ldiskfs:ldiskfs_ext_find_extent+590}
      <ffffffffa06db972>{:ldiskfs:ldiskfs_ext_walk_space+482}
      <ffffffffa0730b40>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0}
      <ffffffffa0731343>{:fsfilt_ldiskfs:fsfilt_map_nblocks+307}
      <ffffffffa0425704>{:lnet:lnet_ni_recv+532} 
<ffffffffa0424747>{:lnet:lnet_try_match_md+823}
      <ffffffffa07315bb>{:fsfilt_ldiskfs:fsfilt_ldiskfs_map_ext_inode_pages+539}
      <ffffffffa07e80b4>{:obdfilter:filter_direct_io+1108}
      <ffffffffa005e524>{:jbd:journal_start+223} 
<ffffffffa072f84b>{:fsfilt_ldiskfs:fsfilt_ldiskfs_brw_start+763}
      <ffffffffa04291aa>{:lnet:lnet_parse+4650} 
<ffffffffa07e9c1d>{:obdfilter:filter_commitrw_write+4957}
      <ffffffffa0406539>{:lvfs:pop_ctxt+505} 
<ffffffffa0426b32>{:lnet:lnet_send+1154}
      <ffffffffa042ab99>{:lnet:LNetGet+1609} 
<ffffffffa0423a02>{:lnet:LNetMDBind+690}
      <ffffffffa07e201e>{:obdfilter:filter_commitrw+126}
      <ffffffffa07ad688>{:ost:ost_checksum_bulk+200} 
<ffffffffa07b2dd1>{:ost:ost_brw_write+9505}
      <ffffffffa07ad060>{:ost:ost_bulk_timeout+0} 
<ffffffffa05125ef>{:ptlrpc:lustre_msg_get_version+95}
      <ffffffffa07ad060>{:ost:ost_bulk_timeout+0} 
<ffffffffa05126e5>{:ptlrpc:lustre_msg_check_version+69}
      <ffffffffa07ad060>{:ost:ost_bulk_timeout+0} 
<ffffffffa07b864d>{:ost:ost_handle+11661}
      <ffffffff8015c830>{__rmqueue+218} 
<ffffffffa0427b48>{:lnet:lnet_match_blocked_msg+920}
      <ffffffff801612ed>{cache_flusharray+107} 
<ffffffffa051b451>{:ptlrpc:ptlrpc_check_req+17}
      <ffffffffa051d629>{:ptlrpc:ptlrpc_server_handle_request+2457}
      <ffffffffa03f045e>{:libcfs:lcw_update_time+30} 
<ffffffff80133855>{__wake_up_common+67}
      <ffffffffa051fd05>{:ptlrpc:ptlrpc_main+3989} 
<ffffffffa051e270>{:ptlrpc:ptlrpc_retry_rqbds+0}
      <ffffffffa051e270>{:ptlrpc:ptlrpc_retry_rqbds+0} 
<ffffffffa051e270>{:ptlrpc:ptlrpc_retry_rqbds+0}
      <ffffffff80110de3>{child_rip+8} <ffffffffa051ed70>{:ptlrpc:ptlrpc_main+0}
      <ffffffff80110ddb>{child_rip+0}

Code: 0f 0b b3 6c 6e a0 ff ff ff ff 36 05 48 8b 43 20 66 44 29 58
RIP <ffffffffa06ded79>{:ldiskfs:ldiskfs_mb_use_b
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
