OK, I think I fixed the small-io problem and the mkdir problem.
That only leaves the mounting problem. I've never attempted to build
the kernel interface or mount the file system (being the old goat that I
am), so that might take a bit.
I'll commit my changes so you can run them in the next nightly build
and see if anything new pops up.
Walt
Robert Latham wrote:
On Tue, Aug 29, 2006 at 04:55:06PM -0400, Walter B. Ligon III wrote:
So I would appreciate some help running tests on the branch while
I start documenting. Let me know when you think I should start
merging it back into the trunk. I'm also open to other
suggestions ...
OK, Walt, we're getting close. I committed a couple of small fixes to
get pvfs2-client-core building. Here's what's not working so well
right now:
- mounting pvfs2 fails with a timeout
- many MPI-IO workloads pass, but the noncontig test triggered a
segfault in small_io_cleanup, where it cleans up various fields in
the sm_p structure. In particular, 'sm_p->msgarray = NULL' caused a
core dump, and when I looked at that core file in gdb,
sm_p->msgarray_count was absurdly high (135950228). Maybe sm_p
wasn't properly allocated? I dunno, I'm just the messenger.
- pvfs2-cp dies with a segfault when using a very small blocksize (-b
128). Here's where gdb says the fault lies:
---------------
#0 0x0806d3d8 in small_io_completion_fn (user_args=0x80f0da8,
resp_p=0xbfffb42c, index=0) at sys-small-io.sm:242
242 fdata.server_nr = sm_p->u.io.datafile_index_array[index];
(gdb) p sm_p->u.io
$8 = {io_type = 135162104, file_req = 0x2, file_req_offset = 0, buffer = 0x0,
mem_req = 0x0, io_resp_p = 0x50, flowproto_type = 17, encoding = 135206232,
datafile_index_array = 0x0, datafile_count = 0,
msgpair_completion_count = 81, flow_completion_count = 0,
write_ack_completion_count = 0, contexts = 0x80f13d4,
context_count = 135205832, total_cancellations_remaining = 0,
retry_count = 135206064, stored_error_code = 3396, total_size = 9,
dfile_size_array = 0x0, small_io = 0}
---------------
- test-zero-fill fails with a segfault in the same function as pvfs2-cp
(small_io_completion_fn), though at a different line:
---------------
#0 0x08065149 in small_io_completion_fn (user_args=0x80e9940,
resp_p=0xbfffb86c, index=0) at sys-small-io.sm:317
317 sm_p->u.io.dfile_size_array[index] =
resp_p->u.small_io.bstream_size;
---------------
- pvfs2-mkdir (a test contributed by Acxiom) fails with a segfault:
---------------
#0 0x080b134e in PINT_smcb_op (smcb=0x0)
at /sandbox/robl/pvfs2-nightly/pvfs2-WALT3/src/common/misc/state-machine-fns.c:348
348 return smcb->op;
---------------
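Both small_io_completion_fn backtraces index arrays that the dump shows were never set up (datafile_index_array = 0x0 with datafile_count = 0), and a msgarray_count of 135950228 looks like leftover heap data. Below is a minimal C sketch of the two usual defenses, assuming a hypothetical, simplified struct. The names io_state, io_state_new, small_io_completion_sketch and check_guard are illustrative only and are not PVFS2 code. The sketch zero-fills the state when it is allocated, and it has the completion callback reject an index that does not point at an initialized slot. It does not claim to be what the committed fix actually does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the I/O fields of the client
 * state-machine struct (sm_p->u.io); the real PVFS2 layout differs. */
struct io_state {
    int32_t *datafile_index_array; /* server number for each datafile */
    int64_t *dfile_size_array;     /* bstream size reported per datafile */
    int datafile_count;            /* number of valid entries in both arrays */
};

/* calloc zero-fills, so on common platforms pointers start out NULL and
 * counts start at 0 instead of holding leftover heap values. */
static struct io_state *io_state_new(void)
{
    return calloc(1, sizeof(struct io_state));
}

/* Hypothetical completion callback: reject a response whose index does
 * not refer to an initialized slot instead of dereferencing NULL. */
static int small_io_completion_sketch(struct io_state *s, int index,
                                      int64_t bstream_size, int32_t *server_nr)
{
    if (s == NULL || s->datafile_index_array == NULL ||
        s->dfile_size_array == NULL ||
        index < 0 || index >= s->datafile_count)
        return -1; /* report an error to the caller rather than crash */

    *server_nr = s->datafile_index_array[index];
    s->dfile_size_array[index] = bstream_size;
    return 0;
}

/* Quick self-check of the rejected and accepted paths. */
static void check_guard(void)
{
    struct io_state *s = io_state_new();
    int32_t server_nr = -1;
    int32_t idx[1] = {3};
    int64_t sizes[1] = {0};

    assert(s != NULL);
    /* Arrays never set up (like the gdb dump): the guard rejects it. */
    assert(small_io_completion_sketch(s, 0, 9, &server_nr) == -1);
    /* A NULL state pointer (cf. smcb=0x0) is rejected as well. */
    assert(small_io_completion_sketch(NULL, 0, 9, &server_nr) == -1);

    s->datafile_index_array = idx;
    s->dfile_size_array = sizes;
    s->datafile_count = 1;
    assert(small_io_completion_sketch(s, 0, 9, &server_nr) == 0);
    assert(server_nr == 3 && sizes[0] == 9);
    /* An out-of-range index is rejected too. */
    assert(small_io_completion_sketch(s, 1, 9, &server_nr) == -1);

    free(s);
}
```

Returning an error instead of crashing lets the state machine fail the operation cleanly. The underlying question of why the arrays were never populated for these responses still has to be answered separately.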
So if you can take care of the small-io cases, that would be a
good start, since it would knock out three of the five failures. Once WALT3
passes our nightlies, we can think about merging into HEAD.
==rob
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers