Replacing Intel's build of MVAPICH2 2.2 with a fresh build of MVAPICH2
2.3b got me farther along. The comm mismatch does not seem to be a
problem after all. I am guessing that the root cause was whatever bug is
listed in http://mvapich.cse.ohio-state.edu/static/media/mvapich/MV2_CHANGELOG-2.3b.txt
as:
    - Fix hang in MPI_Probe
        - Thanks to John Westlund@Intel for the report
I fixed the H5D__cmp_filtered_collective_io_info_entry_owner
comparator, and now I'm back to fixing issues in my PETSc patch.
I seem to be trying to filter a dataset that I shouldn't be:
HDF5-DIAG: Error detected in HDF5 (1.11.0) MPI-process 0:
  #000: H5Dio.c line 319 in H5Dwrite(): can't prepare for writing data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 395 in H5D__pre_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5Dio.c line 831 in H5D__write(): unable to adjust I/O info for parallel I/O
    major: Dataset
    minor: Unable to initialize object
  #003: H5Dio.c line 1264 in H5D__ioinfo_adjust(): Can't perform independent write with filters in pipeline.
    The following caused a break from collective I/O:
        Local causes:
        Global causes: one of the dataspaces was neither simple nor scalar
    major: Low-level I/O
    minor: Can't perform independent IO
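
For what it's worth, a minimal sketch of the kind of guard that would surface
this earlier on the application side: check that both dataspaces are simple or
scalar before attempting a collective write through the filter pipeline. The
function and handle names here are placeholders, not HDF5 internals or
anything from the PETSc patch.

#include <stdio.h>
#include <hdf5.h>

/* Hedged sketch (placeholder names): refuse a filtered collective write
 * up front when either dataspace is not simple or scalar, since the
 * library cannot fall back to independent I/O once filters are in the
 * pipeline. */
static herr_t
checked_filtered_write(hid_t dset, hid_t mem_type, hid_t mem_space,
                       hid_t file_space, hid_t dxpl, const void *buf)
{
    H5S_class_t mcls = H5Sget_simple_extent_type(mem_space);
    H5S_class_t fcls = H5Sget_simple_extent_type(file_space);

    if ((mcls != H5S_SIMPLE && mcls != H5S_SCALAR) ||
        (fcls != H5S_SIMPLE && fcls != H5S_SCALAR)) {
        fprintf(stderr,
                "refusing filtered write: a dataspace is neither simple nor scalar\n");
        return -1;
    }

    /* Request collective transfer explicitly; filtered parallel writes
     * must be collective. */
    if (H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE) < 0)
        return -1;

    return H5Dwrite(dset, mem_type, mem_space, file_space, dxpl, buf);
}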
On Wed, Nov 8, 2017 at 11:37 PM, Michael K. Edwards
<[email protected]> wrote:
> Oddly enough, it is not the tag that is mismatched between receiver
> and senders; it is io_info->comm. Something is decidedly out of whack
> here.
>
> Rank 0, owner 0 probing with tag 0 on comm -1006632942
> Rank 2, owner 0 sent with tag 0 to comm -1006632952 as request 0
> Rank 3, owner 0 sent with tag 0 to comm -1006632952 as request 0
> Rank 1, owner 0 sent with tag 0 to comm -1006632952 as request 0
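
A standalone illustration of why that matters: two dups of the same parent
communicator share a group but not a matching context, so a send posted on one
can never satisfy a probe on the other. This is plain MPI, unrelated to the
HDF5 internals; MPI_Comm_compare reports such dups as congruent rather than
identical.

#include <stdio.h>
#include <mpi.h>

/* Plain-MPI illustration (not HDF5 code): distinct duplicates of the
 * same parent communicator are CONGRUENT (same ranks) but not IDENT,
 * so point-to-point traffic never matches across them. */
int main(int argc, char **argv)
{
    MPI_Comm dup_a, dup_b;
    int      result;

    MPI_Init(&argc, &argv);
    MPI_Comm_dup(MPI_COMM_WORLD, &dup_a);
    MPI_Comm_dup(MPI_COMM_WORLD, &dup_b);

    MPI_Comm_compare(dup_a, dup_b, &result);
    printf("dup_a vs dup_b: %s\n",
           result == MPI_IDENT
               ? "MPI_IDENT (same context, messages match)"
               : "not MPI_IDENT (separate contexts, messages never match)");

    MPI_Comm_free(&dup_a);
    MPI_Comm_free(&dup_b);
    MPI_Finalize();
    return 0;
}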
>
>
> On Wed, Nov 8, 2017 at 2:51 PM, Michael K. Edwards
> <[email protected]> wrote:
>>
>> I see that you're re-sorting by owner using a comparator called
>> H5D__cmp_filtered_collective_io_info_entry_owner() which does not sort
>> by a secondary key within items with equal owners. That, together
>> with a sort that isn't stable (which HDqsort() probably isn't on most
>> platforms; quicksort/introsort is not stable), will scramble the order
>> in which different ranks traverse their local chunk arrays. That will
>> cause deadly embraces between ranks that are waiting for each other's
>> chunks to be sent. To fix that, it's probably sufficient to use the
>> chunk offset as a secondary sort key in that comparator.
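
To make that concrete, here is a sketch of a comparator with owner as the
primary key and chunk offset as the secondary key. The struct and field names
below are placeholders, not the real layout of HDF5's internal chunk-entry
type.

#include <stdint.h>

/* Hedged sketch: stand-in struct, not HDF5's internal chunk entry. */
typedef struct {
    int      new_owner;    /* rank that will aggregate this chunk */
    uint64_t chunk_offset; /* chunk's address in the file */
} chunk_entry_t;

/* Sort by owner first, then by chunk offset, so equal-owner entries end
 * up in the same order on every rank even under an unstable qsort(). */
static int
cmp_owner_then_offset(const void *a, const void *b)
{
    const chunk_entry_t *ea = (const chunk_entry_t *)a;
    const chunk_entry_t *eb = (const chunk_entry_t *)b;

    if (ea->new_owner != eb->new_owner)
        return (ea->new_owner < eb->new_owner) ? -1 : 1;
    if (ea->chunk_offset != eb->chunk_offset)
        return (ea->chunk_offset < eb->chunk_offset) ? -1 : 1;
    return 0;
}

Called as qsort(entries, num_entries, sizeof(chunk_entry_t),
cmp_owner_then_offset), this yields the same total order on every rank,
which removes the ordering hazard described above.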
>>
>> That's not the root cause of the hang I'm currently experiencing,
>> though. Still digging into that.
>>
>>
>> On Wed, Nov 8, 2017 at 1:50 PM, Dana Robinson <[email protected]> wrote:
>> > Yes. All outside code that frees, allocates, or reallocates memory created
>> > inside the library (or that will be passed back into the library, where it
>> > could be freed or reallocated) should use these functions. This includes
>> > filters.
>> >
>> > Dana
>> >
>> > From: Jordan Henderson <[email protected]>
>> > Date: Wednesday, November 8, 2017 at 13:46
>> > To: Dana Robinson <[email protected]>, "[email protected]"
>> > <[email protected]>, HDF List <[email protected]>
>> > Subject: Re: [Hdf-forum] Collective IO and filters
>> >
>> > Dana,
>> >
>> > would it then make sense for all outside filters to use these routines? Due
>> > to the internal nature of Parallel Compression, it uses buffers allocated via
>> > H5MM_ routines to collect and scatter data, which works fine for the
>> > internal filters like deflate, since they use these as well. However, since
>> > some of the outside filters use the raw malloc/free routines, which causes
>> > issues, I'm wondering whether having all outside filters use the H5_ routines
>> > is the cleanest solution.
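
Per Dana's answer above, a hedged sketch of what that would look like in an
outside filter callback, assuming the public H5allocate_memory()/H5free_memory()
wrappers are the "H5_ routines" in question; the filter body below is a
placeholder that just copies its input.

#include <string.h>
#include <hdf5.h>

/* Hedged sketch of an H5Z_func_t-style filter callback that allocates
 * its output through the public HDF5 memory routines, so the library
 * can later free or reallocate the buffer it gets back.  The "filter"
 * is a placeholder that only copies its input. */
static size_t
example_outside_filter(unsigned int flags, size_t cd_nelmts,
                       const unsigned int cd_values[], size_t nbytes,
                       size_t *buf_size, void **buf)
{
    void *out;

    (void)flags; (void)cd_nelmts; (void)cd_values;

    /* H5allocate_memory() instead of malloc(). */
    if ((out = H5allocate_memory(nbytes, 0)) == NULL)
        return 0; /* 0 signals filter failure */

    memcpy(out, *buf, nbytes);

    /* H5free_memory() instead of free() for the buffer handed to us. */
    H5free_memory(*buf);

    *buf      = out;
    *buf_size = nbytes;
    return nbytes;
}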
>> >
>> > Michael,
>> >
>> > Based on the "num_writers: 4" field, the NULL "receive_requests_array", and
>> > the fact that for the same chunk, rank 0 shows "original owner: 0, new
>> > owner: 0" and rank 3 shows "original owner: 3, new_owner: 0", it seems as
>> > though everyone IS interested in the chunk that rank 0 is now working on,
>> > but now I'm more confident that at some point either the messages may have
>> > failed to send or rank 0 is having problems finding them.
>> >
>> > Since the unfiltered case won't hit this particular code path, I'm not
>> > surprised that that case succeeds. If I had to make another guess based on
>> > this, I would be inclined to think that rank 0 is hanging in the
>> > MPI_Mprobe due to a mismatch in the "tag" field. I use the index of the
>> > chunk as the tag for the message in order to funnel specific messages to
>> > the correct rank for the correct chunk during the last part of the chunk
>> > redistribution, and if rank 0 can't match the tag it of course won't find
>> > the message. Why this might be happening, I'm not entirely certain at the
>> > moment.
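
To make the matching scheme concrete, here is a hedged sketch of the pattern
described above: the tag is the chunk index, senders post their modifications
to the chunk's new owner, and the owner probes with MPI_Mprobe and receives
with MPI_Mrecv. The names and structure are illustrative, not the actual
H5Dmpio.c code.

#include <stdlib.h>
#include <mpi.h>

/* Illustrative sketch (not HDF5 internals): each writer tags its chunk
 * modification with the chunk index; the chunk's new owner expects
 * (num_writers - 1) such messages and matches them by that tag. */
static void
send_chunk_modification(MPI_Comm comm, int chunk_index, int new_owner,
                        const void *mod_buf, int mod_size)
{
    MPI_Send(mod_buf, mod_size, MPI_BYTE, new_owner, chunk_index, comm);
}

static void
receive_chunk_modifications(MPI_Comm comm, int chunk_index, int num_writers)
{
    for (int i = 0; i < num_writers - 1; i++) {
        MPI_Message msg;
        MPI_Status  status;
        int         count;
        char       *mod_buf;

        /* Only matches sends posted with tag == chunk_index on this
         * same communicator; a wrong tag or a different dup of the
         * communicator means this probe never returns. */
        MPI_Mprobe(MPI_ANY_SOURCE, chunk_index, comm, &msg, &status);
        MPI_Get_count(&status, MPI_BYTE, &count);

        mod_buf = malloc((size_t)count);
        MPI_Mrecv(mod_buf, count, MPI_BYTE, &msg, &status);

        /* ... apply the modification to the local chunk buffer ... */
        free(mod_buf);
    }
}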