I see that you're re-sorting by owner using a comparator called H5D__cmp_filtered_collective_io_info_entry_owner(), which does not apply a secondary key to entries with equal owners. Together with a sort that isn't stable (which HDqsort() probably isn't on most platforms; quicksort/introsort is not stable), that will scramble the order in which different ranks traverse their local chunk arrays, and that in turn will cause deadly embraces between ranks that are waiting for each other's chunks to be sent. To fix that, it's probably sufficient to use the chunk offset as a secondary sort key in that comparator.
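Something along these lines is what I have in mind; entry_t and its members are just stand-ins, not the actual fields of the library's per-chunk struct:

    #include <stdlib.h>

    /* Stand-in for the real per-chunk entry; the member names here are
     * placeholders, not the actual filtered-collective-IO struct fields. */
    typedef struct {
        int       owner;         /* (new) owner rank for the chunk           */
        long long chunk_offset;  /* chunk's offset in the file, as tiebreak  */
    } entry_t;

    static int
    cmp_owner_then_offset(const void *a, const void *b)
    {
        const entry_t *e1 = (const entry_t *)a;
        const entry_t *e2 = (const entry_t *)b;

        if (e1->owner != e2->owner)
            return (e1->owner < e2->owner) ? -1 : 1;

        /* Secondary key: without it, an unstable qsort() leaves equal-owner
         * entries in platform-dependent order; the offset makes the result
         * deterministic and identical on every rank. */
        if (e1->chunk_offset != e2->chunk_offset)
            return (e1->chunk_offset < e2->chunk_offset) ? -1 : 1;

        return 0;
    }

The call site wouldn't need to change, e.g. HDqsort(local_chunk_array, num_entries, sizeof(entry_t), cmp_owner_then_offset); the only point is that equal-owner entries now compare the same way everywhere.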
That's not the root cause of the hang I'm currently experiencing, though. Still digging into that.

On Wed, Nov 8, 2017 at 1:50 PM, Dana Robinson <[email protected]> wrote:

> Yes. All outside code that frees, allocates, or reallocates memory created
> inside the library (or that will be passed back into the library, where it
> could be freed or reallocated) should use these functions. This includes
> filters.
>
> Dana
>
> From: Jordan Henderson <[email protected]>
> Date: Wednesday, November 8, 2017 at 13:46
> To: Dana Robinson <[email protected]>, "[email protected]"
> <[email protected]>, HDF List <[email protected]>
> Subject: Re: [Hdf-forum] Collective IO and filters
>
> Dana,
>
> would it then make sense for all outside filters to use these routines? Due
> to Parallel Compression's internal nature, it uses buffers allocated via
> H5MM_ routines to collect and scatter data, which works fine for the
> internal filters like deflate, since they use these as well. However, since
> some of the outside filters use the raw malloc/free routines, causing
> issues, I'm wondering if having all outside filters use the H5_ routines is
> the cleanest solution.
>
> Michael,
>
> Based on the "num_writers: 4" field, the NULL "receive_requests_array" and
> the fact that for the same chunk, rank 0 shows "original owner: 0, new
> owner: 0" and rank 3 shows "original owner: 3, new_owner: 0", it seems as
> though everyone IS interested in the chunk rank 0 is now working on, but
> now I'm more confident that at some point either the messages failed to
> send or rank 0 is having problems finding them.
>
> Since the unfiltered case won't hit this particular code path, I'm not
> surprised that it succeeds. If I had to make another guess based on this,
> I would be inclined to think that rank 0 must be hanging on the MPI_Mprobe
> due to a mismatch in the "tag" field. I use the index of the chunk as the
> tag for the message in order to funnel specific messages to the correct
> rank for the correct chunk during the last part of the chunk
> redistribution, and if rank 0 can't match the tag it of course won't find
> the message. Why this might be happening, I'm not entirely certain
> currently.
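For reference, the send/receive pattern Jordan describes boils down to something like the sketch below. This is my own paraphrase with made-up function and variable names, not the library's actual code:

    #include <stdlib.h>
    #include <mpi.h>

    /* Sender side: the chunk index doubles as the message tag. */
    static void
    send_chunk(const void *buf, int nbytes, int new_owner, int chunk_index,
               MPI_Comm comm, MPI_Request *req)
    {
        MPI_Isend(buf, nbytes, MPI_BYTE, new_owner, chunk_index, comm, req);
    }

    /* Receiver side: the new owner probes for exactly that tag. If the tag
     * (chunk index) computed here ever differs from the one the sender used,
     * this MPI_Mprobe never matches and the rank appears to hang. */
    static void
    recv_chunk(int chunk_index, MPI_Comm comm, void **out_buf, int *out_nbytes)
    {
        MPI_Message msg;
        MPI_Status  status;

        MPI_Mprobe(MPI_ANY_SOURCE, chunk_index, comm, &msg, &status);
        MPI_Get_count(&status, MPI_BYTE, out_nbytes);
        *out_buf = malloc((size_t)*out_nbytes);
        MPI_Mrecv(*out_buf, *out_nbytes, MPI_BYTE, &msg, &status);
    }

So if any rank computes a different chunk index (and therefore a different tag) than its peer, the probe on the receiving side simply never matches, which would look exactly like the hang on rank 0.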

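And to make Dana's point about the allocation routines concrete, here is a minimal sketch of an outside filter that keeps every buffer it hands back to the library on the library's allocator. The filter body is a do-nothing placeholder; only H5allocate_memory() and H5free_memory() are the actual public API (available since HDF5 1.8.15):

    #include <string.h>
    #include "hdf5.h"

    /* Placeholder pass-through filter. The point: any buffer returned to the
     * library through *buf must come from the library's allocator, never from
     * a bare malloc(), so the library can later resize or free it safely. */
    static size_t
    example_filter(unsigned int flags, size_t cd_nelmts,
                   const unsigned int cd_values[], size_t nbytes,
                   size_t *buf_size, void **buf)
    {
        void *outbuf;

        (void)flags; (void)cd_nelmts; (void)cd_values;

        if (NULL == (outbuf = H5allocate_memory(nbytes, 0)))
            return 0;                   /* returning 0 signals filter failure */

        memcpy(outbuf, *buf, nbytes);   /* a real filter would (de)compress here */

        H5free_memory(*buf);            /* release the old buffer the same way  */
        *buf      = outbuf;
        *buf_size = nbytes;

        return nbytes;                  /* number of valid bytes now in *buf    */
    }

A filter written this way should be safe for the parallel-compression path, since the buffer it passes back never mixes allocators with the library's own H5MM_-allocated buffers.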