It's not even clear to me yet whether this is the same dataset that
triggered the assert; I'm working on getting complete details.  But FWIW
the PETSc code does not call H5Sselect_none().  It calls
H5Sselect_hyperslab() on all ranks, and that's why the ranks whose
slice is zero columns wide hit the "empty sel_chunks" pathway I
added to H5D__create_chunk_mem_map_hyper().
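
To make the two participation styles concrete, here is a minimal sketch (the
extents and names are illustrative, and the zero-width hyperslab branch is my
reading of what PETSc effectively does, not its actual code):

/* Two ways a rank with no local data can participate in a collective write. */
#include "hdf5.h"

#define NROWS 8
#define NCOLS 3   /* fewer columns than ranks, so some ranks own zero columns */

void select_for_rank(hid_t file_space, int rank)
{
    hsize_t ncols_local = (rank < NCOLS) ? 1 : 0;        /* zero on high ranks */
    hsize_t start[2] = {0, (hsize_t)(rank < NCOLS ? rank : 0)};
    hsize_t count[2] = {NROWS, ncols_local};

    if (ncols_local > 0) {
        /* Ranks with data select their slice of the file dataspace. */
        H5Sselect_hyperslab(file_space, H5S_SELECT_SET, start, NULL, count, NULL);
    } else {
        /* Style A (what the test cases exercised): an explicit empty selection. */
        H5Sselect_none(file_space);
        /* Style B (what PETSc effectively does): a hyperslab that is zero
         * columns wide, which also yields an empty selection but reaches
         * H5D__create_chunk_mem_map_hyper() with no selected chunks:
         *
         *   H5Sselect_hyperslab(file_space, H5S_SELECT_SET, start, NULL, count, NULL);
         */
    }
    /* Either way, every rank must still make the H5Dwrite() call collectively. */
}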


On Wed, Nov 8, 2017 at 12:02 PM, Michael K. Edwards
<[email protected]> wrote:
> Thanks, Jordan.  I recognize that this is very recent feature work and
> my goal is to help push it forward.
>
> My current use case is relatively straightforward, though there are a
> couple of layers on top of HDF5 itself.  The problem can be reproduced
> by building PETSc 3.8.1 against libraries built from the develop
> branch of HDF5, adding in the H5Pset_filter() calls, and running an
> example that exercises them.  (I'm using
> src/snes/examples/tutorials/ex12.c with the -dm_view_hierarchy flag to
> induce HDF5 writes.)  If you want, I can supply full details for you
> to reproduce it locally, or I can do any experiments you'd like me to
> within this setup.  (It also involves patches to the out-of-tree H5Z
> plugins to make them use H5MM_malloc/H5MM_xfree rather than raw
> malloc/free, which in turn involves exposing H5MMprivate.h to the
> plugins.  Is this something you've solved in a different way?)
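>
> For illustration, the plugin-side change looks roughly like this (a sketch
> only; the codec call is a placeholder, and it assumes H5MM_malloc()/H5MM_xfree()
> have malloc/free-like signatures, which is why H5MMprivate.h has to be exposed
> to the plugin build):
>
> #include <string.h>
> #include "H5MMprivate.h"   /* H5MM_malloc(), H5MM_xfree() */
>
> static size_t
> my_filter(unsigned flags, size_t cd_nelmts, const unsigned cd_values[],
>           size_t nbytes, size_t *buf_size, void **buf)
> {
>     size_t out_nbytes = nbytes;            /* a real codec would set this */
>     void  *out = H5MM_malloc(out_nbytes);  /* was: malloc(out_nbytes) */
>
>     (void)flags; (void)cd_nelmts; (void)cd_values;
>     if (out == NULL)
>         return 0;                          /* 0 tells HDF5 the filter failed */
>
>     memcpy(out, *buf, nbytes);             /* stand-in for compress/decompress */
>
>     H5MM_xfree(*buf);                      /* was: free(*buf) */
>     *buf      = out;
>     *buf_size = out_nbytes;
>     return out_nbytes;
> }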
>
>
> On Wed, Nov 8, 2017 at 11:44 AM, Jordan Henderson
> <[email protected]> wrote:
>> Hi Michael,
>>
>>
>> During the design phase of this feature I tried both to account for and to
>> test the case where some of the writers have no data to contribute.
>> However, your use case seems to fall outside of what I have tested
>> (perhaps I have not used enough ranks?). In particular, my test cases were
>> small and simply had some of the ranks call H5Sselect_none(), which doesn't
>> seem to trigger this particular assertion failure. Is that how you're
>> handling these ranks in your code, or do they participate in the write
>> operation in some other way?
>>
>>
>> As for the hanging issue, it looks as though rank 0 is waiting to receive
>> some modification data from another rank for a particular chunk. Whether
>> there is actually valid data that rank 0 should be waiting for, I cannot
>> easily tell without tracing it through. The other ranks have finished
>> modifying their sets of chunks and moved on; they are waiting for everyone
>> to get together and broadcast their new chunk sizes so that space in the
>> file can be collectively re-allocated, but of course rank 0 never gets
>> there. My best guess is that either:
>>
>>
>> The "num_writers" field for the chunk struct corresponding to the particular
>> chunk that rank 0 is working on has been incorrectly set, causing rank 0 to
>> think that there are more ranks writing to the chunk than the actual amount
>> and consequently causing rank 0 to wait forever for a non-existent MPI
>> message
>>
>>
>> or
>>
>>
>> The "new_owner" field of the chunk struct for this chunk was incorrectly set
>> on the other ranks, causing them to never issue an MPI_Isend to rank 0, also
>> causing rank 0 to wait for a non-existent MPI message
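>>
>> To make that failure mode concrete, here is a minimal MPI sketch (this is
>> not the actual H5Dmpio.c code; "num_writers" and "chunk_size" are just
>> illustrative stand-ins for the fields involved):
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int rank, nprocs;
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>
>>     if (rank == 0) {
>>         /* The chunk's owner expects one message per co-writer.  If
>>          * num_writers is overstated, this loop never terminates and the
>>          * process sits in MPI_Mprobe, as in the rank-0 backtrace below. */
>>         int num_writers = nprocs;          /* owner plus nprocs - 1 senders */
>>         for (int i = 0; i < num_writers - 1; i++) {
>>             MPI_Message msg;
>>             MPI_Status  status;
>>             unsigned long chunk_size;
>>             MPI_Mprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &msg, &status);
>>             MPI_Mrecv(&chunk_size, 1, MPI_UNSIGNED_LONG, &msg, &status);
>>             printf("rank 0 got %lu from rank %d\n", chunk_size, status.MPI_SOURCE);
>>         }
>>     } else {
>>         /* A co-writer only sends if it believes rank 0 is the chunk's
>>          * new_owner; if new_owner were wrong here, rank 0 would also hang. */
>>         unsigned long chunk_size = 42;
>>         MPI_Send(&chunk_size, 1, MPI_UNSIGNED_LONG, 0, 0, MPI_COMM_WORLD);
>>     }
>>
>>     MPI_Finalize();
>>     return 0;
>> }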
>>
>>
>> This feature should still be regarded as being in beta, and its complexity
>> can lead to difficult-to-track-down bugs such as the ones you are currently
>> encountering. That said, your feedback is very useful and will help push
>> this feature toward production quality. If it is feasible to come up with a
>> minimal example that reproduces the issue, that would also make it much
>> easier to diagnose exactly why these failures are occurring.
>>
>> Thanks,
>> Jordan
>>
>> ________________________________
>> From: Hdf-forum <[email protected]> on behalf of Michael
>> K. Edwards <[email protected]>
>> Sent: Wednesday, November 8, 2017 11:23 AM
>> To: Miller, Mark C.
>> Cc: HDF Users Discussion List
>> Subject: Re: [Hdf-forum] Collective IO and filters
>>
>> Closer to 1000 ranks initially.  There's a bug in handling the case
>> where some of the writers don't have any data to contribute (because
>> one dimension is smaller than the number of ranks), which I have
>> worked around like this:
>>
>> diff --git a/src/H5Dchunk.c b/src/H5Dchunk.c
>> index af6599a..9522478 100644
>> --- a/src/H5Dchunk.c
>> +++ b/src/H5Dchunk.c
>> @@ -1836,6 +1836,9 @@ H5D__create_chunk_mem_map_hyper(const H5D_chunk_map_t *fm)
>>          /* Indicate that the chunk's memory space is shared */
>>          chunk_info->mspace_shared = TRUE;
>>      } /* end if */
>> +    else if(H5SL_count(fm->sel_chunks)==0) {
>> +        /* No chunks, because no local data; avoid HDassert(fm->m_ndims==fm->f_ndims) on null mem_space */
>> +    } /* end else if */
>>      else {
>>          /* Get bounding box for file selection */
>>          if(H5S_SELECT_BOUNDS(fm->file_space, file_sel_start, file_sel_end) < 0)
>>
>> That makes the assert go away.  Now I'm investigating a hang in the
>> chunk redistribution logic in rank 0, with a backtrace that looks like
>> this:
>>
>> #0  0x00007f4bd456a6c6 in psm2_mq_ipeek2 () from /lib64/libpsm2.so.2
>> #1  0x00007f4bd5d3b341 in psm_progress_wait () from /usr/mpi/gcc/mvapich2-2.2-hfi/lib/libmpi.so.12
>> #2  0x00007f4bd5d3012d in MPID_Mprobe () from /usr/mpi/gcc/mvapich2-2.2-hfi/lib/libmpi.so.12
>> #3  0x00007f4bd5cbeeb4 in PMPI_Mprobe () from /usr/mpi/gcc/mvapich2-2.2-hfi/lib/libmpi.so.12
>> #4  0x00007f4bd81aadf6 in H5D__chunk_redistribute_shared_chunks (io_info=0x7ffdfb83de60, type_info=0x7ffdfb83dde0, fm=0x17eeec0, local_chunk_array=0x17f0f80, local_chunk_array_num_entries=0x7ffdfb83d9f8) at H5Dmpio.c:3041
>> #5  0x00007f4bd81a9696 in H5D__construct_filtered_io_info_list (io_info=0x7ffdfb83de60, type_info=0x7ffdfb83dde0, fm=0x17eeec0, chunk_list=0x7ffdfb83daf0, num_entries=0x7ffdfb83db00) at H5Dmpio.c:2794
>> #6  0x00007f4bd81a2d58 in H5D__link_chunk_filtered_collective_io (io_info=0x7ffdfb83de60, type_info=0x7ffdfb83dde0, fm=0x17eeec0, dx_plist=0x16f7230) at H5Dmpio.c:1447
>> #7  0x00007f4bd81a027d in H5D__chunk_collective_io (io_info=0x7ffdfb83de60, type_info=0x7ffdfb83dde0, fm=0x17eeec0) at H5Dmpio.c:933
>> #8  0x00007f4bd81a0968 in H5D__chunk_collective_write (io_info=0x7ffdfb83de60, type_info=0x7ffdfb83dde0, nelmts=104, file_space=0x17e2dc0, mem_space=0x17dc770, fm=0x17eeec0) at H5Dmpio.c:1018
>> #9  0x00007f4bd7ce3d63 in H5D__write (dataset=0x17e0010, mem_type_id=216172782113783851, mem_space=0x17dc770, file_space=0x17e2dc0, dxpl_id=720575940379279384, buf=0x17d6240) at H5Dio.c:835
>> #10 0x00007f4bd7ce181c in H5D__pre_write (dset=0x17e0010, direct_write=false, mem_type_id=216172782113783851, mem_space=0x17dc770, file_space=0x17e2dc0, dxpl_id=720575940379279384, buf=0x17d6240) at H5Dio.c:394
>> #11 0x00007f4bd7ce0fd1 in H5Dwrite (dset_id=360287970189639680, mem_type_id=216172782113783851, mem_space_id=288230376151711749, file_space_id=288230376151711750, dxpl_id=720575940379279384, buf=0x17d6240) at H5Dio.c:318
>>
>> The other ranks have moved past this and are hanging here:
>>
>> #0  0x00007feb6e6546c6 in psm2_mq_ipeek2 () from /lib64/libpsm2.so.2
>> #1  0x00007feb6fe25341 in psm_progress_wait () from /usr/mpi/gcc/mvapich2-2.2-hfi/lib/libmpi.so.12
>> #2  0x00007feb6fdd8975 in MPIC_Wait () from /usr/mpi/gcc/mvapich2-2.2-hfi/lib/libmpi.so.12
>> #3  0x00007feb6fdd918b in MPIC_Sendrecv () from /usr/mpi/gcc/mvapich2-2.2-hfi/lib/libmpi.so.12
>> #4  0x00007feb6fcf0fda in MPIR_Allreduce_pt2pt_rd_MV2 () from /usr/mpi/gcc/mvapich2-2.2-hfi/lib/libmpi.so.12
>> #5  0x00007feb6fcf48ef in MPIR_Allreduce_index_tuned_intra_MV2 () from /usr/mpi/gcc/mvapich2-2.2-hfi/lib/libmpi.so.12
>> #6  0x00007feb6fca1534 in MPIR_Allreduce_impl () from /usr/mpi/gcc/mvapich2-2.2-hfi/lib/libmpi.so.12
>> #7  0x00007feb6fca1b93 in PMPI_Allreduce () from /usr/mpi/gcc/mvapich2-2.2-hfi/lib/libmpi.so.12
>> #8  0x00007feb72287c2a in H5D__mpio_array_gatherv (local_array=0x125f2d0, local_array_num_entries=0, array_entry_size=368, _gathered_array=0x7ffff083f1d8, _gathered_array_num_entries=0x7ffff083f1e8, nprocs=4, allgather=true, root=0, comm=-1006632952, sort_func=0x0) at H5Dmpio.c:479
>> #9  0x00007feb7228cfb8 in H5D__link_chunk_filtered_collective_io (io_info=0x7ffff083f540, type_info=0x7ffff083f4c0, fm=0x125d280, dx_plist=0x11cf240) at H5Dmpio.c:1479
>> #10 0x00007feb7228a27d in H5D__chunk_collective_io (io_info=0x7ffff083f540, type_info=0x7ffff083f4c0, fm=0x125d280) at H5Dmpio.c:933
>> #11 0x00007feb7228a968 in H5D__chunk_collective_write (io_info=0x7ffff083f540, type_info=0x7ffff083f4c0, nelmts=74, file_space=0x12514e0, mem_space=0x124b450, fm=0x125d280) at H5Dmpio.c:1018
>> #12 0x00007feb71dcdd63 in H5D__write (dataset=0x124e7d0, mem_type_id=216172782113783851, mem_space=0x124b450, file_space=0x12514e0, dxpl_id=720575940379279384, buf=0x1244e80) at H5Dio.c:835
>> #13 0x00007feb71dcb81c in H5D__pre_write (dset=0x124e7d0, direct_write=false, mem_type_id=216172782113783851, mem_space=0x124b450, file_space=0x12514e0, dxpl_id=720575940379279384, buf=0x1244e80) at H5Dio.c:394
>> #14 0x00007feb71dcafd1 in H5Dwrite (dset_id=360287970189639680, mem_type_id=216172782113783851, mem_space_id=288230376151711749, file_space_id=288230376151711750, dxpl_id=720575940379279384, buf=0x1244e80) at H5Dio.c:318
>>
>> (I'm currently running with this patch atop commit bf570b1, on an
>> earlier theory that the crashing bug may have crept in after Jordan's
>> big merge.  I'll rebase on current develop but I doubt that'll change
>> much.)
>>
>> The hang may or may not be directly related to the workaround being a
>> bit of a hack.  I can set you up with full reproduction details if you
>> like; I seem to be getting some traction on it, but more eyeballs are
>> always good, especially if they're better set up for MPI tracing than
>> I am right now.
>>
>>
>> On Wed, Nov 8, 2017 at 8:48 AM, Miller, Mark C. <[email protected]> wrote:
>>> Hi Michael,
>>>
>>> I have not tried this in parallel yet. That said, what scale are you trying
>>> to do this at? 1000 ranks or 1,000,000 ranks? Something in between?
>>>
>>> My understanding is that there are some known scaling issues out past maybe
>>> 10,000 ranks. Not heard of outright assertion failures there though.
>>>
>>> Mark
>>>
>>> "Hdf-forum on behalf of Michael K. Edwards" wrote:
>>>
>>> I'm trying to write an HDF5 file with dataset compression from an MPI
>>> job.  (Using PETSc 3.8 compiled against MVAPICH2, if that matters.)
>>> After running into the "Parallel I/O does not support filters yet"
>>> error message in release versions of HDF5, I have turned to the
>>> develop branch.  Clearly there has been much work towards collective
>>> filtered IO in the run-up to a 1.11 (1.12?) release; equally clearly
>>> it is not quite ready for prime time yet.  So far I've encountered a
>>> livelock scenario with ZFP, reproduced it with SZIP, and, with no
>>> filters at all, obtained this nifty error message:
>>>
>>> ex12: H5Dchunk.c:1849: H5D__create_chunk_mem_map_hyper: Assertion
>>> `fm->m_ndims==fm->f_ndims' failed.
>>>
>>> Has anyone on this list been able to write parallel HDF5 using a
>>> recent state of the develop branch, with or without filters
>>> configured?
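>>>
>>> For context, the write I'm attempting boils down to something like the
>>> following minimal sketch: a chunked dataset with a compression filter and a
>>> collective transfer property list.  (Deflate stands in for ZFP/SZIP here,
>>> and the file name, extents, and chunk shape are illustrative.)
>>>
>>> #include <mpi.h>
>>> #include "hdf5.h"
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int rank, nprocs;
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>
>>>     /* Open the file for MPI-IO access. */
>>>     hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
>>>     H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
>>>     hid_t file = H5Fcreate("filtered.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
>>>
>>>     /* Chunked, filtered dataset: one row of 16 doubles per rank. */
>>>     hsize_t dims[2]  = {(hsize_t)nprocs, 16};
>>>     hsize_t chunk[2] = {1, 16};
>>>     hid_t fspace = H5Screate_simple(2, dims, NULL);
>>>     hid_t dcpl   = H5Pcreate(H5P_DATASET_CREATE);
>>>     H5Pset_chunk(dcpl, 2, chunk);
>>>     H5Pset_deflate(dcpl, 6);
>>>     hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, fspace,
>>>                             H5P_DEFAULT, dcpl, H5P_DEFAULT);
>>>
>>>     /* Each rank selects its row and all ranks write collectively. */
>>>     hsize_t start[2] = {(hsize_t)rank, 0}, count[2] = {1, 16};
>>>     H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
>>>     hid_t mspace = H5Screate_simple(2, count, NULL);
>>>     hid_t dxpl   = H5Pcreate(H5P_DATASET_XFER);
>>>     H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
>>>
>>>     double buf[16];
>>>     for (int i = 0; i < 16; i++) buf[i] = rank + 0.01 * i;
>>>     H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf);
>>>
>>>     H5Pclose(dxpl); H5Sclose(mspace); H5Dclose(dset); H5Pclose(dcpl);
>>>     H5Sclose(fspace); H5Fclose(file); H5Pclose(fapl);
>>>     MPI_Finalize();
>>>     return 0;
>>> }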
>>>
>>> Thanks,
>>> - Michael
>>>
>>
>>

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
