I added a debug printf immediately before the H5Dwrite() call in my
modified version of ISView_General_HDF5() (in PETSc's
src/vec/is/is/impls/general/general.c), and I am currently running a
test with 4 ranks on the same host. The "M of N" numbers are the sizes
of the memspace and the filespace, respectively.
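The instrumentation is roughly the following (a sketch only; the helper
and the handle names memspace/filespace are illustrative paraphrases of
the actual PETSc locals, not the real code):

    #include <hdf5.h>
    #include <stdio.h>

    /* Debug helper (illustrative): print how much this rank is about
       to write (memspace size) out of the total dataset (filespace
       extent). */
    static void print_write_sizes(hid_t memspace, hid_t filespace)
    {
        hssize_t m = H5Sget_simple_extent_npoints(memspace);  /* this rank's share */
        hssize_t n = H5Sget_simple_extent_npoints(filespace); /* total extent */
        printf("About to write %lld of %lld\n", (long long)m, (long long)n);
    }

Here is what I see: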
About to write 148 of 636
About to write 176 of 636
About to write 163 of 636
About to write 149 of 636
About to write 176 of 636
About to write 148 of 636
About to write 149 of 636
About to write 163 of 636
About to write 310 of 1136
About to write 266 of 1136
About to write 258 of 1136
About to write 302 of 1136
About to write 310 of 1136
About to write 266 of 1136
About to write 258 of 1136
About to write 302 of 1136
About to write 124 of 520
About to write 120 of 520
About to write 140 of 520
About to write 136 of 520
About to write 23 of 80
About to write 19 of 80
About to write 14 of 80
About to write 24 of 80
About to write 12 of 20
About to write 0 of 20
About to write 0 of 20
About to write 8 of 20
HDF5-DIAG: Error detected in HDF5 (1.11.0) MPI-process 0:
#000: H5Dio.c line 319 in H5Dwrite(): can't prepare for writing data
major: Dataset
minor: Write failed
#001: H5Dio.c line 395 in H5D__pre_write(): can't write data
major: Dataset
minor: Write failed
#002: H5Dio.c line 836 in H5D__write(): can't write data
major: Dataset
minor: Write failed
#003: H5Dmpio.c line 1019 in H5D__chunk_collective_write(): write error
major: Dataspace
minor: Write failed
#004: H5Dmpio.c line 934 in H5D__chunk_collective_io(): couldn't finish filtered linked chunk MPI-IO
major: Low-level I/O
minor: Can't get value
#005: H5Dmpio.c line 1474 in H5D__link_chunk_filtered_collective_io(): couldn't process chunk entry
major: Dataset
minor: Write failed
#006: H5Dmpio.c line 3277 in H5D__filtered_collective_chunk_entry_io(): couldn't unfilter chunk for modifying
major: Data filters
minor: Filter operation failed
#007: H5Z.c line 1256 in H5Z_pipeline(): filter returned failure during read
major: Data filters
minor: Read failed
I'm trying to do this the way you suggested: non-contributing ranks
create a zero-sized memspace (with the appropriate dimensions) and call
H5Sselect_none() on the filespace, then call H5Dwrite() in the usual
way so that they still participate in the collective write (sketched
below). Where in the code would you expect the test that filters out
zero-sized chunks to be?
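For concreteness, the pattern on a non-contributing rank looks roughly
like this (a minimal sketch assuming a 1-D dataset of native ints; the
handle names and the helper are illustrative, error checking omitted,
and this is not the actual PETSc code):

    #include <hdf5.h>

    /* Sketch: how a rank with nothing to write still takes part in
       the collective H5Dwrite(). Requires a parallel HDF5 build. */
    static void write_nothing_collectively(hid_t dset_id)
    {
        hsize_t zero  = 0;
        int     dummy = 0; /* never dereferenced: the selection is empty */

        hid_t memspace  = H5Screate_simple(1, &zero, NULL); /* zero-sized memspace */
        hid_t filespace = H5Dget_space(dset_id);
        H5Sselect_none(filespace); /* this rank selects nothing in the file */

        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

        /* Must still be called: collective I/O requires every rank. A
           non-NULL buffer is passed to be safe, though nothing is read
           from it. */
        H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace, dxpl, &dummy);

        H5Pclose(dxpl);
        H5Sclose(filespace);
        H5Sclose(memspace);
    }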
On Thu, Nov 9, 2017 at 11:39 AM, Jordan Henderson
<[email protected]> wrote:
> By zero-sized chunks, do you mean that the actual chunks in the
> dataset are zero-sized, or that the data going into the write is
> zero-sized? It would seem odd to me if you were writing to an
> essentially zero-sized dataset composed of zero-sized chunks.
>
> On the other hand, ranks that aren't participating should never
> construct a list of chunks in the H5D__construct_filtered_io_info_list()
> function, and thus should never participate in any chunk updating, only
> in the collective file-space re-allocations and the re-insertion of
> chunks into the chunk index. That being said, if you are indeed seeing
> zero-sized malloc calls in the chunk update function, something must be
> wrong somewhere. It is true that each chunk currently moves to the rank
> that has the largest contribution to the chunk and ALSO has the fewest
> chunks currently assigned to it (to get a more even distribution of
> chunks among the ranks). Even so, any rank with a zero-sized
> contribution to a chunk should never have created a chunk struct entry
> for that chunk, and thus should not participate in the chunk-updating
> loop (lines 1471-1474 in the current develop branch). Such ranks should
> skip that loop and wait at the subsequent H5D__mpio_array_gatherv()
> until the other ranks finish processing. Again, this is what should
> happen, but it may not be what is actually happening in your case.