On Wed, Jun 3, 2015 at 12:22 AM, Elena Pourmal <[email protected]> wrote:
> Hi Brandon,
>
> On Jun 2, 2015, at 1:53 PM, Brandon Barker <[email protected]> wrote:
>
> Based on this comment in the HDF5 docs, I would think it would be
> acceptable for a hyperslab selection to go beyond the extent of what was
> allocated (written to) in the dataset, at least if chunking is used:
> "Fill-values are only used for chunked storage datasets when an
> unallocated chunk is read from."
>
> It will be very helpful if you point me to the source :-) The sentence is
> misleading and we need to fix it.

Sure, thanks: https://www.hdfgroup.org/HDF5/doc_resource/H5Fill_Values.html

To be fair, the subsequent sentence may actually clarify the issue somewhat:

> Fill values can be used for datasets with any storage type. If a fill
> value is set and an application reads data from a location that was not
> written to, then the HDF5 library will return the fill value back.

> I specified a fill value now, but this didn't seem to make a difference;
> do hyperslabs have some additional conditions that prevent fill values
> from working, or am I doing something else wrong?
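For reference, here is the essence of how I set the fill value at
dataset-creation time. This is a minimal sketch rather than my exact code: a
native long long stands in for my big_int_h5 type, the other names match the
checkpoint excerpt further down the thread, and error checking is omitted.

long long fill_val = 0;  /* placeholder fill value */
hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
/* Chunked layout: required for extendible datasets, and fill values are
   only returned for unallocated chunks of chunked datasets. */
H5Pset_chunk(dcpl, RANK, chunkdims);
H5Pset_fill_value(dcpl, H5T_NATIVE_LLONG, &fill_val);
/* H5D_FILL_TIME_IFSET is already the default once a fill value is set;
   made explicit here for clarity. */
H5Pset_fill_time(dcpl, H5D_FILL_TIME_IFSET);
dset_id = H5Dcreate(file_id, DATASETNAME, H5T_NATIVE_LLONG, filespace,
                    H5P_DEFAULT, dcpl, H5P_DEFAULT);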
> I've tested stride, count, etc. with H5Dwrite - this seems to work fine.
> I use the same values for H5Dread. H5Dread also works if mpi_size doesn't
> change between runs. But it would be nice if I could find out how to make
> this more flexible between runs, so that mpi_size wouldn't have to be
> fixed.

> I am sorry, I don't understand your choice of stride. Are you reading the
> same data with more processes? If so, then the stride increases and the
> selection goes beyond the extent.

Yes, the stride increases as a result of mpi_size increasing. I have tried to
use H5Dset_extent to correct for this, but either it doesn't help here or I'm
using it incorrectly.
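Concretely, I believe the restore side needs a sequence like the one below,
and this is exactly the part I'm not sure my code gets right. (A sketch, not
verified: my understanding is that a file dataspace handle obtained before
H5Dset_extent still describes the old extent, so it has to be re-acquired
with H5Dget_space before making the selection.)

status = H5Dset_extent(dset_id, dimsf);  /* grow the dataset to the padded size */
assert(status != HDF_FAIL);
/* Re-acquire the file dataspace: a handle obtained before H5Dset_extent
   keeps the old extent, and the hyperslab selection is validated against
   whatever extent the handle carries. */
filespace = H5Dget_space(dset_id);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, stride, count, block);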
> Maybe the examples provided at
> https://www.hdfgroup.org/HDF5/Tutor/parallel.html will help?

I looked at these a bit earlier and they were helpful for other things, but I
don't think there is an example that covers a situation similar to mine: I
want to read data with fill values beyond the *original* extent of the
defined dataset. I'm happy to increase the extent to achieve reading of fill
values, as noted above. If none of that works, I think I'd have to
significantly increase the complexity of the code and the amount of memory
shuffling to achieve something similar, i.e., first read in the data using
the original extent size and then perhaps do an MPI_Scatter to send the data
to the other processes, but I assume that would be a performance hit as well.
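In code, I imagine that fallback would look roughly like the sketch below. It
is untested; total_len, local_len, and local_buf are placeholder names, and
total_len is assumed to be padded up to a multiple of mpi_size.

long long *all_buf = NULL;
int local_len = (int)(total_len / mpi_size);  /* total_len: padded length */
long long *local_buf = malloc(local_len * sizeof *local_buf);
if (mpi_rank == 0) {
    /* calloc supplies the zero "fill" for the padded tail beyond the
       dataset's original extent */
    all_buf = calloc(total_len, sizeof *all_buf);
    /* independent read of the whole dataset at its original extent */
    H5Dread(dset_id, H5T_NATIVE_LLONG, H5S_ALL, H5S_ALL, H5P_DEFAULT, all_buf);
}
MPI_Scatter(all_buf, local_len, MPI_LONG_LONG,
            local_buf, local_len, MPI_LONG_LONG, 0, MPI_COMM_WORLD);
free(all_buf);  /* free(NULL) is a no-op on the non-root ranks */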
By the way, this isn't a "blocker" situation for me, since the example is
already working "somewhat"; this isn't for production code, I'm just trying
to understand pHDF5 a bit better. Still, it really bothers me, so I would
like to find the correct solution eventually :).

Thanks again,
Brandon

> Elena
>
> Thanks,
>
> On Fri, May 29, 2015 at 11:17 AM, Brandon Barker
> <[email protected]> wrote:
>>
>> In the above, I assumed I can't change the arguments to
>> H5Sselect_hyperslab (at least not easily), so I tried to fix the issue
>> by changing the extent size using a call to H5Dset_extent, with the
>> further assumption that a fill value would be used if I try to read
>> beyond the end of the data stored in the dataset ... is this wrong?
>>
>> On Thu, May 28, 2015 at 4:18 PM, Brandon Barker
>> <[email protected]> wrote:
>> > Thanks Elena,
>> >
>> > Apologies below for using "chunk" in a different way (e.g.
>> > chunk_counter, MPI_CHUNK_SIZE) than it is used in HDF5; perhaps I
>> > should call them "slabs".
>> >
>> > Code from the checkpoint procedure (seems to work):
>> >
>> > // dataset and memoryset dimensions (just 1d here)
>> > hsize_t dimsm[] = {chunk_counter * MPI_CHUNK_SIZE};
>> > hsize_t dimsf[] = {dimsm[0] * mpi_size};
>> > hsize_t maxdims[] = {H5S_UNLIMITED};
>> > hsize_t chunkdims[] = {1};
>> > // hyperslab offset and size info
>> > hsize_t start[] = {mpi_rank * MPI_CHUNK_SIZE};
>> > hsize_t count[] = {chunk_counter};
>> > hsize_t block[] = {MPI_CHUNK_SIZE};
>> > hsize_t stride[] = {MPI_CHUNK_SIZE * mpi_size};
>> >
>> > dset_plist_create_id = H5Pcreate(H5P_DATASET_CREATE);
>> > status = H5Pset_chunk(dset_plist_create_id, RANK, chunkdims);
>> > dset_id = H5Dcreate(file_id, DATASETNAME, big_int_h5, filespace,
>> >                     H5P_DEFAULT, dset_plist_create_id, H5P_DEFAULT);
>> > assert(dset_id != HDF_FAIL);
>> >
>> > H5Sselect_hyperslab(filespace, H5S_SELECT_SET,
>> >                     start, stride, count, block);
>> >
>> > Code from the restore procedure (this is where the problem is):
>> >
>> > // dataset and memoryset dimensions (just 1d here)
>> > hsize_t dimsm[1];
>> > hsize_t dimsf[1];
>> > // hyperslab offset and size info
>> > hsize_t start[] = {mpi_rank * MPI_CHUNK_SIZE};
>> > hsize_t count[1];
>> > hsize_t block[] = {MPI_CHUNK_SIZE};
>> > hsize_t stride[] = {MPI_CHUNK_SIZE * mpi_size};
>> >
>> > //
>> > // Update dimensions and dataspaces as appropriate
>> > // (dimsf[0] is assumed to be read back from the file's dataspace
>> > // earlier; see the complete example linked below)
>> > //
>> > // chunk_counter: number of chunks previously used, plus enough new
>> > // chunks to be divisible by mpi_size
>> > chunk_counter = get_restore_chunk_counter(dimsf[0]);
>> > count[0] = chunk_counter;
>> > dimsm[0] = chunk_counter * MPI_CHUNK_SIZE;
>> > dimsf[0] = dimsm[0] * mpi_size;
>> > status = H5Dset_extent(dset_id, dimsf);
>> > assert(status != HDF_FAIL);
>> >
>> > //
>> > // Create the memspace for the dataset and allocate data for it
>> > //
>> > memspace = H5Screate_simple(RANK, dimsm, NULL);
>> > perf_diffs = alloc_and_init(perf_diffs, dimsm[0]);
>> >
>> > H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, stride, count,
>> >                     block);
>> >
>> > Complete example code:
>> > https://github.com/cornell-comp-internal/CR-demos/blob/bc507264fe4040d817a2e9603dace0dc06585015/demos/pHDF5/perfectNumbers.c
>> >
>> > Best,
>> >
>> > On Thu, May 28, 2015 at 3:43 PM, Elena Pourmal <[email protected]>
>> > wrote:
>> >>
>> >> Hi Brandon,
>> >>
>> >> The error message indicates that a hyperslab selection goes beyond the
>> >> dataset extent.
>> >>
>> >> Please make sure that you are using the correct values for the start,
>> >> stride, count, and block parameters in the H5Sselect_hyperslab call
>> >> (if you use it!). It will help if you provide an excerpt from your
>> >> code that selects hyperslabs for each process.
>> >>
>> >> Elena
>> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> >> Elena Pourmal  The HDF Group  http://hdfgroup.org
>> >> 1800 So. Oak St., Suite 203, Champaign IL 61820
>> >> 217.531.6112
>> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> >>
>> >> On May 28, 2015, at 1:46 PM, Brandon Barker
>> >> <[email protected]> wrote:
>> >>
>> >> I believe I've gotten a bit closer by using chunked datasets, but I'm
>> >> now not sure how to get past this:
>> >>
>> >> [brandon@euca-128-84-11-180 pHDF5]$ mpirun -n 2 ./perfectNumbers
>> >> m, f, count,: 840, 1680, 84
>> >> m, f, count,: 840, 1680, 84
>> >> HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 1:
>> >>   #000: ../../src/H5Dio.c line 158 in H5Dread(): selection+offset not
>> >> within extent
>> >>     major: Dataspace
>> >>     minor: Out of range
>> >> perfectNumbers: perfectNumbers.c:399: restore: Assertion `status != -1'
>> >> failed.
>> >> --------------------------------------------------------------------------
>> >> mpirun noticed that process rank 1 with PID 28420 on node
>> >> euca-128-84-11-180 exited on signal 11 (Segmentation fault).
>> >> --------------------------------------------------------------------------
>> >>
>> >> (m, f, count) represent the memory space and dataspace lengths and the
>> >> count of strided segments to be read in; prior to setting the extent
>> >> as follows, I would get the error when f was not a multiple of m:
>> >>
>> >> dimsf[0] = dimsm[0] * mpi_size;
>> >> H5Dset_extent(dset_id, dimsf);
>> >>
>> >> Now that I am using these, I note that it doesn't seem to have helped
>> >> the issue, so there must be something else I still need to do.
>> >>
>> >> Incidentally, I was looking at this example and am not sure what the
>> >> point of the following code is, since rank_chunk is never used:
>> >>
>> >> if (H5D_CHUNKED == H5Pget_layout (prop))
>> >>     rank_chunk = H5Pget_chunk (prop, rank, chunk_dimsr);
>> >>
>> >> I guess it is just to demonstrate the function call of H5Pget_chunk?
>> >>
>> >> On Thu, May 28, 2015 at 10:27 AM, Brandon Barker
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hi All,
>> >>>
>> >>> I have fixed (and pushed the fix for) one bug that related to an
>> >>> improperly defined count in the restore function. I still have
>> >>> issues for m != n:
>> >>>
>> >>>   #000: ../../src/H5Dio.c line 158 in H5Dread(): selection+offset not
>> >>> within extent
>> >>>     major: Dataspace
>> >>>     minor: Out of range
>> >>>
>> >>> I believe this is indicative of me needing to use chunked datasets so
>> >>> that my dataset can grow in size dynamically.
>> >>>
>> >>> On Wed, May 27, 2015 at 5:03 PM, Brandon Barker
>> >>> <[email protected]> wrote:
>> >>>>
>> >>>> Hi All,
>> >>>>
>> >>>> I've been learning pHDF5 by way of developing a toy application that
>> >>>> checkpoints and restores its state. The restore function was the
>> >>>> last to be implemented, but I realized after doing so that I have an
>> >>>> issue: since each process has strided blocks of data that it is
>> >>>> responsible for, the number of blocks of data saved during one run
>> >>>> may not be evenly distributed among processes in another run, as the
>> >>>> mpi_size of the latter run may not evenly divide the total number of
>> >>>> blocks.
>> >>>>
>> >>>> I was hoping that a fill value might save me here and just read in
>> >>>> 0s if I try reading beyond the end of the dataset, although I
>> >>>> believe I did see a page noting that this isn't possible for
>> >>>> contiguous datasets.
>> >>>>
>> >>>> The good news is that since I'm working with 1-dimensional data, it
>> >>>> is fairly easy to refactor the relevant code.
>> >>>>
>> >>>> The error I get emits this message:
>> >>>>
>> >>>> [brandon@euca-128-84-11-180 pHDF5]$ mpirun -n 2 perfectNumbers
>> >>>> HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 0:
>> >>>>   #000: ../../src/H5Dio.c line 179 in H5Dread(): can't read data
>> >>>>     major: Dataset
>> >>>>     minor: Read failed
>> >>>>   #001: ../../src/H5Dio.c line 446 in H5D__read(): src and dest data
>> >>>> spaces have different sizes
>> >>>>     major: Invalid arguments to routine
>> >>>>     minor: Bad value
>> >>>> perfectNumbers: perfectNumbers.c:382: restore: Assertion `status !=
>> >>>> -1' failed.
>> >>>> --------------------------------------------------------------------------
>> >>>> mpirun noticed that process rank 0 with PID 3717 on node
>> >>>> euca-128-84-11-180 exited on signal 11 (Segmentation fault).
>> >>>>
>> >>>> Here is the offending line in the restore function; you can observe
>> >>>> the checkpoint function to see how things are written out to disk.
>> >>>>
>> >>>> General pointers are appreciated as well - to paraphrase the problem
>> >>>> more simply: I have a distributed (strided) array I write out to
>> >>>> disk as a dataset among n processes, and when I restart the program,
>> >>>> I may want to divvy up the data among m processes in similar data
>> >>>> structures as before, but now m != n. Actually, my problem may be
>> >>>> different from just this, since I seem to get the same issue even
>> >>>> when m == n ... hmm.
>> >>>>
>> >>>> Thanks,
>> >>>> --
>> >>>> Brandon E. Barker
>> >>>> http://www.cac.cornell.edu/barker/
>> >>>
>> >>> --
>> >>> Brandon E. Barker
>> >>> http://www.cac.cornell.edu/barker/
>> >>
>> >> --
>> >> Brandon E. Barker
>> >> http://www.cac.cornell.edu/barker/
>> >
>> > --
>> > Brandon E. Barker
>> > http://www.cac.cornell.edu/barker/
>>
>> --
>> Brandon E. Barker
>> http://www.cac.cornell.edu/barker/
>
> --
> Brandon E. Barker
> http://www.cac.cornell.edu/barker/

--
Brandon E. Barker
http://www.cac.cornell.edu/barker/

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
