Based on this comment in the HDF5 docs, I would think it would be acceptable for a hyperslab selection to go beyond the extent of what has been allocated (written to) in the dataset, at least if chunking is used: "Fill-values are only used for chunked storage datasets when an unallocated chunk is read from."
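For reference, this is roughly how the fill value gets set at dataset creation time in my test, as a simplified sketch rather than the exact code in perfectNumbers.c; the long type and 0 value are placeholders for whatever big_int_h5 really is, and file_id, DATASETNAME, and filespace are the same names as in the excerpt quoted further down:

    /* Simplified sketch: fill value on a chunked dataset at creation time
     * (not verbatim from perfectNumbers.c; fill type and value are placeholders). */
    long    fill_value  = 0;        /* returned for unallocated chunks on read */
    hsize_t chunkdims[] = {1};

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunkdims);                       /* chunked layout */
    H5Pset_fill_value(dcpl, H5T_NATIVE_LONG, &fill_value);  /* fill value     */

    hid_t dset_id = H5Dcreate(file_id, DATASETNAME, big_int_h5, filespace,
                              H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);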
I specified a fill value now, but this didn't seem to make a difference; do hyperslabs have some additional conditions that prevent fill values from working, or am I doing something else wrong?

I've tested stride, count, etc. with H5Dwrite - this seems to work fine. I use the same values for H5Dread. H5Dread also works if mpi_size doesn't change between runs. But it would be nice if I could find out how to make this more flexible, so that mpi_size wouldn't have to be fixed between runs.
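For context, here is roughly the shape of what the restore side attempts, as a simplified and partly hypothetical sketch rather than my exact code (the rounding step below stands in for get_restore_chunk_counter from the example, and error checking is omitted):

    /* Sketch of the restore-side sizing (simplified, not verbatim from
     * perfectNumbers.c). Assumes dset_id is an open 1-D chunked dataset and
     * that MPI_CHUNK_SIZE, mpi_rank, and mpi_size are already set. */
    hsize_t dims_on_disk[1];
    hid_t   filespace = H5Dget_space(dset_id);
    H5Sget_simple_extent_dims(filespace, dims_on_disk, NULL);
    H5Sclose(filespace);

    /* Round the number of blocks up so they divide evenly among mpi_size ranks. */
    hsize_t blocks_on_disk = dims_on_disk[0] / MPI_CHUNK_SIZE;
    hsize_t chunk_counter  = (blocks_on_disk + mpi_size - 1) / mpi_size;

    hsize_t dimsm[1] = {chunk_counter * MPI_CHUNK_SIZE};
    hsize_t dimsf[1] = {dimsm[0] * mpi_size};

    /* Grow the dataset so every rank's strided selection fits; the padding
     * should come back as fill values when read (chunked storage only). */
    H5Dset_extent(dset_id, dimsf);

    /* Re-fetch the file dataspace after changing the extent, so the hyperslab
     * selection is checked against the new size rather than the old one. */
    filespace = H5Dget_space(dset_id);

    hsize_t start[]  = {mpi_rank * MPI_CHUNK_SIZE};
    hsize_t stride[] = {MPI_CHUNK_SIZE * mpi_size};
    hsize_t count[]  = {chunk_counter};
    hsize_t block[]  = {MPI_CHUNK_SIZE};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, stride, count, block);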
Thanks,

On Fri, May 29, 2015 at 11:17 AM, Brandon Barker <[email protected]> wrote:
> In the above, I assumed I can't change the arguments to
> H5Sselect_hyperslab (at least not easily), so I tried to fix the issue
> by changing the extent size using a call to H5Dset_extent, with the
> further assumption that a fill value would be used if I try to read
> beyond the end of the data stored in the dataset ... is this wrong?
>
> On Thu, May 28, 2015 at 4:18 PM, Brandon Barker
> <[email protected]> wrote:
>> Thanks Elena,
>>
>> Apologies below for using "chunk" in a different way (e.g. chunk_counter,
>> MPI_CHUNK_SIZE) than it is used in HDF5; perhaps I should call them "slabs".
>>
>> Code from the checkpoint procedure (seems to work):
>>
>>   // dataset and memoryset dimensions (just 1-D here)
>>   hsize_t dimsm[] = {chunk_counter * MPI_CHUNK_SIZE};
>>   hsize_t dimsf[] = {dimsm[0] * mpi_size};
>>   hsize_t maxdims[] = {H5S_UNLIMITED};
>>   hsize_t chunkdims[] = {1};
>>   // hyperslab offset and size info
>>   hsize_t start[] = {mpi_rank * MPI_CHUNK_SIZE};
>>   hsize_t count[] = {chunk_counter};
>>   hsize_t block[] = {MPI_CHUNK_SIZE};
>>   hsize_t stride[] = {MPI_CHUNK_SIZE * mpi_size};
>>
>>   dset_plist_create_id = H5Pcreate (H5P_DATASET_CREATE);
>>   status = H5Pset_chunk (dset_plist_create_id, RANK, chunkdims);
>>   dset_id = H5Dcreate (file_id, DATASETNAME, big_int_h5, filespace,
>>                        H5P_DEFAULT, dset_plist_create_id, H5P_DEFAULT);
>>   assert(dset_id != HDF_FAIL);
>>
>>   H5Sselect_hyperslab(filespace, H5S_SELECT_SET,
>>                       start, stride, count, block);
>>
>> Code from the restore procedure (this is where the problem is):
>>
>>   // dataset and memoryset dimensions (just 1-D here)
>>   hsize_t dimsm[1];
>>   hsize_t dimsf[1];
>>   // hyperslab offset and size info
>>   hsize_t start[] = {mpi_rank * MPI_CHUNK_SIZE};
>>   hsize_t count[1];
>>   hsize_t block[] = {MPI_CHUNK_SIZE};
>>   hsize_t stride[] = {MPI_CHUNK_SIZE * mpi_size};
>>
>>   //
>>   // Update dimensions and dataspaces as appropriate
>>   //
>>   // Number of chunks previously used plus enough new chunks to be
>>   // divisible by mpi_size.
>>   chunk_counter = get_restore_chunk_counter(dimsf[0]);
>>   count[0] = chunk_counter;
>>   dimsm[0] = chunk_counter * MPI_CHUNK_SIZE;
>>   dimsf[0] = dimsm[0] * mpi_size;
>>   status = H5Dset_extent(dset_id, dimsf);
>>   assert(status != HDF_FAIL);
>>
>>   //
>>   // Create the memspace for the dataset and allocate data for it
>>   //
>>   memspace = H5Screate_simple(RANK, dimsm, NULL);
>>   perf_diffs = alloc_and_init(perf_diffs, dimsm[0]);
>>
>>   H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, stride, count,
>>                       block);
>>
>> Complete example code:
>> https://github.com/cornell-comp-internal/CR-demos/blob/bc507264fe4040d817a2e9603dace0dc06585015/demos/pHDF5/perfectNumbers.c
>>
>> Best,
>>
>> On Thu, May 28, 2015 at 3:43 PM, Elena Pourmal <[email protected]> wrote:
>>> Hi Brandon,
>>>
>>> The error message indicates that a hyperslab selection goes beyond dataset
>>> extent.
>>>
>>> Please make sure that you are using the correct values for the start,
>>> stride, count and block parameters in the H5Sselect_hyperslab call (if you
>>> use it!). It will help if you provide an excerpt from your code that
>>> selects hyperslabs for each process.
>>>
>>> Elena
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Elena Pourmal  The HDF Group  http://hdfgroup.org
>>> 1800 So. Oak St., Suite 203, Champaign IL 61820
>>> 217.531.6112
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> On May 28, 2015, at 1:46 PM, Brandon Barker <[email protected]> wrote:
>>>
>>> I believe I've gotten a bit closer by using chunked datasets, but I'm now
>>> not sure how to get past this:
>>>
>>>   [brandon@euca-128-84-11-180 pHDF5]$ mpirun -n 2 ./perfectNumbers
>>>   m, f, count,: 840, 1680, 84
>>>   m, f, count,: 840, 1680, 84
>>>   HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 1:
>>>     #000: ../../src/H5Dio.c line 158 in H5Dread(): selection+offset not within extent
>>>       major: Dataspace
>>>       minor: Out of range
>>>   perfectNumbers: perfectNumbers.c:399: restore: Assertion `status != -1' failed.
>>>   --------------------------------------------------------------------------
>>>   mpirun noticed that process rank 1 with PID 28420 on node
>>>   euca-128-84-11-180 exited on signal 11 (Segmentation fault).
>>>   --------------------------------------------------------------------------
>>>
>>> (m, f, count) represent the memory space and dataspace lengths and the count
>>> of strided segments to be read in; prior to using set extents as follows, I
>>> would get the error when f was not a multiple of m:
>>>
>>>   dimsf[0] = dimsm[0] * mpi_size;
>>>   H5Dset_extent(dset_id, dimsf);
>>>
>>> Now that I am using these, I note that it doesn't seem to have helped the
>>> issue, so there must be something else I still need to do.
>>>
>>> Incidentally, I was looking at this example and am not sure what the point
>>> of the following code is, since rank_chunk is never used:
>>>
>>>   if (H5D_CHUNKED == H5Pget_layout (prop))
>>>      rank_chunk = H5Pget_chunk (prop, rank, chunk_dimsr);
>>>
>>> I guess it is just to demonstrate the function call of H5Pget_chunk?
>>>
>>> On Thu, May 28, 2015 at 10:27 AM, Brandon Barker
>>> <[email protected]> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I have fixed (and pushed the fix for) one bug that related to an
>>>> improperly defined count in the restore function.
>>>> I still have issues for m != n:
>>>>
>>>>   #000: ../../src/H5Dio.c line 158 in H5Dread(): selection+offset not within extent
>>>>     major: Dataspace
>>>>     minor: Out of range
>>>>
>>>> I believe this is indicative of me needing to use chunked datasets so
>>>> that my dataset can grow in size dynamically.
>>>>
>>>> On Wed, May 27, 2015 at 5:03 PM, Brandon Barker
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I've been learning pHDF5 by way of developing a toy application that
>>>>> checkpoints and restores its state. The restore function was the last to be
>>>>> implemented, but I realized after doing so that I have an issue: since each
>>>>> process has strided blocks of data that it is responsible for, the number of
>>>>> blocks of data saved during one run may not be evenly distributed among
>>>>> processes in another run, as the mpi_size of the latter run may not evenly
>>>>> divide the total number of blocks.
>>>>>
>>>>> I was hoping that a fill value might save me here, and just read in 0s
>>>>> if I try reading beyond the end of the dataset. Although, I believe I did
>>>>> see a page noting that this isn't possible for contiguous datasets.
>>>>>
>>>>> The good news is that since I'm working with 1-dimensional data, it is
>>>>> fairly easy to refactor the relevant code.
>>>>>
>>>>> The error I get emits this message:
>>>>>
>>>>>   [brandon@euca-128-84-11-180 pHDF5]$ mpirun -n 2 perfectNumbers
>>>>>   HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 0:
>>>>>     #000: ../../src/H5Dio.c line 179 in H5Dread(): can't read data
>>>>>       major: Dataset
>>>>>       minor: Read failed
>>>>>     #001: ../../src/H5Dio.c line 446 in H5D__read(): src and dest data spaces have different sizes
>>>>>       major: Invalid arguments to routine
>>>>>       minor: Bad value
>>>>>   perfectNumbers: perfectNumbers.c:382: restore: Assertion `status != -1' failed.
>>>>>   --------------------------------------------------------------------------
>>>>>   mpirun noticed that process rank 0 with PID 3717 on node
>>>>>   euca-128-84-11-180 exited on signal 11 (Segmentation fault).
>>>>>
>>>>> Here is the offending line in the restore function; you can observe the
>>>>> checkpoint function to see how things are written out to disk.
>>>>>
>>>>> General pointers are appreciated as well - to paraphrase the problem
>>>>> more simply: I have a distributed (strided) array that I write out to disk as
>>>>> a dataset among n processes, and when I restart the program, I may want to
>>>>> divvy up the data among m processes in similar data structures as before,
>>>>> but now m != n. Actually, my problem may be different than just this, since
>>>>> I seem to get the same issue even when m == n ... hmm.
>>>>>
>>>>> Thanks,
>>>>> --
>>>>> Brandon E. Barker
>>>>> http://www.cac.cornell.edu/barker/

--
Brandon E. Barker
http://www.cac.cornell.edu/barker/
