This text . . .

   "Fill-values are only used for chunked storage datasets when an
   unallocated chunk is read from."

. . . isn't consistent with what I think a 'fill value' should do/mean. I would
have expected a fill value to be used *anywhere* data is read that wasn't
already written. In other words, the fill-value behavior would extend to the
*inside* of any partially written chunk as well, not just to wholly unwritten
chunks. That is more or less what a fill value means in netCDF, for example.

My apologies for interrupting in the middle of the thread with this
comment/inquiry. And yes, I could probably write code to confirm the behavior,
but I would like to avoid doing that if someone knows for sure. (A minimal
sketch of setting a fill value appears after the quoted thread below.)

So: does HDF5's sense of a 'fill value' apply only to wholly unwritten chunks?
Or does it extend to any entry in the dataset that hasn't already been written?

Mark

From: Brandon Barker <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Tuesday, June 2, 2015 11:53 AM
To: Brandon Barker <[email protected]>
Cc: HDF Users Discussion List <[email protected]>
Subject: Re: [Hdf-forum] Strategy for pHDF5 collective reads/writes on variable sized communicators

Based on this comment in the HDF5 docs, I would think it would be acceptable
for a hyperslab selection to go beyond the extent of what was allocated
(written to) in the dataset, at least if chunking is used:

   "Fill-values are only used for chunked storage datasets when an
   unallocated chunk is read from."

I specified a fill value now, but this didn't seem to make a difference; do
hyperslabs have some additional conditions that prevent fill values from
working, or am I doing something else wrong?

I've tested stride, count, etc. with H5Dwrite - this seems to work fine. I use
the same values for H5Dread. H5Dread also works if mpi_size doesn't change
between runs. But it would be nice if I could find out how to make this more
flexible between runs, so that mpi_size wouldn't have to be fixed.

Thanks,

On Fri, May 29, 2015 at 11:17 AM, Brandon Barker
<[email protected]> wrote:

In the above, I assumed I can't change the arguments to H5Sselect_hyperslab
(at least not easily), so I tried to fix the issue by changing the extent size
using a call to H5Dset_extent, with the further assumption that a fill value
would be used if I try to read beyond the end of the data stored in the
dataset ... is this wrong?

On Thu, May 28, 2015 at 4:18 PM, Brandon Barker
<[email protected]> wrote:
> Thanks Elena,
>
> Apologies below for using "chunk" in a different way (e.g. chunk_counter,
> MPI_CHUNK_SIZE) than it is used in HDF5; perhaps I should call them "slabs".
>
> Code from the checkpoint procedure (seems to work):
>
> // dataset and memoryset dimensions (just 1d here)
> hsize_t dimsm[] = {chunk_counter * MPI_CHUNK_SIZE};
> hsize_t dimsf[] = {dimsm[0] * mpi_size};
> hsize_t maxdims[] = {H5S_UNLIMITED};
> hsize_t chunkdims[] = {1};
> // hyperslab offset and size info
> hsize_t start[] = {mpi_rank * MPI_CHUNK_SIZE};
> hsize_t count[] = {chunk_counter};
> hsize_t block[] = {MPI_CHUNK_SIZE};
> hsize_t stride[] = {MPI_CHUNK_SIZE * mpi_size};
>
> dset_plist_create_id = H5Pcreate (H5P_DATASET_CREATE);
> status = H5Pset_chunk (dset_plist_create_id, RANK, chunkdims);
> dset_id = H5Dcreate (file_id, DATASETNAME, big_int_h5, filespace,
>                      H5P_DEFAULT, dset_plist_create_id, H5P_DEFAULT);
> assert(dset_id != HDF_FAIL);
>
> H5Sselect_hyperslab(filespace, H5S_SELECT_SET,
>                     start, stride, count, block);
>
>
> Code from the restore procedure (this is where the problem is):
>
> // dataset and memoryset dimensions (just 1d here)
> hsize_t dimsm[1];
> hsize_t dimsf[1];
> // hyperslab offset and size info
> hsize_t start[] = {mpi_rank * MPI_CHUNK_SIZE};
> hsize_t count[1];
> hsize_t block[] = {MPI_CHUNK_SIZE};
> hsize_t stride[] = {MPI_CHUNK_SIZE * mpi_size};
>
> //
> // Update dimensions and dataspaces as appropriate
> //
> chunk_counter = get_restore_chunk_counter(dimsf[0]); // Number of chunks previously used,
>                                                      // plus enough new chunks to be divisible by mpi_size.
> count[0] = chunk_counter;
> dimsm[0] = chunk_counter * MPI_CHUNK_SIZE;
> dimsf[0] = dimsm[0] * mpi_size;
> status = H5Dset_extent(dset_id, dimsf);
> assert(status != HDF_FAIL);
>
> //
> // Create the memspace for the dataset and allocate data for it
> //
> memspace = H5Screate_simple(RANK, dimsm, NULL);
> perf_diffs = alloc_and_init(perf_diffs, dimsm[0]);
>
> H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, stride, count,
>                     block);
>
> Complete example code:
> https://github.com/cornell-comp-internal/CR-demos/blob/bc507264fe4040d817a2e9603dace0dc06585015/demos/pHDF5/perfectNumbers.c
>
> Best,
>
> The complete example is here:
>
> On Thu, May 28, 2015 at 3:43 PM, Elena Pourmal
> <[email protected]> wrote:
>>
>> Hi Brandon,
>>
>> The error message indicates that a hyperslab selection goes beyond the
>> dataset extent.
>>
>> Please make sure that you are using the correct values for the start,
>> stride, count and block parameters in the H5Sselect_hyperslab call (if you
>> use it!). It will help if you provide an excerpt from your code that
>> selects hyperslabs for each process.
>>
>> Elena
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Elena Pourmal  The HDF Group  http://hdfgroup.org
>> 1800 So. Oak St., Suite 203, Champaign IL 61820
>> 217.531.6112
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> On May 28, 2015, at 1:46 PM, Brandon Barker
>> <[email protected]> wrote:
>>
>> I believe I've gotten a bit closer by using chunked datasets, but I'm now
>> not sure how to get past this:
>>
>> [brandon@euca-128-84-11-180 pHDF5]$ mpirun -n 2 ./perfectNumbers
>> m, f, count: 840, 1680, 84
>> m, f, count: 840, 1680, 84
>> HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 1:
>>   #000: ../../src/H5Dio.c line 158 in H5Dread(): selection+offset not within extent
>>     major: Dataspace
>>     minor: Out of range
>> perfectNumbers: perfectNumbers.c:399: restore: Assertion `status != -1' failed.
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 1 with PID 28420 on node
>> euca-128-84-11-180 exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>> (m, f, count) represent the memory space and dataspace lengths and the
>> count of strided segments to be read in; prior to using set extent as
>> follows, I would get the error when f was not a multiple of m:
>>
>>   dimsf[0] = dimsm[0] * mpi_size;
>>   H5Dset_extent(dset_id, dimsf);
>>
>> Now that I am using these, I note that it doesn't seem to have helped the
>> issue, so there must be something else I still need to do.
>>
>> Incidentally, I was looking at this example and am not sure what the point
>> of the following code is, since rank_chunk is never used:
>>
>>   if (H5D_CHUNKED == H5Pget_layout (prop))
>>      rank_chunk = H5Pget_chunk (prop, rank, chunk_dimsr);
>>
>> I guess it is just there to demonstrate the H5Pget_chunk function call?
>>
>> On Thu, May 28, 2015 at 10:27 AM, Brandon Barker
>> <[email protected]> wrote:
>>>
>>> Hi All,
>>>
>>> I have fixed (and pushed the fix for) one bug that related to an
>>> improperly defined count in the restore function. I still have issues
>>> for m != n:
>>>
>>>   #000: ../../src/H5Dio.c line 158 in H5Dread(): selection+offset not within extent
>>>     major: Dataspace
>>>     minor: Out of range
>>>
>>> I believe this is indicative of me needing to use chunked datasets so
>>> that my dataset can grow in size dynamically.
>>>
>>> On Wed, May 27, 2015 at 5:03 PM, Brandon Barker
>>> <[email protected]> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I've been learning pHDF5 by way of developing a toy application that
>>>> checkpoints and restores its state. The restore function was the last to
>>>> be implemented, but I realized after doing so that I have an issue: since
>>>> each process is responsible for strided blocks of data, the blocks saved
>>>> during one run may not be evenly distributed among the processes of
>>>> another run, as the mpi_size of the latter run may not evenly divide the
>>>> total number of blocks.
>>>>
>>>> I was hoping that a fill value might save me here and just read in 0s if
>>>> I try reading beyond the end of the dataset, although I believe I did see
>>>> a page noting that this isn't possible for contiguous datasets.
>>>>
>>>> The good news is that since I'm working with 1-dimensional data, it is
>>>> fairly easy to refactor the relevant code.
>>>>
>>>> The error I get emits this message:
>>>>
>>>> [brandon@euca-128-84-11-180 pHDF5]$ mpirun -n 2 perfectNumbers
>>>> HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 0:
>>>>   #000: ../../src/H5Dio.c line 179 in H5Dread(): can't read data
>>>>     major: Dataset
>>>>     minor: Read failed
>>>>   #001: ../../src/H5Dio.c line 446 in H5D__read(): src and dest data
>>>>     spaces have different sizes
>>>>     major: Invalid arguments to routine
>>>>     minor: Bad value
>>>> perfectNumbers: perfectNumbers.c:382: restore: Assertion `status != -1' failed.
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that process rank 0 with PID 3717 on node
>>>> euca-128-84-11-180 exited on signal 11 (Segmentation fault).
>>>>
>>>> Here is the offending line in the restore function; you can observe the
>>>> checkpoint function to see how things are written out to disk.
>>>>
>>>> General pointers are appreciated as well. To paraphrase the problem more
>>>> simply: I have a distributed (strided) array that I write out to disk as
>>>> a dataset among n processes, and when I restart the program, I may want
>>>> to divvy up the data among m processes in similar data structures as
>>>> before, but now m != n. Actually, my problem may be different from just
>>>> this, since I seem to get the same issue even when m == n ... hmm.
>>>>
>>>> Thanks,
>>>> --
>>>> Brandon E. Barker
>>>> http://www.cac.cornell.edu/barker/
>>>
>>> --
>>> Brandon E. Barker
>>> http://www.cac.cornell.edu/barker/
>>
>> --
>> Brandon E. Barker
>> http://www.cac.cornell.edu/barker/
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>> Twitter: https://twitter.com/hdf5
>
> --
> Brandon E. Barker
> http://www.cac.cornell.edu/barker/

--
Brandon E. Barker
http://www.cac.cornell.edu/barker/
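For reference, here is a minimal sketch of how a fill value is attached to a
chunked, extendible dataset at creation time. This is not code from the posts
above: the dataset name, the fill_val constant, and the long long element type
are illustrative stand-ins for the example's big_int_h5 data, and the snippet
is only a lightly checked use of the HDF5 1.8 C API. Whether reads of
never-written elements *inside* a partially written chunk also return this
value is exactly the question raised at the top, so the sketch shows only the
setup, not the answer:

   /* Sketch: create a chunked, extendible 1-D dataset of long long with an
    * explicit fill value.  Names here are illustrative only. */
   #include "hdf5.h"

   hid_t create_filled_dataset(hid_t file_id)
   {
       const long long fill_val   = 0;               /* value for unwritten data   */
       hsize_t         dims[1]    = {0};             /* start empty ...            */
       hsize_t         maxdims[1] = {H5S_UNLIMITED}; /* ... but allow growth       */
       hsize_t         cdims[1]   = {1};             /* chunk size, as in the demo */

       hid_t space = H5Screate_simple(1, dims, maxdims);
       hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
       H5Pset_chunk(dcpl, 1, cdims);                          /* chunked layout      */
       H5Pset_fill_value(dcpl, H5T_NATIVE_LLONG, &fill_val);  /* what to fill with   */
       H5Pset_fill_time(dcpl, H5D_FILL_TIME_IFSET);           /* fill when value set */

       hid_t dset = H5Dcreate(file_id, "perf_diffs", H5T_NATIVE_LLONG, space,
                              H5P_DEFAULT, dcpl, H5P_DEFAULT);
       H5Pclose(dcpl);
       H5Sclose(space);
       return dset;
   }

With H5D_FILL_TIME_IFSET the fill value is written when chunk storage is
allocated (because a fill value was defined), and reads from chunks that were
never allocated at all return the fill value without touching the file.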
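On the restore-side errors quoted above ("selection+offset not within extent"
and "src and dest data spaces have different sizes"), here is a hedged sketch
of bookkeeping that keeps the file selection inside the (possibly grown)
extent and equal in size to the memory dataspace. The helper name, the padding
rule, and block_len (standing in for MPI_CHUNK_SIZE) are assumptions, not code
from the thread; note also that in parallel HDF5, H5Dset_extent must be called
collectively with identical dimensions on every rank:

   /* Sketch: restore-side selection for a 1-D dataset laid out as strided
    * blocks of block_len elements per rank.  All names are illustrative. */
   #include "hdf5.h"

   void select_restore_slab(hid_t dset_id, int mpi_rank, int mpi_size,
                            hsize_t block_len,
                            hid_t *filespace_out, hid_t *memspace_out)
   {
       /* 1. Ask the file how large the dataset currently is. */
       hid_t   filespace = H5Dget_space(dset_id);
       hsize_t dimsf[1];
       H5Sget_simple_extent_dims(filespace, dimsf, NULL);
       H5Sclose(filespace);

       /* 2. Round the number of blocks up so it divides evenly among ranks. */
       hsize_t nblocks  = (dimsf[0] + block_len - 1) / block_len;
       hsize_t per_rank = (nblocks + mpi_size - 1) / mpi_size;
       dimsf[0] = per_rank * mpi_size * block_len;

       /* 3. Grow the chunked dataset (collective in pHDF5), then re-fetch its
        *    dataspace: a dataspace handle obtained before H5Dset_extent still
        *    describes the old, smaller extent. */
       H5Dset_extent(dset_id, dimsf);
       filespace = H5Dget_space(dset_id);

       /* 4. Select this rank's strided blocks.  The number of selected
        *    elements must equal the size of the memory dataspace. */
       hsize_t start[1]  = {(hsize_t)mpi_rank * block_len};
       hsize_t stride[1] = {block_len * (hsize_t)mpi_size};
       hsize_t count[1]  = {per_rank};
       hsize_t block[1]  = {block_len};
       H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, stride, count, block);

       hsize_t dimsm[1] = {per_rank * block_len};
       *memspace_out  = H5Screate_simple(1, dimsm, NULL);
       *filespace_out = filespace;
   }

Blocks that fall in the newly added region live in chunks that were never
written, so a chunked dataset with a defined fill value should return that
value for them on read.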

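And on the side question about rank_chunk in the cited example: H5Pget_chunk
fills in the chunk dimensions and returns the chunk rank, so storing the
return value without using it does look like it is only demonstrating the
call. A small sketch (illustrative only, not the cited example's code) of
querying a dataset's chunk layout:

   /* Sketch: report whether a dataset is chunked and, if so, its chunk shape. */
   #include "hdf5.h"
   #include <stdio.h>

   void print_chunk_info(hid_t dset_id)
   {
       hid_t dcpl = H5Dget_create_plist(dset_id);    /* creation property list */
       if (H5D_CHUNKED == H5Pget_layout(dcpl)) {
           hsize_t chunk_dims[H5S_MAX_RANK];
           int rank_chunk = H5Pget_chunk(dcpl, H5S_MAX_RANK, chunk_dims);
           for (int i = 0; i < rank_chunk; i++)
               printf("chunk dim %d: %llu\n", i, (unsigned long long)chunk_dims[i]);
       }
       H5Pclose(dcpl);
   }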