I don't think this is yet possible with HDF5. You can only do compression in non-parallel settings.
I think there is some work afoot in HDF5 to start supporting certain types of compression (fixed rate but variable loss, for example) in parallel. The challenge is that, in the presence of compression (in general), each chunk winds up being an unpredictable size. Predicting where chunks land in the file when they are packed contiguously next to each other therefore requires additional communication that doesn't easily fit within the current library's design constraints. Each processor winds up needing to know the sizes of the chunks written by all other processors.

I've long argued, though, for support for a "target rate", where the library is told that a compression filter must hit a target compression ratio of, say, 2:1. It then does all its work assuming each compressed chunk is 1/2 the size of its original, uncompressed data. If some chunks compress better than 2:1, that's OK. They get pad bytes added so they are exactly 2:1 (so you don't get any *more* advantage for these lucky chunks). If any chunk fails to compress 2:1, the whole write operation fails. But even that latter bit of logic is a little hard to handle in the current parallel library.

Mark

From: Hdf-forum <[email protected]> on behalf of Matthew Clay <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Saturday, August 15, 2015 3:50 PM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] Writing Chunked Data in Parallel with Compression

Hi,

I apologize if this is a repeat question. I saw on the HDF5 website that writing chunked data in parallel with compression is not supported in version 1.6.3 (https://www.hdfgroup.org/hdf5-quest.html#p5comp). Has support been added since then?

To give some background, I'll briefly describe our data layout and needs. We have a 3D Cartesian domain decomposed by a 2D MPI process layout.
Each process owns an independent hyperslab of the 3D dataset, and all hyperslabs have the same dimensions. We would like to write the data collectively using a chunked layout to a single HDF5 file, with compression being applied to each chunk. The website mentions that compression is difficult to do in the case of independent IO. Could it be possible in this case, when IO is collective and all hyperslabs are of equal dimension?

Thanks,
Matthew

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
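The layout described in the question can be sketched as follows. This is a minimal Python illustration under assumptions not stated in the thread: the 3D domain is split along its first two axes only (the third axis stays whole on every rank), ranks are numbered row-major over the 2D process grid, and the grid divides the domain evenly so every hyperslab has the same shape. `rank_to_coords` and `hyperslab` are hypothetical helpers, not part of HDF5.

```python
def rank_to_coords(rank, grid):
    """Map a linear MPI rank to (i, j) on a px-by-py grid (row-major, assumed)."""
    px, py = grid
    if not 0 <= rank < px * py:
        raise ValueError("rank outside the process grid")
    return rank // py, rank % py


def hyperslab(rank, global_shape, grid):
    """Return (start, count) of the hyperslab owned by `rank`.

    The 3D domain (nx, ny, nz) is split along its first two axes over a
    (px, py) process grid; the third axis is kept whole on every rank.
    """
    nx, ny, nz = global_shape
    px, py = grid
    if nx % px or ny % py:
        raise ValueError("equal-sized hyperslabs need the grid to divide the domain")
    i, j = rank_to_coords(rank, grid)
    count = (nx // px, ny // py, nz)           # identical on every rank
    start = (i * count[0], j * count[1], 0)    # this rank's corner in the dataset
    return start, count


if __name__ == "__main__":
    # Example: an 8 x 6 x 4 domain over a 2 x 3 grid of ranks.
    for r in range(6):
        print(r, hyperslab(r, (8, 6, 4), (2, 3)))
```

The returned `start` and `count` tuples correspond to the `start` and `count` arguments of `H5Sselect_hyperslab` in the HDF5 C API. Using `count` as the chunk shape as well would give each rank exactly one chunk; that is a design choice assumed here for illustration, not something stated in the thread.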
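The chunk-placement problem and the "target rate" scheme from the reply above can be contrasted in a toy model. This Python sketch ignores HDF5's chunk index and other file metadata and simply treats the file as chunks laid end to end. With packed variable-size chunks, each offset is a prefix sum of all earlier compressed sizes (in MPI terms, an exclusive scan such as `MPI_Exscan`), so every rank depends on sizes computed elsewhere. With a fixed target ratio, every chunk gets a slot of the same size, so a chunk's offset follows from its index alone; lucky chunks are padded up to the slot, and a chunk that misses the target raises an error. All function names are hypothetical, and `zlib` merely stands in for whatever compression filter would actually be used.

```python
import math
import zlib


def packed_offsets(compressed_sizes, base=0):
    """File offsets when variable-size compressed chunks are packed end to end.

    The offset of chunk i is base plus the sizes of chunks 0..i-1, i.e. an
    exclusive prefix sum over sizes that, in a parallel write, are known only
    to the ranks that compressed those chunks.
    """
    offsets, pos = [], base
    for size in compressed_sizes:
        offsets.append(pos)
        pos += size
    return offsets


def slot_size(raw_size, ratio):
    """Bytes reserved per chunk under a fixed target ratio, e.g. 2 for 2:1."""
    return math.ceil(raw_size / ratio)


def target_rate_offset(chunk_index, raw_size, ratio, base=0):
    """Offset of a chunk when every chunk occupies one fixed-size slot.

    Depends only on the chunk index, so no communication is needed.
    """
    return base + chunk_index * slot_size(raw_size, ratio)


def pad_to_slot(compressed, raw_size, ratio):
    """Pad a compressed chunk to its slot, or fail if it missed the target."""
    slot = slot_size(raw_size, ratio)
    if len(compressed) > slot:
        # Under the proposal, one chunk missing the target fails the whole write.
        raise ValueError(
            f"chunk compressed to {len(compressed)} bytes; target slot is {slot}"
        )
    # Chunks that beat the target get no extra benefit: they are padded
    # with zero bytes up to the full slot size.
    return compressed + b"\0" * (slot - len(compressed))


if __name__ == "__main__":
    raw = bytes(1000)  # highly compressible example data
    chunk = pad_to_slot(zlib.compress(raw), len(raw), 2)
    print(len(chunk), target_rate_offset(3, len(raw), 2))
    print(packed_offsets([300, 120, 500]))
```

Raising an exception here only models the local check. In a parallel write, all ranks would also have to learn that some chunk missed the target, which is the part Mark describes as hard to fit into the current parallel library.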
