Re: [Hdf-forum] Writing Chunked Data in Parallel with Compression

Miller, Mark C. Mon, 17 Aug 2015 09:59:25 -0700

I don't think this is yet possible with HDF5. You can only do compression in 
non-parallel settings.


I think there is some work afoot in HDF5 to start supporting certain types of 
compression (fixed rate but variable loss for example) in parallel.

The challenge is that in the presence of compression (in general), each chunk 
winds up being an unpredictable size, so predicting where chunks land in the 
file when they are contiguously packed next to each other requires additional 
communication that doesn't easily fit within the current library's design 
constraints. Each processor winds up needing to know about chunk sizes written 
by all other processors.
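
To make the dependency concrete, here is a minimal single-process sketch (not HDF5 code). It assumes zlib as a stand-in filter and simulates three "ranks" with one chunk each; the helper names compressed_sizes and packed_offsets are hypothetical. The point is only that the file offset of each packed chunk is a prefix sum over the compressed sizes of all chunks before it, which in a real MPI program would need a collective such as an exclusive scan (MPI_Exscan) before anyone can write.

```python
import random
import zlib


def compressed_sizes(chunks):
    """Compress each chunk with zlib and return the resulting byte counts."""
    return [len(zlib.compress(c)) for c in chunks]


def packed_offsets(sizes, base=0):
    """Exclusive prefix sum: where each chunk starts when packed back to back.

    The offset of chunk k depends on the sizes of chunks 0..k-1, i.e. on
    data that (in parallel) lives on other processes.
    """
    offsets = []
    pos = base
    for s in sizes:
        offsets.append(pos)
        pos += s
    return offsets


# Three simulated ranks, each owning one 4096-byte chunk (illustrative data).
rng = random.Random(0)
chunks = [
    bytes(4096),                                      # all zeros, very compressible
    bytes(range(256)) * 16,                           # repeating pattern
    bytes(rng.getrandbits(8) for _ in range(4096)),   # noise, barely compressible
]

sizes = compressed_sizes(chunks)
offsets = packed_offsets(sizes)
for rank, (size, off) in enumerate(zip(sizes, offsets)):
    print(f"rank {rank}: compressed size {size:5d} bytes, file offset {off}")
```

Without compression every chunk would be 4096 bytes and rank k could compute its offset as k * 4096 on its own.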

I've long argued for support for a 'target rate' though, where the library is 
told a compression filter must hit a target compression rate, say 2:1. It then 
does all its work assuming each chunk is 1/2 its original (uncompressed) size. If 
some chunks compress more than 2:1, that's ok. They get pad bytes added so they 
are 2:1 (so you don't get any *more* advantage for these lucky chunks). If any 
chunk fails to compress 2:1, the whole write operation fails. But even that 
latter bit of logic is a little hard to handle in the current parallel library.
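
The 'target rate' idea can be sketched in a few lines. This is an illustration only, not an existing HDF5 feature or API: zlib stands in for the filter, zero bytes are used as padding, and the names pack_to_target and slot_offset are hypothetical. Every chunk occupies a fixed slot of chunk_size / ratio bytes, so any process can compute any chunk's offset with no communication, and a chunk that misses the target raises an error.

```python
import random
import zlib


def pack_to_target(chunk, target_ratio=2):
    """Compress a chunk and pad it to exactly len(chunk) // target_ratio bytes.

    Raises ValueError if the chunk does not compress well enough, which
    models "the whole write operation fails".
    """
    slot = len(chunk) // target_ratio
    comp = zlib.compress(chunk)
    if len(comp) > slot:
        raise ValueError(
            f"chunk compressed to {len(comp)} bytes, target slot is {slot}"
        )
    # Lucky chunks gain nothing extra: pad up to the fixed slot size.
    return comp + b"\x00" * (slot - len(comp))


def slot_offset(chunk_index, chunk_nbytes, target_ratio=2, base=0):
    """File offset of a chunk's slot; needs no knowledge of other chunks."""
    return base + chunk_index * (chunk_nbytes // target_ratio)


def unpack(slot_bytes):
    """Recover the chunk; the padding ends up in unused_data and is ignored."""
    return zlib.decompressobj().decompress(slot_bytes)


good = bytes(range(256)) * 16          # 4096 bytes, compresses far below 2:1
packed = pack_to_target(good)
print(len(packed), slot_offset(3, len(good)))
assert unpack(packed) == good

rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(4096))
try:
    pack_to_target(noisy)
except ValueError as err:
    print("write fails:", err)
```

In parallel, the hard part noted above remains: if one process hits the failure case, all the others have to learn about it and abandon the write too.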

Mark


From: Hdf-forum <[email protected]> on behalf of Matthew Clay <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Saturday, August 15, 2015 3:50 PM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] Writing Chunked Data in Parallel with Compression

Hi,

I apologize if this is a repeat question. I saw on the HDF5 website that
writing chunked data in parallel with compression is not supported in
version 1.6.3 (https://www.hdfgroup.org/hdf5-quest.html#p5comp). Has
support been added since then?

To give some background, I'll briefly describe our data layout and
needs. We have a 3D cartesian domain decomposed by a 2D MPI process
layout. Each process owns an independent hyperslab of the 3D dataset,
and all hyperslabs have the same dimensions. We would like to write the
data collectively using a chunked layout to a single HDF5 file, with
compression being applied to each chunk. The website mentions that
compression is difficult to do in the case of independent IO. Could it
be possible in this case, when IO is collective, and all hyperslabs are
of equal dimension?
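
For illustration, the layout described above can be sketched as a small helper that maps an MPI rank to its hyperslab. The details are assumptions, not taken from the message: the first two axes are the decomposed ones, ranks are ordered row-major over the process grid, and the global dimensions divide evenly. The function name hyperslab is hypothetical. The returned (start, count) tuples are the kind of values a hyperslab selection (for example H5Sselect_hyperslab in the C API) takes.

```python
def hyperslab(rank, global_dims, proc_grid):
    """Return (start, count) of the block owned by `rank`.

    global_dims: (nx, ny, nz) of the full 3D dataset.
    proc_grid:   (px, py) process layout over the first two axes (assumed).
    """
    nx, ny, nz = global_dims
    px, py = proc_grid
    if nx % px or ny % py:
        raise ValueError("global dims must divide evenly over the process grid")
    if not 0 <= rank < px * py:
        raise ValueError("rank outside the process grid")
    i, j = divmod(rank, py)              # row-major rank ordering (assumed)
    count = (nx // px, ny // py, nz)     # identical on every rank
    start = (i * count[0], j * count[1], 0)
    return start, count


# Example: an 8 x 8 x 4 dataset on a 2 x 2 process grid.
for r in range(4):
    print(r, hyperslab(r, (8, 8, 4), (2, 2)))
```

If the chunk dimensions are set equal to count, each process owns exactly one chunk, which is the simplest case for the question asked here.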

Thanks,
Matthew

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
