Mark,
Thank you very much for your response and explanation. We will continue
to use parallel IO when we need it, and perhaps use process-based
(serial) writing with compression enabled when we are worried about the
overall storage size.
Thanks,
Matthew
On 08/17/2015 12:57 PM, Miller, Mark C. wrote:
I don't think this is yet possible with HDF5. You can only do
compression in non-parallel settings.
I think there is some work afoot in HDF5 to start supporting certain
types of compression (fixed rate but variable loss for example) in parallel.
The challenge is that in the presence of compression (in general), each
chunk winds up being an unpredictable size and so predicting where
chunks land in the file when they are contiguously packed next to each
other requires additional communication that doesn't easily fit within
the current library's design constraints. Each processor winds up
needing to know about chunk sizes written by all other processors.
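
To make the offset problem concrete, here is a minimal sketch in plain
Python (not HDF5 code). It assumes zlib from the standard library as a
stand-in for a compression filter; the helper name chunk_offsets and the
sample chunks are made up for illustration. The point is only that the
offset of each contiguously packed chunk is a running sum of the
compressed sizes of every chunk before it, which is the information each
process would need from the others.

```python
import zlib
from itertools import accumulate


def chunk_offsets(sizes, base=0):
    """Exclusive prefix sum: byte offset of each chunk when packed back to back."""
    return list(accumulate(sizes, initial=base))[:-1]


# Hypothetical chunks, one per process: same raw size, different compressibility.
raw_chunks = [
    bytes(4096),                                   # all zeros
    bytes(range(256)) * 16,                        # repeating pattern
    bytes((i * i + i // 3) % 256 for i in range(4096)),  # less regular
]

# Uncompressed, every offset is known up front: rank * raw chunk size.
raw_offsets = chunk_offsets([len(c) for c in raw_chunks])

# Compressed, the sizes are only known after each chunk has been compressed,
# so the offset of chunk i depends on the results for chunks 0..i-1.
sizes = [len(zlib.compress(c)) for c in raw_chunks]
offsets = chunk_offsets(sizes)

for rank, (size, off) in enumerate(zip(sizes, offsets)):
    print(f"chunk {rank}: compressed size {size:5d}, offset {off:5d}")
```

In an MPI program the running sum would typically come from an exclusive
scan across ranks, which is the extra communication step described above.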
I've long argued for support for a 'target rate', though, where the
library is told a compression filter must hit a target compression rate,
say 2:1. It then does all its work assuming each chunk is 1/2 its
original size. If some chunks compress more than 2:1, that's OK.
They get pad bytes added so they are 2:1 (so you don't get any *more*
advantage for these lucky chunks). If any chunk fails to compress 2:1,
the whole write operation fails. But, even that latter bit of logic is a
little hard to handle in the current parallel library.
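
The 'target rate' idea can be sketched in a few lines of plain Python.
This is only an illustration under stated assumptions, not anything HDF5
provides: zlib stands in for the filter, and pack_at_target_rate is a
hypothetical helper. Each chunk gets a fixed slot of len(chunk) / ratio
bytes, chunks that compress better are padded up to the slot, and a
chunk that misses the target makes the whole operation fail.

```python
import os
import zlib


def pack_at_target_rate(raw_chunks, ratio=2):
    """Place each compressed chunk in a fixed slot of len(chunk) // ratio bytes.

    Returns a list of (offset, padded_bytes). Raises ValueError if any chunk
    does not fit its slot, mirroring the "whole write fails" rule.
    """
    placed = []
    offset = 0
    for i, chunk in enumerate(raw_chunks):
        slot = len(chunk) // ratio
        data = zlib.compress(chunk)
        if len(data) > slot:
            raise ValueError(
                f"chunk {i} compressed to {len(data)} bytes, slot is {slot}"
            )
        # Pad the lucky chunks so every slot has the predicted size.
        placed.append((offset, data + bytes(slot - len(data))))
        offset += slot
    return placed


chunks = [bytes(4096), bytes(range(256)) * 16]
placed = pack_at_target_rate(chunks, ratio=2)
# Offsets are known before compressing anything: rank * 2048.
slot_offsets = [off for off, _ in placed]

# Random bytes do not compress, so this chunk misses the 2:1 target.
try:
    pack_at_target_rate([os.urandom(4096)], ratio=2)
    failed = False
except ValueError:
    failed = True
```

Because every slot size is fixed in advance, no process needs to learn
another's compressed size; the cost is the padding and the possibility
of a failed write.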
Mark
From: Hdf-forum <[email protected]> on behalf of Matthew Clay <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Saturday, August 15, 2015 3:50 PM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] Writing Chunked Data in Parallel with Compression
Hi,
I apologize if this is a repeat question. I saw on the HDF5 website that
writing chunked data in parallel with compression is not supported in
version 1.6.3 (https://www.hdfgroup.org/hdf5-quest.html#p5comp). Has
support been added since then?
To give some background, I'll briefly describe our data layout and
needs. We have a 3D Cartesian domain decomposed by a 2D MPI process
layout. Each process owns an independent hyperslab of the 3D dataset,
and all hyperslabs have the same dimensions. We would like to write the
data collectively using a chunked layout to a single HDF5 file, with
compression being applied to each chunk. The website mentions that
compression is difficult to do in the case of independent IO. Could it
be possible in this case, when IO is collective, and all hyperslabs are
of equal dimension?
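
For reference, the layout described above can be sketched in plain
Python. The details are assumptions made for illustration only: the
first dimension is left undivided, ranks are ordered row-major over the
2D process grid, and hyperslab is a hypothetical helper name. The
(start, count) pair it returns is the kind of selection one would pass
to H5Sselect_hyperslab, and count is also the natural chunk shape when
every process owns exactly one chunk.

```python
def hyperslab(rank, dims, pgrid):
    """Return (start, count) of the block owned by `rank`.

    dims  -- global 3D dataset shape (nx, ny, nz)
    pgrid -- 2D process grid (py, pz) splitting the last two dimensions
    """
    nx, ny, nz = dims
    py, pz = pgrid
    if not 0 <= rank < py * pz:
        raise ValueError("rank is outside the process grid")
    if ny % py or nz % pz:
        raise ValueError("domain must divide evenly across the process grid")
    iy, iz = divmod(rank, pz)          # row-major position in the grid
    count = (nx, ny // py, nz // pz)   # identical for every rank
    start = (0, iy * count[1], iz * count[2])
    return start, count


dims, pgrid = (64, 64, 64), (2, 4)
for rank in range(pgrid[0] * pgrid[1]):
    start, count = hyperslab(rank, dims, pgrid)
    print(f"rank {rank}: start={start} count={count}")
```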
Thanks,
Matthew
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected] <mailto:[email protected]>
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5