Re: [Hdf-forum] Writing Chunked Data in Parallel with Compression

Matthew Clay Mon, 17 Aug 2015 10:36:03 -0700

Mark,

Thank you very much for your response and explanation. We will continue to use parallel IO when we need it, and perhaps use process-based (serial) writing with compression enabled when we are worried about the overall storage size.


Thanks,
Matthew

On 08/17/2015 12:57 PM, Miller, Mark C. wrote:

I don't think this is yet possible with HDF5. You can only do
compression in non-parallel settings.

I think there is some work afoot in HDF5 to start supporting certain
types of compression (fixed rate but variable loss for example) in parallel.

The challenge is that in the presence of compression (in general), each
chunk winds up being an unpredictable size, and so predicting where
chunks land in the file when they are contiguously packed next to each
other requires additional communication that doesn't easily fit within
the current library's design constraints. Each processor winds up
needing to know about chunk sizes written by all other processors.
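
To illustrate the offset problem described above, here is a minimal
sketch in Python. It is not HDF5 code and does not use MPI: the four
"ranks" are simulated in one process, zlib stands in for the compression
filter, and the helper names (compressed_sizes, packed_offsets) and the
chunk contents are made up for the example. The point is only that the
file offset of a packed chunk is a prefix sum over the compressed sizes
of every chunk before it.

```python
import random
import zlib
from itertools import accumulate


def compressed_sizes(chunks):
    """Compressed byte count of each chunk (zlib stands in for the filter)."""
    return [len(zlib.compress(c)) for c in chunks]


def packed_offsets(sizes, base=0):
    """Exclusive prefix sum: chunk i starts where chunks 0..i-1 end."""
    return [base + o for o in list(accumulate(sizes, initial=0))[:-1]]


# Four simulated "ranks", each owning one 4096-byte chunk (made-up data).
rng = random.Random(0)
chunks = [
    bytes(4096),                                     # all zeros
    b"ab" * 2048,                                    # short repeating pattern
    bytes(range(256)) * 16,                          # longer repeating pattern
    bytes(rng.getrandbits(8) for _ in range(4096)),  # noise, barely compressible
]

sizes = compressed_sizes(chunks)
offsets = packed_offsets(sizes)

for rank, (size, offset) in enumerate(zip(sizes, offsets)):
    print(f"rank {rank}: compressed size {size:5d}, file offset {offset:5d}")
```

Every chunk has the same uncompressed size, yet the compressed sizes
differ, so rank 3 cannot compute its offset until it has learned the
sizes produced by ranks 0 through 2. In a real parallel write that
exchange would be an extra communication step.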

I've long argued for support for a 'target rate', though, where the
library is told a compression filter must hit a target compression rate,
say 2:1. It then does all its work assuming each chunk is 1/2 the size
of the original data. If some chunks compress more than 2:1, that's ok.
They get pad bytes added so they are 2:1 (so you don't get any *more*
advantage for these lucky chunks). If any chunk fails to compress 2:1,
the whole write operation fails. But even that latter bit of logic is a
little hard to handle in the current parallel library.
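
The 'target rate' idea above could be sketched roughly as follows. This
is a hypothetical illustration in Python, not an HDF5 feature or API:
zlib stands in for the filter, and the names TargetRateError,
compress_to_target and slot_offset are invented for the example. Each
chunk is compressed, padded up to a fixed slot of 1/ratio of its raw
size, and rejected if it does not fit.

```python
import random
import zlib


class TargetRateError(Exception):
    """Raised when a chunk does not compress to the target rate."""


def compress_to_target(chunk, ratio=2.0):
    """Compress chunk and pad it to exactly len(chunk) / ratio bytes."""
    slot = int(len(chunk) / ratio)
    payload = zlib.compress(chunk)
    if len(payload) > slot:
        raise TargetRateError(
            f"compressed to {len(payload)} bytes, slot is {slot} bytes"
        )
    # Lucky chunks gain nothing extra: pad with zero bytes up to the slot.
    return payload + bytes(slot - len(payload))


def slot_offset(chunk_index, chunk_nbytes, ratio=2.0):
    """With fixed-size slots, an offset needs no knowledge of other chunks."""
    return chunk_index * int(chunk_nbytes / ratio)


def decompress_slot(slot_bytes):
    """Recover the chunk; the zlib stream ends before the pad bytes."""
    d = zlib.decompressobj()
    return d.decompress(slot_bytes)


smooth = bytes(4096)                                      # compresses well
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(4096))    # does not

padded = compress_to_target(smooth)          # fits, padded to 2048 bytes
try:
    compress_to_target(noisy)                # cannot reach 2:1
    noisy_failed = False
except TargetRateError:
    noisy_failed = True
```

Because every slot has the same size, slot_offset depends only on the
chunk index, which is what removes the need to exchange compressed
sizes. The sketch fails only the one chunk; making the whole collective
write fail when any single process hits that error is the part that
would still need coordination.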

Mark


From: Hdf-forum <[email protected]> on behalf of Matthew Clay
<[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Saturday, August 15, 2015 3:50 PM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] Writing Chunked Data in Parallel with Compression

    Hi,

    I apologize if this is a repeat question. I saw on the HDF5 website
    that writing chunked data in parallel with compression is not
    supported in version 1.6.3
    (https://www.hdfgroup.org/hdf5-quest.html#p5comp). Has support been
    added since then?

    To give some background, I'll briefly describe our data layout and
    needs. We have a 3D cartesian domain decomposed by a 2D MPI process
    layout. Each process owns an independent hyperslab of the 3D dataset,
    and all hyperslabs have the same dimensions. We would like to write the
    data collectively using a chunked layout to a single HDF5 file, with
    compression being applied to each chunk. The website mentions that
    compression is difficult to do in the case of independent IO. Could it
    be possible in this case, when IO is collective, and all hyperslabs are
    of equal dimension?

    Thanks,
    Matthew

    _______________________________________________
    Hdf-forum is for HDF software users discussion.
    [email protected] <mailto:[email protected]>
    http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
    Twitter: https://twitter.com/hdf5


