Peter,

1) It is very important to me to create a "standard" HDF5 file that can be read 
(!) by the standard HDF5 library without any add-ons for decompression. I need 
this because there are many old versions of our product in the market that rely 
on the standard features of HDF5 for opening files, i.e. they can open 
gzip-compressed chunks because that is part of HDF5's functionality. They would 
not be able to open chunks that I have compressed with my own compression 
algorithm. (FYI: I cannot patch these old versions with a new decompression 
filter.)

Correct. If you use a compression algorithm that is not available to your 
clients, they will be in trouble when attempting to read those datasets. 
Newer versions of the library (1.8.11+) support the dynamic loading of filters:

http://www.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf

However, this would require a minimum library version on the client side.
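
In case it is useful, here is a minimal sketch of the client-side check, with an 
entirely hypothetical filter ID (a real filter would use the ID registered for 
it with The HDF Group):

    #include <cstdio>
    #include "hdf5.h"

    const H5Z_filter_t MY_FILTER_ID = 32768;   // hypothetical, for illustration only

    int main()
    {
        // A 1.8.11+ client looks for filter plugins in the directories listed
        // in HDF5_PLUGIN_PATH, e.g.
        //   export HDF5_PLUGIN_PATH=/opt/myapp/hdf5/plugins
        // (depending on the version, the plugin may only be loaded lazily on
        // the first read, so a negative answer here is not necessarily fatal).
        htri_t avail = H5Zfilter_avail(MY_FILTER_ID);
        std::printf("custom filter %s\n", avail > 0 ? "is available" : "was not found");
        return 0;
    }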

2) My guess would be that I could use gzip for compression (which I will run 
outside of the library in order to run it in parallel, and then write the chunks 
into the file using H5DOwrite_chunk), and in the HDF5 file I set the filter mask 
to the one for gzip. Then I should be able to read the file with a standard HDF5 
library, and it will do the decompression by itself?

Correct.
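
To make that concrete, here is a minimal sketch of the write side, assuming a 
2-D float dataset (shapes, names, and compression level are placeholders). One 
subtlety: HDF5's built-in deflate filter stores a raw zlib stream, i.e. what 
zlib's compress2() produces, not a gzip-framed stream.

    #include <vector>
    #include <zlib.h>
    #include "hdf5.h"
    #include "hdf5_hl.h"   // H5DOwrite_chunk lives in the high-level library

    int main()
    {
        const hsize_t dims[2]  = {1024, 1024};   // placeholder dataset shape
        const hsize_t chunk[2] = {256, 256};     // placeholder chunk shape

        hid_t file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(2, dims, NULL);

        // Record the deflate filter in the dataset creation property list so
        // that a stock HDF5 library will run it when reading the chunks back.
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);
        H5Pset_deflate(dcpl, 6);

        hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_FLOAT, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        // Compress one chunk's worth of data outside the library -- this is
        // the part that can run on multiple threads in parallel.
        std::vector<float> raw(chunk[0] * chunk[1], 1.0f);
        uLongf comp_size = compressBound(raw.size() * sizeof(float));
        std::vector<Bytef> comp(comp_size);
        compress2(comp.data(), &comp_size,
                  reinterpret_cast<const Bytef*>(raw.data()),
                  raw.size() * sizeof(float), 6);

        // Filter mask 0 means "all filters in the pipeline were applied",
        // so readers will deflate-decompress this chunk as usual.
        const hsize_t offset[2] = {0, 0};
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5DOwrite_chunk(dset, dxpl, 0, offset, comp_size, comp.data());

        H5Pclose(dxpl); H5Dclose(dset); H5Pclose(dcpl);
        H5Sclose(space); H5Fclose(file);
        return 0;
    }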

3) I have come to love HDF5 for its extremely forgiving implementation. Over 
the years we have fiddled with chunk sizes. We never had to communicate a file 
format change to our customers because the library covered our back. That was 
really nice. What will happen if I write my own compressed chunks? Will I need 
to deliver a decompressor? Will I be able to change chunk sizes without 
breaking backward compatibility?

Yes, you will need to provide a decompressor, either by compiling it into the 
version of the HDF5 library you distribute with your application, or as a 
plugin (shared library) to be loaded at runtime.
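
For the plugin route, the skeleton below shows the pieces the dynamic loader 
expects; the filter ID and the my_compress/my_decompress calls are hypothetical 
placeholders for your own codec.

    #include "hdf5.h"
    #include "H5PLextern.h"

    const H5Z_filter_t MY_FILTER_ID = 32768;   // hypothetical, for illustration only

    static size_t my_filter(unsigned flags, size_t cd_nelmts,
                            const unsigned cd_values[], size_t nbytes,
                            size_t *buf_size, void **buf)
    {
        if (flags & H5Z_FLAG_REVERSE) {
            // Read path: decompress *buf (nbytes valid bytes), replace the
            // buffer, update *buf_size, return the number of valid bytes.
            // return my_decompress(buf, buf_size, nbytes);   // placeholder
        } else {
            // Write path: compress *buf analogously.
            // return my_compress(buf, buf_size, nbytes);     // placeholder
        }
        return 0;   // 0 tells the pipeline the filter failed (placeholder only)
    }

    static const H5Z_class2_t MY_FILTER_CLASS = {
        H5Z_CLASS_T_VERS, MY_FILTER_ID,
        1, 1,                       // encoder and decoder present
        "my-custom-filter",         // name, shown in error messages
        nullptr, nullptr,           // can_apply / set_local not needed here
        my_filter
    };

    // The two symbols the dynamic loader looks for in the shared library:
    extern "C" H5PL_type_t H5PLget_plugin_type(void) { return H5PL_TYPE_FILTER; }
    extern "C" const void *H5PLget_plugin_info(void) { return &MY_FILTER_CLASS; }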

I'm not sure I understand what you mean by "breaking backward compatibility."
At the API level, H5Dread/write won't see a difference.
A change in chunk size might have an adverse effect on performance, for example,
if you've hard-tuned your application's dataset chunk cache sizes.
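
If it helps, re-tuning the cache is a one-liner on the dataset access property 
list; the numbers below are placeholders, not recommendations, and "file" and 
"data" are hypothetical handles/names.

    // rdcc_nslots should be a prime, roughly 10-100x the number of chunks
    // that fit in the cache; rdcc_nbytes should hold at least a few whole chunks.
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, 12421,                  // hash table slots
                       64 * 1024 * 1024,             // 64 MiB cache
                       H5D_CHUNK_CACHE_W0_DEFAULT);  // keep the default eviction policy
    hid_t dset = H5Dopen2(file, "data", dapl);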

Best, G.

________________________________
From: Hdf-forum [[email protected]] on behalf of Gerd Heber 
[[email protected]]
Sent: Monday, March 16, 2015 6:20 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Queuing chunks for compression and writing

Peter, there's an API call that lets you write chunks directly
into the file including chunks which you have compressed outside
the HDF5 filter pipeline. Have a look at:

http://www.hdfgroup.org/HDF5/doc/HL/RM_HDF5Optimized.html#H5DOwrite_chunk

See how fast you can write with H5DOwrite_chunk and then do
a back-of-the-envelope calculation to see how elaborate
a queueing mechanism you want.
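
With purely illustrative numbers: if H5DOwrite_chunk sustains ~500 MB/s of 
(compressed) bytes to disk, and one gzip thread consumes ~100 MB/s of raw data 
at a 4:1 ratio (so it emits ~25 MB/s), then about 500/25 = 20 compression 
threads would saturate the writer. With, say, 8 threads you ingest ~800 MB/s of 
raw data but produce only ~200 MB/s of compressed output, so the compressors, 
not the writer or the queue, are the bottleneck, and an elaborate queueing 
scheme buys you little beyond that.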

G.

From: Hdf-forum [mailto:[email protected]] On Behalf Of 
Peter Majer
Sent: Monday, March 16, 2015 11:53 AM
To: [email protected]
Subject: [Hdf-forum] Queuing chunks for compression and writing

Dear All,

We have been experiencing, and suffering from, the fact that writing compressed 
files with HDF5 is significantly slower than writing uncompressed ones. I have 
been asking myself for a while whether there is a simple remedy. Would it be 
possible to have two queues of chunks when writing a file, one for compression 
and one for the actual writing, to achieve the following:

1) I enqueue N chunks for CompressionAndWriting. They initially enter 
CompressQueue.

2) The chunks from CompressQueue are concurrently compressed by multiple 
compression threads and subsequently enqueued in a WriteQueue.

3) A WriteThread sequentially writes all compressed chunks from WriteQueue 
to the file system.

This should keep the WriteThread constantly busy, and it should allow compressed 
writing to be faster than uncompressed writing by a factor more or less equal to 
the compression ratio.

Interface-wise, it would be nice to have "StartWrite" and "FinishWrite" methods, 
where "StartWrite" simply copies the data into the CompressQueue and returns 
immediately, while "FinishWrite" blocks until the write operation for the 
corresponding chunk has actually completed.
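
In outline (and only as a rough sketch: the zlib and H5DOwrite_chunk calls are 
placeholders, and queue bounds, error handling, and orderly thread shutdown are 
omitted), such a pipeline could look like this:

    #include <condition_variable>
    #include <cstddef>
    #include <future>
    #include <memory>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // One chunk travelling through the pipeline.
    struct Chunk {
        std::vector<unsigned char> data;   // raw bytes, replaced by compressed bytes
        std::promise<void> done;           // fulfilled once the chunk is on disk
    };

    // Minimal thread-safe FIFO.
    template <typename T>
    class Queue {
        std::queue<T> q_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void push(T item) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(item)); }
            cv_.notify_one();
        }
        T pop() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !q_.empty(); });
            T item = std::move(q_.front());
            q_.pop();
            return item;
        }
    };

    Queue<std::shared_ptr<Chunk>> compress_queue;   // "CompressQueue"
    Queue<std::shared_ptr<Chunk>> write_queue;      // "WriteQueue"

    // "StartWrite": copy the data into the CompressQueue and return immediately.
    // The returned future is the handle that "FinishWrite" would wait on.
    std::future<void> StartWrite(const unsigned char* p, std::size_t n) {
        auto c = std::make_shared<Chunk>();
        c->data.assign(p, p + n);
        std::future<void> f = c->done.get_future();
        compress_queue.push(c);
        return f;
    }

    // Several of these run in parallel.
    void compressor_loop() {
        for (;;) {
            auto c = compress_queue.pop();
            // c->data = zlib_compress(c->data);   // placeholder for real compression
            write_queue.push(c);
        }
    }

    // A single writer keeps the file/disk busy.
    void writer_loop() {
        for (;;) {
            auto c = write_queue.pop();
            // H5DOwrite_chunk(dset, dxpl, 0, offset, c->data.size(), c->data.data());
            c->done.set_value();               // unblocks the matching "FinishWrite"
        }
    }

    int main() {
        std::thread(writer_loop).detach();
        for (int i = 0; i < 4; ++i)            // 4 compression threads, arbitrary
            std::thread(compressor_loop).detach();

        std::vector<unsigned char> chunk(1 << 20, 0);   // 1 MiB dummy chunk
        std::future<void> pending = StartWrite(chunk.data(), chunk.size());
        pending.wait();                        // "FinishWrite"
        return 0;   // detached threads die with the process; a real version needs shutdown
    }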

Would this be possible?
Would it be feasible?
Would it be easy?

Thanks, Peter


Dr. Peter Majer
Image Analysis Scientist and Software Architect
Bitplane AG
www.bitplane.com
