Hi,

I've had a similar experience writing streams of 2D data, and I've also 
noticed that performance is much slower if I don't write whole chunks at a 
time. I would have thought (assuming you've sized the chunk cache suitably) 
that each 1000x200x1 write would gradually fill up a 1000x200x50 chunk in the 
cache, and that the whole chunk would then be deflated and written to disk 
just once, when it's evicted from the cache. But based on the performance I 
see, I can only guess it isn't working like that, so I also just buffer whole 
chunks myself.
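
For reference, this is roughly what I mean by sizing the chunk cache 
(Fortran API, since that's what you're using; the slot and byte counts here 
are only illustrative guesses for a ~40 MB chunk of reals):

    use hdf5
    implicit none
    integer(hid_t)  :: dapl_id
    integer(size_t) :: nslots, nbytes
    integer         :: hdferr

    ! Dataset access property list carrying the chunk cache settings
    call h5pcreate_f(H5P_DATASET_ACCESS_F, dapl_id, hdferr)

    ! nslots: hash table slots, ideally a prime well above the number of
    ! chunks that fit in the cache; nbytes: total cache size, which must
    ! hold at least one whole chunk (1000 x 200 x 50 reals is ~40 MB)
    nslots = 1511
    nbytes = 64 * 1024 * 1024
    call h5pset_chunk_cache_f(dapl_id, nslots, nbytes, 0.75, hdferr)

    ! ... then pass dapl_id to h5dcreate_f/h5dopen_f as the access list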

Dan


-----Original Message-----
From: Hdf-forum [mailto:[email protected]] On Behalf Of 
Patrick Vacek
Sent: 22 March 2016 20:55
To: [email protected]
Subject: [Hdf-forum] Deflate and partial chunk writes

Hello!

I've found an interesting situation that seems like something of a bug to me. 
I've figured out how to work around it, but I wanted to bring it up in case it 
comes up for anyone else.

I use the Fortran API, and I typically create HDF5 datasets with large, 
multidimensional chunks but only write part of a chunk at any given time. For 
example, I'll use a chunk size of 1000 x 200 x 50 but write only 1000 x 200 x 
1 elements at a time. This seems to work fine, although on networked 
filesystems I sometimes notice that my application is I/O-limited. The fix in 
that case is to buffer my HDF5 writes locally and then write a full chunk at 
a time.
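
For concreteness, the pattern looks roughly like this (simplified from my 
actual code; the file name, dataset name, and values are made up):

    use hdf5
    implicit none
    integer(hid_t)   :: file_id, dcpl_id, fspace_id, mspace_id, dset_id
    integer(hsize_t) :: dims(3), chunk(3), count(3), offset(3)
    real             :: slab(1000, 200, 1)
    integer          :: hdferr, k

    dims  = (/ 1000, 200, 50 /)
    chunk = (/ 1000, 200, 50 /)   ! one chunk spans the whole dataset here
    count = (/ 1000, 200, 1 /)    ! but each write covers only one slice

    call h5open_f(hdferr)
    call h5fcreate_f("example.h5", H5F_ACC_TRUNC_F, file_id, hdferr)

    call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl_id, hdferr)
    call h5pset_chunk_f(dcpl_id, 3, chunk, hdferr)

    call h5screate_simple_f(3, dims, fspace_id, hdferr)
    call h5dcreate_f(file_id, "data", H5T_NATIVE_REAL, fspace_id, &
                     dset_id, hdferr, dcpl_id)

    call h5screate_simple_f(3, count, mspace_id, hdferr)
    do k = 1, 50
       slab = real(k)                ! stand-in for one incoming slice
       offset = (/ 0, 0, k - 1 /)
       call h5sselect_hyperslab_f(fspace_id, H5S_SELECT_SET_F, offset, &
                                  count, hdferr)
       ! one 1000 x 200 x 1 partial-chunk write per iteration
       call h5dwrite_f(dset_id, H5T_NATIVE_REAL, slab, count, hdferr, &
                       mspace_id, fspace_id)
    end do
    ! ... close the ids and the file as usual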

Recently, I decided to try out the deflate/zlib filter. I've noticed that when 
I buffer the data locally and write a full chunk at a time, it works 
beautifully and compresses nicely. But if I do not write a full chunk at a time 
(say just 1000 x 200 x 1 elements), then my HDF5 file explodes in size. When I 
examine it with h5stat, I see that the 'raw data' size is about what I'd expect 
(tens of megabytes), but the 'unaccounted space' size is a few gigabytes.
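
In case it helps, the variant that works well is just the same setup with 
deflate enabled and one buffered, full-chunk write (continuing the sketch 
above; compression level 6 is arbitrary):

    real :: buffer(1000, 200, 50)   ! local accumulation buffer

    ! added to the creation property list before h5dcreate_f:
    call h5pset_deflate_f(dcpl_id, 6, hdferr)

    ! accumulate slices locally as they arrive ...
    do k = 1, 50
       buffer(:, :, k) = real(k)    ! stand-in for one incoming slice
    end do

    ! ... then write the whole chunk in a single call, so deflate runs
    ! exactly once per chunk
    call h5dwrite_f(dset_id, H5T_NATIVE_REAL, buffer, dims, hdferr)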

From what I can tell, it looks like the deflate filter is applied to the full 
chunk even though I haven't written the whole thing yet, and as I add more to 
it, the library doesn't overwrite, remove, or re-optimize the parts it has 
already written. It's as if it deflates and stores a full chunk for each 
small-ish write. I haven't seen anything in the documentation or on the forum 
to confirm this, but it seems like a problem. If it isn't something easily 
addressed, perhaps there should be a warning about this inefficiency in the 
documentation for the deflate filter.
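
(As a stopgap, repacking the bloated file into a new one seems to reclaim the 
unaccounted space, since h5repack rewrites each object into a fresh file:

    h5repack bloated.h5 compacted.h5

though obviously that doesn't help while the file is still being written.)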

Thanks!

--
Patrick Vacek
Engineering Scientist Associate
Applied Research Labs, University of Texas


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
