Schlumberger-Private
Thanks Landon, Koennecke, Werner and Patrick for the feedback.

I'm suffering exactly the problem described in section 3 in this link:
https://support.hdfgroup.org/HDF5/doc/H5.user/Performance.html

I like the suggestion to use the new SWMR feature, but I'm not fully 
confident in it in case there's, say, a shutdown exactly during a write 
operation.

I'll proceed by keeping a temporary UNcompressed H5 file and then, from time 
to time, running the h5repack tool on it to compress it into another, 
permanent file. In my tests, I could open/write/close an uncompressed H5 file 
every x seconds without suffering from the problem described in the link 
above; the issue really happens with compressed datasets. In another test I 
ran h5repack on a bloated file, and the size really went down to what it 
should be.
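Roughly what I mean, sketched with h5py (the file names are just 
illustrative, and the h5repack step is shown as a comment since it runs 
outside the writer):

```python
import numpy as np
import h5py
from pathlib import Path

TMP = "acquisition_tmp.h5"  # hypothetical temporary, uncompressed file
Path(TMP).unlink(missing_ok=True)  # start fresh for this sketch

# Append uncompressed data in repeated open/write/close cycles.
for cycle in range(3):
    with h5py.File(TMP, "a") as f:
        if "samples" not in f:
            f.create_dataset("samples", shape=(0,), maxshape=(None,),
                             dtype="f8")  # note: no compression filter
        ds = f["samples"]
        new = np.random.rand(100)
        ds.resize(ds.shape[0] + new.shape[0], axis=0)
        ds[-new.shape[0]:] = new

# Periodically repack into a compressed, permanent copy, e.g.:
#   h5repack -f GZIP=6 acquisition_tmp.h5 acquisition.h5
# (GZIP level 6 is just an example; run via cron or a subprocess)
```

Since the temporary file carries no filter, each open/write/close cycle stays 
cheap, and only the repack pays the compression cost.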

As a side note, I don't think my chunk size is too small for the data I have, 
but it's not as big as the chunk cache (1 MB).
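For reference, the chunk cache can be raised above the 1 MB default when 
opening the file; a sketch with h5py (the 4 MiB figure and 256 KiB chunk are 
arbitrary):

```python
import numpy as np
import h5py

# Raise the raw-data chunk cache from the 1 MiB default to 4 MiB so a
# whole chunk (here 128 * 256 * 8 bytes = 256 KiB) fits comfortably and
# is not evicted between partial writes.
with h5py.File("cache_demo.h5", "w",
               rdcc_nbytes=4 * 1024**2, rdcc_nslots=10007) as f:
    ds = f.create_dataset("data", shape=(0, 256), maxshape=(None, 256),
                          chunks=(128, 256), dtype="f8")
    ds.resize((128, 256))
    ds[:] = np.ones((128, 256))
```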

Thanks and Regards,
Carlos


From: Hdf-forum [mailto:[email protected]] On Behalf Of 
Werner Benger
Sent: Wednesday, October 05, 2016 3:43 AM
To: [email protected]
Subject: [Ext] Re: [Hdf-forum] File size


Hi Carlos,

 use HDF5 1.10. It provides a feature that lets you write to a file while it 
always remains readable by another process, and it ensures the file will 
never be corrupted. That feature is called SWMR (single writer, multiple 
readers) and was introduced with 1.10.
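In h5py terms the pattern looks roughly like this (a sketch; the file name 
is made up, and the reader would normally be a separate process):

```python
import numpy as np
import h5py

PATH = "swmr_demo.h5"  # hypothetical file name

# Writer: libver='latest' is required for SWMR; once swmr_mode is set,
# other processes may open the file read-only while we keep appending.
with h5py.File(PATH, "w", libver="latest") as f:
    ds = f.create_dataset("ts", shape=(0,), maxshape=(None,), dtype="f8")
    f.swmr_mode = True  # enable AFTER all objects are created
    for _ in range(5):
        n = ds.shape[0]
        ds.resize((n + 10,))
        ds[n:] = np.arange(10)
        ds.flush()  # make the new rows visible to SWMR readers

# Reader (normally a separate process running concurrently):
with h5py.File(PATH, "r", swmr=True) as f:
    ds = f["ts"]
    ds.refresh()  # pick up data flushed since the file was opened
```

Note that SWMR only protects readers against a crashing writer's 
half-written metadata; the writer itself should still flush at safe points.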

Also, you may consider using the LZ4 filter for compression instead of the 
internal deflate filter. LZ4 does not compress as strongly as deflate, but 
it's an order of magnitude faster, nearly as fast as uncompressed read/write, 
so it may be worth it, especially for time-constrained data I/O. You may also 
want to optimize the chunked layout of the dataset according to your data 
updates, since each chunk is compressed on its own.
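A sketch of aligning the chunk layout with the updates; LZ4 is not built 
into HDF5 itself but is available e.g. through the third-party hdf5plugin 
package (shown as a commented-out assumption), so the built-in deflate 
filter stands in here:

```python
import numpy as np
import h5py
# import hdf5plugin  # assumption: would provide hdf5plugin.LZ4()

UPDATE_ROWS = 100  # rows appended per update (illustrative)

with h5py.File("lz4_demo.h5", "w") as f:
    # Make the chunk shape match one update, so each append fills whole
    # chunks and no chunk is rewritten (and re-compressed) repeatedly.
    ds = f.create_dataset("data", shape=(0, 32), maxshape=(None, 32),
                          chunks=(UPDATE_ROWS, 32),
                          compression="gzip",  # stand-in for LZ4
                          # compression=hdf5plugin.LZ4(),
                          dtype="f4")
    for _ in range(4):
        n = ds.shape[0]
        ds.resize((n + UPDATE_ROWS, 32))
        ds[n:] = np.random.rand(UPDATE_ROWS, 32).astype("f4")
```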

Cheers,

             Werner

On 05.10.2016 02:08, Carlos Penedo Rocha wrote:
Hi,

I have a scenario in which my compressed h5 file needs to be updated with new 
data that is coming in every, say, 5 seconds.

Approach #1: keep the file opened and just write data as they come, or write a 
buffer at once.
Approach #2: open the file (RDWR), write the data (or a buffer) and then close 
the file.
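For concreteness, Approach #2 amounts to something like this h5py sketch 
(file and dataset names are illustrative):

```python
import numpy as np
import h5py
from pathlib import Path

PATH = "stream.h5"  # illustrative file name
Path(PATH).unlink(missing_ok=True)  # start fresh for this sketch

def append_batch(path, batch):
    """Open the file read-write, append one batch, close immediately."""
    with h5py.File(path, "a") as f:
        if "data" not in f:
            f.create_dataset("data", shape=(0,), maxshape=(None,),
                             compression="gzip", dtype="f8")
        ds = f["data"]
        n = ds.shape[0]
        ds.resize((n + batch.shape[0],))
        ds[n:] = batch

# One call per arriving batch (every ~5 seconds in my case):
for _ in range(3):
    append_batch(PATH, np.random.rand(50))
```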

Approach #1 is not desirable for my case because if there's any problem (an 
outage, etc.), the h5 file will likely get corrupted. And if I want to have a 
look at the file, I can't, because it's still open for writing.

Approach #2 addresses the issues above, BUT I noticed that if I 
open/write/close the file every 5 seconds, the compression gets really bad 
and the file size goes up big time. Approach #1 doesn't suffer from this 
problem.

So, my question is: is there an "Approach #3" that gives me the best of the two 
worlds? Less likely to get me a corrupted h5 file and at the same time, a good 
compression rate?

Thanks,
Carlos R.





_______________________________________________

Hdf-forum is for HDF software users discussion.

[email protected]<mailto:[email protected]>

http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

Twitter: https://twitter.com/hdf5



--

___________________________________________________________________________

Dr. Werner Benger                Visualization Research

Center for Computation & Technology at Louisiana State University (CCT/LSU)

2019  Digital Media Center, Baton Rouge, Louisiana 70803

Tel.: +1 225 578 4809                        Fax.: +1 225 578-5362
