Hi Eelco,
        We are working on two features for the 1.10.0 release that should 
address the file corruption issue:  journaling changes to the HDF5 file, and 
writing updates to the file in a specific order that prevents the file from 
being corrupted if the application crashes.  Journaling has no space overhead, 
but may perform more slowly, while ordered updates should perform at normal 
HDF5 speed but will have some space overhead.  We are trying to get both of these
features ready by around November.

        Quincey

On Aug 11, 2012, at 11:31 AM, Eelco Hoogendoorn wrote:

> As much as I love hdf5 (and pytables), I find that it becomes increasingly 
> unusable for storing large amounts of data when the code writing it is 
> potentially buggy.
> 
> I have already learned the hard way never to store original experimental 
> data in any database that might be opened with write access; and now I am 
> finding that storing several days' worth of simulation data in hdf5 isn't 
> quite feasible either. Perhaps it'd be fine once my code is all done and 
> bug-free; for now, it crashes frequently. That's part of development, but 
> I'd like to be able to do development without losing days' worth of data at 
> a time, AND use hdf5.
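> 
> What I mean in practice (a rough sketch using PyTables' open_file API; the 
> file and node names here are made up):
> 
>     import tables
> 
>     # Original data is opened read-only, so no bug can modify it.
>     raw = tables.open_file("raw_experiment.h5", mode="r")
> 
>     # Derived results go into a separate, expendable file.
>     out = tables.open_file("derived.h5", mode="w")
>     try:
>         samples = raw.root.samples.read()
>         out.create_array(out.root, "processed", samples * 2)
>     finally:
>         out.close()
>         raw.close()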
> 
> My question then is: what are the best practices for dealing with these 
> kinds of situations? One thing I am doing at the moment is splitting my data 
> over several different .h5 files, so that writing to one table cannot take 
> my whole dataset down with it. It is unfortunate, though, that standard OS 
> file systems are more robust than hdf5; I'd rather see it the other way 
> around.
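> 
> Roughly what that splitting looks like (a sketch; the one-file-per-run 
> layout is just illustrative):
> 
>     import tables
> 
>     def open_run_file(run_id):
>         # One .h5 file per simulation run: a crash while writing run 7
>         # can only corrupt run007.h5, never the earlier runs.
>         return tables.open_file("run%03d.h5" % run_id, mode="w")
> 
>     for run_id in range(3):
>         h5file = open_run_file(run_id)
>         try:
>             h5file.create_array(h5file.root, "state", [run_id, run_id + 1])
>         finally:
>             h5file.close()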
> 
> I understand that there isn't much one can do about a program crashing in 
> the middle of a binary tree update; that is not going to be pretty. But I 
> could envision a rather simple solution: keep one or more fully redundant 
> metadata structures in memory, and only ever write to one at a time. If one 
> becomes corrupted, at least all your data as of your last flush is still 
> available. I couldn't care less about the extra disk space overhead, but in 
> case anyone does, it should be easy to make the number of metadata histories 
> maintained configurable.
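> 
> One mitigation I already use is flushing frequently: it narrows the window 
> of loss, though it cannot prevent corruption from a crash that lands 
> mid-write. A sketch (the table layout is made up):
> 
>     import tables
> 
>     class Sample(tables.IsDescription):
>         step = tables.Int64Col()
>         energy = tables.Float64Col()
> 
>     h5file = tables.open_file("simulation.h5", mode="w")
>     table = h5file.create_table(h5file.root, "samples", Sample)
>     row = table.row
>     for step in range(100000):
>         row["step"] = step
>         row["energy"] = 0.0  # stand-in for a real measurement
>         row.append()
>         if step % 1000 == 0:
>             # Push buffered rows and metadata to disk; data written
>             # before the last completed flush should survive a crash.
>             h5file.flush()
>     h5file.close()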
> 
> Is there already such functionality that I have not noticed, is it (or 
> should it be) planned functionality, or am I missing other techniques for 
> dealing with these types of situations?
> 
> Thank you for your input,
> Eelco Hoogendoorn


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
