As much as I love HDF5 (and PyTables), I find that it becomes
increasingly unusable when storing large amounts of data with code
that is still potentially buggy.
I have already learned the hard way never to store original experimental
data in any database that might be opened with write access; and now I
am finding that storing several days' worth of simulation data in HDF5
isn't quite feasible either. Perhaps it'd be fine once my code is all done
and bug-free; for now, it crashes frequently. That's part of development,
but I'd like to be able to do development without losing days' worth of
data at a time, AND still use HDF5.
My question, then, is: what are the best practices for dealing with these
kinds of situations? One thing I am doing at the moment is splitting my
data over several different .h5 files, so that a crash while writing to
one table cannot take my whole dataset down with it. It is unfortunate,
though, that standard OS file systems end up being more robust than HDF5;
I'd rather see it the other way around.
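
For what it's worth, the splitting approach looks roughly like the
sketch below; a minimal example with PyTables, where the file names and
array contents are just placeholders for my actual simulation output:

    import numpy as np
    import tables

    # One file per run/table: a crash while writing one file
    # cannot corrupt the others.
    for run_id in range(3):
        with tables.open_file("run_%03d.h5" % run_id, mode="w") as f:
            data = np.random.rand(1000, 3)  # stand-in for simulation output
            f.create_array(f.root, "positions", data, "particle positions")
            f.flush()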
I understand that there isn't much one can do about a program crashing in
the middle of a B-tree update; that is never going to be pretty. But
I could envision a rather simple solution: keep one or more fully
redundant copies of the metadata structures in memory, and only ever
write to one of them at a time. If one becomes corrupted, you would at
least still have all of your data from before the last flush available.
I could not care less about the extra disk space overhead, but in case
anyone does, it should be easy to make the number of metadata histories
maintained optional.
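
In the meantime, the closest user-level approximation I have come up
with is to flush and snapshot the file periodically, so a crash only
costs the data written since the last snapshot. A rough sketch, assuming
PyTables and a copy-after-flush workflow (the function name, file names,
and loop are mine, not an HDF5 or PyTables feature):

    import shutil
    import tables

    def checkpoint(h5file, path):
        """Flush pending writes, then keep a known-good copy of the file.

        If a later crash corrupts `path`, the .bak copy still holds
        everything written up to this call. This is a user-level
        workaround, not built-in HDF5 functionality.
        """
        h5file.flush()
        shutil.copyfile(path, path + ".bak")

    # Hypothetical usage inside a long-running simulation loop:
    # with tables.open_file("sim.h5", mode="a") as f:
    #     for step in range(n_steps):
    #         ... append results to tables in f ...
    #         if step % 1000 == 0:
    #             checkpoint(f, "sim.h5")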
Is there already such functionality that I have not noticed, is it
planned (or should it be), or am I missing other techniques for dealing
with these types of situations?
Thank you for your input,
Eelco Hoogendoorn