We observe for several hours (or days) and are thinking of storing the
observed data in HDF5 files. If the process or system crashes, we do
not want to lose several hours of observed data.

At the beginning of an observation the HDF5 file and all its groups,
attributes, and datasets are created (and flushed). Thereafter the
datasets get extended during the observation. Now I wonder how much data
can still be read in case of a crash and what can be done to avoid
loss of data.
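
For concreteness, the pattern I have in mind looks roughly like the
sketch below (written with h5py just for brevity; the file name, group,
attribute, chunk size, and sample data are only placeholders):

import numpy as np
import h5py

CHUNK = 1024  # samples per chunk; placeholder value

# At the start of the observation: create the file, groups, attributes
# and an extendable chunked dataset, then flush once.
f = h5py.File("observation.h5", "w")
grp = f.create_group("telescope1")
grp.attrs["start_time"] = "placeholder"
dset = grp.create_dataset("samples",
                          shape=(0,), maxshape=(None,),
                          chunks=(CHUNK,), dtype="f8")
f.flush()

# During the observation: extend the dataset and append new samples.
def append(samples):
    n = dset.shape[0]
    dset.resize((n + len(samples),))
    dset[n:] = samples

append(np.random.rand(500))   # placeholder for real observed data
f.flush()                     # how much of this survives a crash?
f.close()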

- Can all data be read, or only the data up to the latest flush?
- Can it happen that in case of a crash the file gets corrupted and
nothing can be read, even if regular flushes were done? If so, is there
anything that can be done to be 100% sure the file does not get
corrupted? In particular, can the file be corrupted if the crash happens
during a flush?
- If regular flushes need to be done, is there a scheme that minimizes
IO? E.g. I can imagine it would be good to flush when a dataset chunk is
full (see the sketch after this list). Maybe there are other
considerations.
- What is the overhead of a flush? I assume that only the data chunks
that were changed get written, plus probably some index pages. How many
index pages? One per dataset? Are data chunks written before index
pages to reduce the risk of file corruption?
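
To illustrate the third question, this is the kind of chunk-aligned
flushing I have in mind (continuing the sketch above; flushing only on
chunk boundaries is just my guess at what minimizes IO):

def append_and_maybe_flush(f, dset, samples, chunk_size=1024):
    """Append samples to an extendable 1-D dataset; flush only when
    the write crosses a chunk boundary (chunk_size must match the
    dataset's chunk shape)."""
    n = dset.shape[0]
    dset.resize((n + len(samples),))
    dset[n:] = samples
    # Flush only if at least one new chunk was completed by this write.
    if (n + len(samples)) // chunk_size > n // chunk_size:
        f.flush()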

I guess there are other issues I did not think of. 

Cheers, 
Ger 