We observe for several hours (or days) and are thinking of storing the observed data in HDF5 files. If the process or system crashes, we do not want to lose several hours' worth of observed data.
At the beginning of an observation the HDF5 file and all its groups, attributes, and datasets are created (and flushed). Thereafter the datasets are extended during the observation. Now I wonder how much data can still be read after a crash, and what can be done to avoid losing data.

- Can all data be read, or only the data written up to the latest flush?

- Can a crash corrupt the file so that nothing can be read, even if regular flushes were done? If so, is there anything that can be done to be 100% sure the file does not get corrupted? In particular, can the file be corrupted if the crash happens during a flush?

- If regular flushes need to be done, is there a scheme that minimizes I/O? For example, I can imagine it would be good to flush whenever a dataset chunk is full (see the sketch in the P.S. below). Maybe there are other considerations.

- What is the overhead of a flush? I assume that only the data chunks that were changed get written, plus some index pages. How many index pages? One per dataset? Are data chunks written before index pages to reduce the risk of file corruption?

I guess there are other issues I did not think of.

Cheers,
Ger
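P.S. To make the "flush when a chunk is full" idea concrete, here is a minimal, untested sketch of what I have in mind: append to an extensible chunked 1-D dataset and call H5Fflush() after every full chunk. The file/dataset names, chunk size, and 1-D layout are just for illustration.

/* Sketch: append samples to an extensible chunked dataset and flush
 * after each full chunk.  Compile with:  h5cc flush_sketch.c -o flush_sketch */
#include <stdio.h>
#include "hdf5.h"

#define CHUNK_LEN 1024              /* samples per chunk (made up)           */
#define N_CHUNKS  8                 /* stop after a few chunks in this demo  */

int main(void)
{
    hsize_t dims[1]    = {0};                 /* start empty                 */
    hsize_t maxdims[1] = {H5S_UNLIMITED};     /* allow unlimited growth      */
    hsize_t chunk[1]   = {CHUNK_LEN};

    hid_t file  = H5Fcreate("observation.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t dset  = H5Dcreate2(file, "samples", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Flush once right after creating the structure, as described above.   */
    H5Fflush(file, H5F_SCOPE_GLOBAL);

    double  buf[CHUNK_LEN];
    hsize_t written = 0;

    for (int c = 0; c < N_CHUNKS; c++) {
        for (int i = 0; i < CHUNK_LEN; i++)   /* fake "observed" data        */
            buf[i] = (double)(written + i);

        /* Grow the dataset by one chunk and write into the new region.     */
        hsize_t newdims[1] = {written + CHUNK_LEN};
        H5Dset_extent(dset, newdims);

        hid_t   fspace  = H5Dget_space(dset);
        hsize_t start[1] = {written};
        hsize_t count[1] = {CHUNK_LEN};
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
        hid_t mspace = H5Screate_simple(1, count, NULL);

        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);

        /* A full chunk has just been written: flush raw data and metadata
         * in the hope that a crash from here on does not lose this chunk.  */
        H5Fflush(file, H5F_SCOPE_GLOBAL);

        H5Sclose(mspace);
        H5Sclose(fspace);
        written += CHUNK_LEN;
    }

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}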