"Hdf-forum on behalf of Ewan Makepeace" wrote:

Dear Experts,

We are building a data acquisition and processing system on top of an HDF5 file 
store. Generally we have been very pleased with HDF5 - great flexibility in 
data structures, good performance, small file sizes, availability of third-party 
data access tools, etc.

However, our system needs to run for 36-48 hours at a time - and we are finding 
that if we stop the process (deliberately or accidentally) while it is running 
and writing data, the file is corrupted and we lose all our work.

We are in C# and wrote our access routines on top of HDF5.net (which I 
understand is deprecated). We tend to keep all active pointer objects open for 
the duration of the process that reads or writes them (file, group and dataset 
handles in particular).

1) Is there a full-featured replacement for HDF5.net now that I was unaware 
of? Previous contenders were found to be missing support for features we depend 
on. If so, will it address the corruption issue?

Apologies, but I only ever use the HDF5 C interface on Linux-like systems, so I 
can't really speak to the .NET wrapper options.

2) Should we be opening and closing all the entities on every write? I would 
have thought that would dramatically slow access but perhaps not. Guidance?

Well, I think it is best to close datasets, dataspaces, types, and groups as 
soon as you know you no longer need them. That should help to minimize memory 
usage. Also, can you add a call to H5Fflush() 
(https://support.hdfgroup.org/HDF5/doc/RM/RM_H5F.html#File-Flush) so that the 
file gets flushed to disk relatively regularly? Can you do something like you 
would on Linux, where you *catch* a signal and then call H5Fclose() on the file 
as part of the signal handler? And are you by chance calling H5dont_atexit() 
(https://support.hdfgroup.org/HDF5/doc/RM/RM_H5.html#Library-DontAtExit) 
somewhere, which would prevent HDF5 from closing the file gracefully on exit? 
(FYI, these are all Linux-isms, so I don't know how much use they will be in 
your context.)

3) Are there any other tips for making the file less susceptible to corruption 
if writing is abandoned unexpectedly?

One of the DOE labs invested in a 'journaling metadata' enhancement to HDF5. I 
think that work was nearly completed. However, it has since stalled on a private 
branch and has yet to be merged into the mainline of the code. It might be 
worth making a pitch for it if you think it could be useful in this context. 
Again, I am not sure, because all my experience is Linux-centric.

Hope that helps.