Hi Mohamad,

On Thu, Jul 02, 2015 at 01:56:40PM +0000, Mohamad Chaarawi wrote:
> I entered an issue for parallel support and improvements for
> H5Ocopy() in our Jira database (HDFFV-9435), but to be honest, I am
> not sure if we will have time to fix it for parallel unless someone
> funds it, since this isn't a high priority feature at the moment.

Thank you for confirming my guess. I will keep this in mind in case
I acquire funding of my own for a project using parallel HDF5.

I use H5Ocopy to make atomic snapshots of output datasets during a
simulation. The datasets have chunked layout with time-varying
data and grow over the course of the simulation. If the simulation
is interrupted, the output file may be left unreadable, since HDF5
does not implement metadata journaling (yet?).

To make a consistent snapshot, I create another HDF5 file with a
temporary filename. All output datasets are copied to that snapshot
file. Then the file is flushed to storage with H5Fflush. When using
MPI, this implicitly invokes MPI_File_sync. Otherwise, in the serial
case, fsync must be invoked on the file descriptor retrieved with
H5Fget_vfd_handle. After the data has been written to storage, the
snapshot file is renamed to a non-temporary filename. Because rename
is atomic on POSIX systems, this replaces the previous snapshot file
in a single step.
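
For the serial case, the flush-and-rename step might look roughly like the sketch below. It uses only POSIX calls (fsync, rename) so it runs without HDF5. The helpers write_all(), publish_snapshot() and snapshot_text() are hypothetical names. The plain write() in snapshot_text() stands in for creating the snapshot file and calling H5Ocopy/H5Fflush. In real code the descriptor would come from H5Fget_vfd_handle, which for the default sec2 driver hands back a pointer to the underlying int descriptor.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Force the snapshot's contents to storage, then atomically replace
 * final_path with tmp_path. The caller keeps ownership of fd (with HDF5,
 * H5Fclose would close it). Returns 0 on success, -1 on error. */
static int publish_snapshot(int fd, const char *tmp_path,
                            const char *final_path)
{
    /* Data must be durable before the rename; otherwise a crash could
     * leave final_path naming an incomplete file. */
    if (fsync(fd) != 0) {
        perror("fsync");
        return -1;
    }
    /* POSIX rename() replaces an existing final_path atomically: a reader
     * opens either the old snapshot or the new one, never a mix. */
    if (rename(tmp_path, final_path) != 0) {
        perror("rename");
        return -1;
    }
    return 0;
}

/* Hypothetical stand-in for the HDF5 steps: the real workflow would create
 * tmp_path with H5Fcreate, H5Ocopy each output dataset into it, call
 * H5Fflush, and fetch the descriptor with H5Fget_vfd_handle. */
static int snapshot_text(const char *text, const char *tmp_path,
                         const char *final_path)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    int rc = write_all(fd, text, strlen(text));
    if (rc != 0)
        perror("write");
    else
        rc = publish_snapshot(fd, tmp_path, final_path);
    close(fd);
    return rc;
}
```

publish_snapshot() deliberately leaves the descriptor open, matching the case where HDF5 still owns the file. On Linux, making the rename itself survive a power failure also requires an fsync() on the directory containing the snapshot, which this sketch leaves out.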

Since H5Ocopy is a collective call, if the output file is opened by
all processes, the snapshot file must be opened by all processes as
well. For now I have worked around the issue by keeping the per-node
output data in memory until the end of the simulation, thus avoiding
H5Ocopy entirely.

Regards,
Peter

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5