Hello HDF users,
I am using HDF5 through NetCDF, and I recently changed my program so that
each MPI process writes its data directly to the output file, instead of
the master process gathering the results and being the only one doing
I/O.
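For concreteness, the old scheme looked roughly like this (a simplified
sketch, not my actual code; the variable name "results" and the flat 1-D
layout are placeholders):

```c
#include <mpi.h>
#include <netcdf.h>
#include <stdlib.h>

/* Old scheme: rank 0 gathers everything and is the only writer. */
static void write_gathered(MPI_Comm comm, const char *path,
                           const double *local, size_t nlocal)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    double *all = NULL;
    if (rank == 0)
        all = malloc(nlocal * (size_t)nprocs * sizeof *all);

    /* Every process ships its piece to the master... */
    MPI_Gather(local, (int)nlocal, MPI_DOUBLE,
               all, (int)nlocal, MPI_DOUBLE, 0, comm);

    /* ...and only the master touches the file. */
    if (rank == 0) {
        int ncid, varid;
        nc_open(path, NC_WRITE, &ncid);
        nc_inq_varid(ncid, "results", &varid);
        size_t start = 0, count = nlocal * (size_t)nprocs;
        nc_put_vara_double(ncid, varid, &start, &count, all);
        nc_close(ncid);
        free(all);
    }
}
```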
Now I see that my program slows down the file system of the whole HPC
cluster considerably, and I am not sure how to handle the I/O properly.
The file system is a high-throughput BeeGFS system.
My program uses a hybrid parallelization approach, i.e., the work is
split among N MPI processes, each of which spawns M worker threads.
Currently, I write to the output file from each of the M*N threads, but
the writing is guarded by a mutex, so thread safety shouldn't be a
problem. Each write operation is a complete `open file, write, close
file` cycle. Each write goes to a separate region of the HDF5 file, so
no chunks are shared between any two writers. The amount of data written
per thread is 1/(M*N) times the size of the whole file.
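In code, the current scheme is roughly the following (again a simplified
sketch; "results", the offsets, and the 1-D layout are placeholders):

```c
#include <pthread.h>
#include <netcdf.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;

/* Current scheme: every worker thread performs a full
 * open/write/close cycle on its own region of the file,
 * serialized within the process by a mutex. */
static void write_region(const char *path, size_t offset, size_t n,
                         const double *data)
{
    pthread_mutex_lock(&io_lock);   /* one writer per process at a time */

    int ncid, varid;
    nc_open(path, NC_WRITE, &ncid);            /* open file, */
    nc_inq_varid(ncid, "results", &varid);
    size_t start = offset, count = n;
    nc_put_vara_double(ncid, varid, &start,
                       &count, data);          /* write, */
    nc_close(ncid);                            /* close file */

    pthread_mutex_unlock(&io_lock);
}
```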
Shouldn't this be exactly how HDF5 + MPI is supposed to be used? What is
the `best practice` regarding parallel file access with HDF5?
Thank you and best regards,
Jan Oliver Oelerich
--
Dr. Jan Oliver Oelerich
Faculty of Physics and Material Sciences Center
Philipps-Universität Marburg
Addr.: Room 02D35, Hans-Meerwein-Straße 6, 35032 Marburg, Germany
Phone: +49 6421 2822260
Mail : [email protected]
Web : http://academics.oelerich.org