Hello HDF users,

I am using HDF5 through NetCDF, and I recently changed my program so that each MPI process writes its data directly to the output file, instead of the master process gathering all results and being the only one doing I/O.

Now I see that my program slows down the file system of the whole HPC cluster considerably, and I am not sure how to handle the I/O properly. The file system is a high-throughput BeeGFS system.

My program uses a hybrid parallelization approach, i.e., the work is split into N MPI processes, each of which spawns M worker threads. Currently, I write to the output file from each of the M*N threads, but the writing is guarded by a mutex, so thread safety shouldn't be a problem. Each write is a complete `open file, write, close file` cycle, roughly as sketched below.
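
To make this concrete, here is a stripped-down sketch of what each worker thread does today. It assumes a 1-D NetCDF variable; the names (`write_slice`, `"result"`, ...) are placeholders and error checking is omitted:

```c
#include <netcdf.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;

void write_slice(const char *path, const double *buf,
                 size_t start, size_t count)
{
    pthread_mutex_lock(&io_lock);  /* serialize I/O within this rank */

    int ncid, varid;
    nc_open(path, NC_WRITE, &ncid);                       /* open  */
    nc_inq_varid(ncid, "result", &varid);
    nc_put_vara_double(ncid, varid, &start, &count, buf); /* write */
    nc_close(ncid);                                       /* close */

    pthread_mutex_unlock(&io_lock);
}
```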

Each write goes to a separate region of the HDF5 file, so no chunk is shared between any two threads or processes. The amount of data written per thread is 1/(M*N) of the size of the whole file.
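
The disjoint regions come from a simple even split; a minimal sketch of the offset arithmetic (assuming the total length divides evenly by M*N, all names again placeholders):

```c
#include <stddef.h>

/* Thread t of rank r writes slice number r*M + t, each of length
 * total/(M*N). */
void slice_bounds(size_t total, int M, int N, int rank, int t,
                  size_t *start, size_t *count)
{
    size_t slice = total / (size_t)(M * N);              /* elements per thread */
    size_t id    = (size_t)rank * (size_t)M + (size_t)t; /* global thread index */
    *start = id * slice;
    *count = slice;
}
```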

Shouldn't this be exactly how HDF5 + MPI is supposed to be used? What is the `best practice` regarding parallel file access with HDF5?
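
For comparison, this is what I understand the canonical MPI-parallel HDF5 setup to look like: one file opened collectively through the MPI-IO driver, and collective data transfers. This is only a sketch of my reading of the docs, with placeholder file and dataset names, not code I have running:

```c
#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Route all file I/O through MPI-IO so every rank shares one file. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fopen("out.h5", H5F_ACC_RDWR, fapl);

    /* Ask for collective (rather than independent) data transfer. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    /* ... each rank would select its hyperslab and call H5Dwrite
       with dxpl here ... */

    H5Pclose(dxpl);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}
```

Is that the pattern I should be moving to, instead of the per-thread open/write/close cycle?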

Thank you and best regards,
Jan Oliver Oelerich



--
Dr. Jan Oliver Oelerich
Faculty of Physics and Material Sciences Center
Philipps-Universität Marburg

Addr.: Room 02D35, Hans-Meerwein-Straße 6, 35032 Marburg, Germany
Phone: +49 6421 2822260
Mail : [email protected]
Web  : http://academics.oelerich.org
