Hello HDF users,
I am using HDF5 through NetCDF, and I recently changed my program so that
each MPI process writes its data directly to the output file, instead of
the master process gathering the results and being the only one doing
I/O.
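For concreteness, the old scheme looked roughly like this (a simplified
sketch, not my actual code; the variable name "results" and the flat 1-D
layout are placeholders):

```c
#include <mpi.h>
#include <netcdf.h>
#include <stdlib.h>

/* Old scheme: rank 0 gathers everything and is the only writer. */
static void write_gathered(MPI_Comm comm, const char *path,
                           const double *local, size_t nlocal)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    double *all = NULL;
    if (rank == 0)
        all = malloc(nlocal * (size_t)nprocs * sizeof *all);

    /* Every process ships its piece to the master... */
    MPI_Gather(local, (int)nlocal, MPI_DOUBLE,
               all, (int)nlocal, MPI_DOUBLE, 0, comm);

    /* ...and only the master touches the file. */
    if (rank == 0) {
        int ncid, varid;
        nc_open(path, NC_WRITE, &ncid);
        nc_inq_varid(ncid, "results", &varid);
        size_t start = 0, count = nlocal * (size_t)nprocs;
        nc_put_vara_double(ncid, varid, &start, &count, all);
        nc_close(ncid);
        free(all);
    }
}
```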
Now I see that my program slows down the file system of the whole HPC
cluster considerably, and I am not sure how to handle the I/O properly.
The file system is a high-throughput BeeGFS system.
My program uses a hybrid parallelization approach, i.e., the work is
split among N MPI processes, each of which spawns M worker threads.
Currently, I write to the output file from each of the M*N threads, but
the writing is guarded by a mutex, so thread safety shouldn't be a
problem. Each write operation is a complete `open file, write, close
file` cycle. Each write goes to a separate region of the HDF5 file, so
no chunks are shared between any two writers. The amount of data written
per thread is 1/(M*N) times the size of the whole file.
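In code, the current scheme is roughly the following (again a simplified
sketch; "results", the offsets, and the 1-D layout are placeholders):

```c
#include <pthread.h>
#include <netcdf.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;

/* Current scheme: every worker thread performs a full
 * open/write/close cycle on its own region of the file,
 * serialized within the process by a mutex. */
static void write_region(const char *path, size_t offset, size_t n,
                         const double *data)
{
    pthread_mutex_lock(&io_lock);   /* one writer per process at a time */

    int ncid, varid;
    nc_open(path, NC_WRITE, &ncid);            /* open file, */
    nc_inq_varid(ncid, "results", &varid);
    size_t start = offset, count = n;
    nc_put_vara_double(ncid, varid, &start,
                       &count, data);          /* write, */
    nc_close(ncid);                            /* close file */

    pthread_mutex_unlock(&io_lock);
}
```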
Shouldn't this be exactly how HDF5 + MPI is supposed to be used? What is
the `best practice` regarding parallel file access with HDF5?
Thank you and best regards,
Jan Oliver Oelerich
--
Dr. Jan Oliver Oelerich
Faculty of Physics and Material Sciences Center
Philipps-Universität Marburg
Addr.: Room 02D35, Hans-Meerwein-Straße 6, 35032 Marburg, Germany
Phone: +49 6421 2822260
Mail : [email protected]
Web : http://academics.oelerich.org