Hi Jan,

> On May 23, 2017, at 2:46 AM, Jan Oliver Oelerich 
> <[email protected]> wrote:
> 
> Hello HDF users,
> 
> I am using HDF5 through NetCDF and I recently changed my program so that each 
> MPI process writes its data directly to the output file as opposed to the 
> master process gathering the results and being the only one who does I/O.
> 
> Now I see that my program slows down the file system (of the whole HPC 
> cluster) considerably, and I don't really know how to handle the I/O. The 
> file system is a high-throughput BeeGFS system.
> 
> My program uses a hybrid parallelization approach, i.e. work is split into N 
> MPI processes, each of which spawns M worker threads. Currently, I write to 
> the output file from each of the M*N threads, but the writing is guarded by a 
> mutex, so thread-safety shouldn't be a problem. Each write operation is a 
> complete `open file, write, close file` cycle.
> 
> Each write is at a separate region of the HDF5 file, so no chunks are shared 
> among any two processes. The amount of data to be written per process is 
> 1/(M*N) times the size of the whole file.
> 
> Shouldn't this be exactly how HDF5 + MPI is supposed to be used? What is the 
> `best practice` regarding parallel file access with HDF5?

        Yes, this is probably the correct way to operate, but performance in 
this case is generally much better when collective I/O operations are used.  
Are you using collective or independent I/O?  (Independent is the default.)

        Quincey


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5