Hi,

I have an MPI application where each process samples some data. Each process can have an arbitrary number of sampling points (or no points at all). During the simulation each process buffers the sample values in local memory until the buffer is full. At that point each process sends its data to designated IO processes, and the IO processes open an HDF5 file, extend a dataset and write the data to the file.
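
For reference, the write path on each IO process follows roughly the sketch below. This is only a minimal sketch to show the pattern: the subroutine name, the one-dimensional layout and the starts/counts slab arrays are illustrative assumptions, not my actual code.

  subroutine write_chunk(dset_id, buffer, old_size, starts, counts, nslabs)
    use hdf5
    implicit none
    integer(hid_t),   intent(in) :: dset_id
    double precision, intent(in) :: buffer(:)          ! one full chunk of samples
    integer(hsize_t), intent(in) :: old_size           ! current dataset extent
    integer,          intent(in) :: nslabs
    integer(hsize_t), intent(in) :: starts(nslabs), counts(nslabs)

    integer(hid_t)   :: filespace, memspace, xfer_plist
    integer(hsize_t) :: new_size(1), dims(1)
    integer          :: ierr, i

    ! Extend the dataset by exactly one chunk (chunk size == buffer size)
    new_size(1) = old_size + int(size(buffer), hsize_t)
    call h5dset_extent_f(dset_id, new_size, ierr)

    ! Build the (possibly complicated) file selection as a union of hyperslabs
    call h5dget_space_f(dset_id, filespace, ierr)
    call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, starts(1:1), counts(1:1), ierr)
    do i = 2, nslabs
       call h5sselect_hyperslab_f(filespace, H5S_SELECT_OR_F, starts(i:i), counts(i:i), ierr)
    end do

    ! The memory selection is a simple contiguous block
    dims(1) = int(size(buffer), hsize_t)
    call h5screate_simple_f(1, dims, memspace, ierr)

    ! Collective data transfer; using H5FD_MPIO_INDEPENDENT_F here is what
    ! I mean by "turning off collective IO" below
    call h5pcreate_f(H5P_DATASET_XFER_F, xfer_plist, ierr)
    call h5pset_dxpl_mpio_f(xfer_plist, H5FD_MPIO_COLLECTIVE_F, ierr)

    ! This is the call that sometimes hangs
    call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, buffer, dims, ierr, &
                    mem_space_id=memspace, file_space_id=filespace, &
                    xfer_prp=xfer_plist)

    call h5pclose_f(xfer_plist, ierr)
    call h5sclose_f(memspace, ierr)
    call h5sclose_f(filespace, ierr)
  end subroutine write_chunk
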
The filespace can be quite complicated, constructed with numerous calls to "h5sselect_hyperslab_f". The memspace is always a simple contiguous block of data. The chunk size is equal to the buffer size, i.e. each time the dataset is extended, it is extended by exactly one chunk.

The problem is that in some cases the application hangs in h5dwrite_f (it is a Fortran application), and I cannot see why. It happens on multiple systems with different MPI implementations, so I believe the problem is in my application or in the HDF5 library, not in the MPI implementation or at the system level. The problem disappears if I turn off collective IO. I have tried compiling HDF5 with as much error checking as possible (--enable-debug=all --disable-production), and I do not get any errors or warnings from the HDF5 library. I ran the code through TotalView and got the attached backtrace for the 20 processes that participate in the IO communicator.

Does anyone have an idea of how to continue debugging this problem? I currently use HDF5 version 1.8.17.

Best regards,
Håkon Strandenes
