Can you try it with 1.10.1 and see if you still have the issue?

Scot
> On May 19, 2017, at 1:11 PM, Quincey Koziol <[email protected]> wrote:
>
> Hi Håkon,
>
>> On May 19, 2017, at 10:01 AM, Håkon Strandenes <[email protected]> wrote:
>>
>> (sorry, forgot to cc the mailing list in my previous mail)
>>
>> A standalone test program would be quite an effort, but I will think about
>> it. I know that at least all simple test cases pass, so I need a
>> "complicated" problem to trigger the error.
>
> Yeah, that's usually the case with these kinds of issues. :-/
>
>> One thing I wonder about is:
>> Are the requirements for collective I/O in this document,
>> https://support.hdfgroup.org/HDF5/PHDF5/parallelhdf5hints.pdf,
>> still valid and accurate?
>>
>> The reason I ask is that my filespace is complicated. Each I/O process
>> creates the filespace with MANY calls to select_hyperslab. Hence it is
>> neither regular nor singular, and according to the above-mentioned document
>> the HDF5 library should not be able to do collective I/O in this case.
>> Still, it seems like it hangs in some collective writing routine.
>>
>> Am I onto something? Could this be a problem?
>
> Fortunately, we've expanded the feature set for collective I/O now and
> it supports arbitrary selections on chunked datasets. There's always the
> chance of a bug, of course, but it would have to be a very unusual one,
> since we are pretty thorough about the regression testing…
>
> Quincey
>
>> Regards,
>> Håkon
>>
>> On 05/19/2017 04:46 PM, Quincey Koziol wrote:
>>> Hmm, it sounds like you've varied a lot of things, which is good. But the
>>> constant seems to be your code now. :-/ Can you replicate the error with a
>>> small standalone C test program?
>>> Quincey
>>>> On May 19, 2017, at 7:43 AM, Håkon Strandenes <[email protected]> wrote:
>>>>
>>>> The behavior is there both with SGI MPT and Intel MPI. I can try OpenMPI
>>>> as well, but that is not as well tested on the systems we are using as
>>>> the two previously mentioned ones.
>>>>
>>>> I have also tested and can confirm that the problem is there with HDF5
>>>> 1.10.1 as well.
>>>>
>>>> Regards,
>>>> Håkon
>>>>
>>>> On 05/19/2017 04:29 PM, Quincey Koziol wrote:
>>>>> Hi Håkon,
>>>>> Actually, given this behavior, it's quite possible that you have found
>>>>> a bug in your MPI implementation, so I wouldn't rule that out. What
>>>>> implementation and version of MPI are you using?
>>>>> Quincey
>>>>>> On May 19, 2017, at 4:14 AM, Håkon Strandenes <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have an MPI application where each process samples some data. Each
>>>>>> process can have an arbitrary number of sampling points (or no points
>>>>>> at all). During the simulation each process buffers the sample values
>>>>>> in local memory until the buffer is full. At that point each process
>>>>>> sends its data to designated I/O processes, and the I/O processes open
>>>>>> an HDF5 file, extend a dataset and write the data into the file.
>>>>>>
>>>>>> The filespace can be quite complicated, constructed with numerous calls
>>>>>> to "h5sselect_hyperslab_f". The memspace is always a simple contiguous
>>>>>> block of data. The chunk size is equal to the buffer size, i.e. each
>>>>>> time the dataset is extended it is extended by exactly one chunk.
>>>>>>
>>>>>> The problem is that in some cases the application hangs in h5dwrite_f
>>>>>> (it is a Fortran application). I cannot see why. It happens on multiple
>>>>>> systems with different MPI implementations, so I believe that the
>>>>>> problem is in my application or in the HDF5 library, not in the MPI
>>>>>> implementation or at the system level.
>>>>>>
>>>>>> The problem disappears if I turn off collective I/O.
>>>>>>
>>>>>> I have tried to compile HDF5 with as much error checking as possible
>>>>>> (--enable-debug=all --disable-production) and I do not get any errors
>>>>>> or warnings from the HDF5 library.
>>>>>>
>>>>>> I ran the code through TotalView and got the attached backtrace for the
>>>>>> 20 processes that participate in the I/O communicator.
>>>>>>
>>>>>> Does anyone have any idea how to continue debugging this problem?
>>>>>>
>>>>>> I currently use HDF5 version 1.8.17.
>>>>>>
>>>>>> Best regards,
>>>>>> Håkon Strandenes
>>>>>>
>>>>>> <Backtrace HDF5 err.png>
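
For anyone who wants to experiment with the access pattern described above, here is a minimal C sketch of the kind of standalone test program Quincey asks for. It is not Håkon's code: the file name ("repro.h5"), dataset name ("samples"), chunk size and the two-piece hyperslab selection are invented for illustration, and error checking is omitted. It only mimics the general shape of the pattern: a chunked, extendible dataset grown by exactly one chunk, an irregular union of hyperslabs in the file space built with repeated select_hyperslab calls, a contiguous memory space, and a collective dataset transfer property list.

/* Minimal standalone sketch (NOT the original application): a chunked,
 * extendible 1-D dataset grown by one chunk, each rank selecting a
 * two-piece union of hyperslabs in the file space and a contiguous
 * block in memory, written collectively.  File name, dataset name,
 * CHUNK and the selection pattern are invented for illustration;
 * error checking is omitted for brevity. */
#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

#define CHUNK 1024   /* chunk size == per-write buffer size, as in the report */

int main(int argc, char *argv[])
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Parallel file access via the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("repro.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Chunked, extendible 1-D dataset */
    hsize_t dims[1]    = {0};
    hsize_t maxdims[1] = {H5S_UNLIMITED};
    hsize_t chunk[1]   = {CHUNK};
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t space0 = H5Screate_simple(1, dims, maxdims);
    hid_t dset = H5Dcreate2(file, "samples", H5T_NATIVE_DOUBLE, space0,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Sclose(space0);

    /* Collective transfers (the reported hang occurs in this mode) */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    /* Extend the dataset by exactly one chunk before writing */
    hsize_t newdims[1] = {CHUNK};
    H5Dset_extent(dset, newdims);

    /* Each rank owns two disjoint pieces of the new chunk (assumes
     * nprocs divides CHUNK/2 evenly; the real code has many pieces). */
    hsize_t half  = CHUNK / (2 * (hsize_t)nprocs);
    hsize_t nelem = 2 * half;
    double *buf = malloc(nelem * sizeof(double));
    for (hsize_t i = 0; i < nelem; i++)
        buf[i] = (double)rank;

    /* Irregular file-space selection: union of hyperslabs via SELECT_OR */
    hid_t fspace = H5Dget_space(dset);
    hsize_t start[1], count[1];
    start[0] = (hsize_t)rank * half;             count[0] = half;
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
    start[0] = CHUNK / 2 + (hsize_t)rank * half; count[0] = half;
    H5Sselect_hyperslab(fspace, H5S_SELECT_OR,  start, NULL, count, NULL);

    /* Contiguous memory space matching the number of selected elements */
    hid_t mspace = H5Screate_simple(1, &nelem, NULL);

    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf);

    free(buf);
    H5Sclose(mspace); H5Sclose(fspace);
    H5Pclose(dxpl);   H5Pclose(dcpl);  H5Pclose(fapl);
    H5Dclose(dset);   H5Fclose(file);
    MPI_Finalize();
    return 0;
}

The sketch can be compiled with the parallel HDF5 compiler wrapper (h5pcc) and run under mpiexec. Note that the application discussed in the thread is Fortran, funnels the writes through a dedicated 20-rank I/O communicator rather than MPI_COMM_WORLD, and uses many more hyperslab pieces per rank, so this is only a starting point for a real reproducer.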
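
For reference, the "turn off collective IO" workaround mentioned in the thread corresponds to requesting independent rather than collective MPI-IO transfers on the dataset transfer property list (h5pset_dxpl_mpio_f with H5FD_MPIO_INDEPENDENT_F in the Fortran API). In C the fragment would look like the following; it is not a complete program, just the two property-list calls:

/* Independent (non-collective) MPI-IO transfers; with this setting the
 * reported hang does not occur.  Independent mode is also the default
 * when no transfer property list is passed to H5Dwrite. */
hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);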
