Some things to watch out for...

Are you by chance accidentally leaving one or more objects in the file 'open' (e.g., did you forget some H5Xclose() call somewhere)? I cannot attest to that causing actual hangs in H5Fclose, but I know HDF has some logic to detect a possible infinite loop in the sym-link/group structure, for which it sometimes actually outputs a message along the lines of "…infinite loop detected while closing file 'foo.h5'…". I sometimes wind up using H5Fget_obj_count just prior to H5Fclose to try to debug this when it has (occasionally) happened for me.
You say you are running in parallel. Is the file on an actual parallel filesystem? Are you by chance mucking with the filesystem's metadata via calls to stat, mkdir, or chdir at any time before or after you create or close the HDF5 file? If so, are you ensuring parallel sync via MPI_Barrier before proceeding after such calls?

The core counts you mention are small, so you might be able to raise(SIGSTOP) just before H5Fclose and then attach gdb (or TotalView) to several of the processes to see what's happening. Likewise, you might be able to run valgrind on each process (sending output to separate files) to help debug too.

Sorry I don't have any other ideas. Good luck.

Mark

From: Wolf Dapp <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Tuesday, April 7, 2015 9:30 AM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] parallel HDF5: H5Fclose hangs when not using a power of 2 number of processes

Dear hdf-forum members,

I have a problem I am hoping someone can help me with. I have a program that outputs a 2D array (contiguous, indexed linearly) using parallel HDF5. When I choose a number of processors that is not a power of 2 (1, 2, 4, 8, ...), H5Fclose() hangs, inexplicably. I'm using HDF5 v.1.8.14 and OpenMPI 1.7.2, on top of GCC 4.8 with Linux.

Can someone help me pinpoint my mistake?

I have searched the forum, and the first hit [searching for "h5fclose hangs"] was a user mistake that I didn't make (to the best of my knowledge). The second didn't go beyond the initial problem description and didn't offer a solution.

Attached is a (maybe insufficiently bare-bones, apologies) demonstrator program. Strangely, the hang only happens if nx >= 32. The code is adapted from an HDF5 example program.
The demonstrator is compiled with

    h5pcc test.hangs.cpp -DVERBOSE -lstdc++

(On my system, for some strange reason, MPI has been compiled with the deprecated C++ bindings, so I need to include -lmpi_cxx also, but that shouldn't be necessary for anyone else. I hope that's not the reason for the hang-ups.)

Thanks in advance for your help!

Wolf Dapp
