Hi Håkon,

On 10/14/2012 3:54 AM, Håkon Strandenes wrote:
Thanks, I had a suspicion about that. Some more problems have appeared (H5Dcreate2 freezes/hangs after a few writes), but I will try to debug some more before I ask you...

Anyway, I have a few questions about how HDF5 does the writes. When I now do independent I/O (each process writes to its own dataset), will the actual data transfer happen in parallel, assuming the underlying file system is parallel (Lustre)?

Yes, independent transfer just translates to independent MPI-I/O read/write operations. So if your file system supports parallel file access, the HDF5 operations will occur in parallel. There is of course the issue with Lustre that data accesses on the same OST are serialized, but if your datasets are large enough, that should not be a huge issue.

The reason why I ask is of course that I have a large parallel simulation (with thousands of processes), and if each rank should wait for the lower ranks to finish (i.e. rank 4 must wait for rank 3 to finish, rank 3 must wait for rank 2 etc.), the I/O operations could take a tremendous amount of time.

Since the data is domain decomposed, I also thought it would be easiest to write each domain to a dataset, instead of trying to "stitch" together the domains before writes (which would require quite a bit of communication and CPU cycles).

Yes, this is one use case that we are considering supporting better in terms of non-collective metadata access (so you don't have to call H5Dcreate n times). Another ongoing (but separate) effort is what I mentioned earlier, where we have H5Dread/write_multi, which lets you access several datasets in one call, collectively or independently.



On 14. okt. 2012 03:31, Mohamad Chaarawi wrote:
Yes, Mark is correct. Your program is erroneous.
The current interface for reading and writing to datasets (collectively)
requires all processes to call the operation for each read/write
operation. You can correct your program by having each process
participate with a NULL selection in the read/write operation, except
for the dataset that belongs to that process, or just use independent I/O.

We are working on a new interface that would allow collective access to
multiple datasets simultaneously, so stay tuned :-)


On 10/13/2012 10:44 AM, Mark Miller wrote:
I think the problem may be that you are trying to execute a collective
write to separate datasets. That would explain why collective hangs and
independent succeeds.

I am a bit rusty on HDF5's parallel I/O semantics but AFAIK, a
collective write can only be done to the same dataset. That does NOT
mean each processor has to have an identical dsetID (e.g.
memcmp(&proc_1_dsetID, &proc_2_dsetID, sizeof(hid_t)) may be nonzero)
but it does mean the dataset object to which each processor's dsetID
references in the file has to be the same. In other words the name (or
path) of the dataset used in the create/open call needs to have been the
same.

To issue writes to different datasets simultaneously in parallel, I
think your only option is independent.

I wonder if you're aiming to do collective to different datasets because
you expect that collective will be more easily 'coordinated' by the
underlying filesystem and therefore has a higher chance at better
performance than independent. If so, I don't know if that very often
turns out to be true/possible in practice.

I hope others with a little more parallel I/O experience might chime
in ;)


On Sat, 2012-10-13 at 10:48 +0200, Håkon Strandenes wrote:

I have (yet) another problem with the HDF5 library. I am trying to write some data in parallel to a file, where each process writes its data to
its own dataset. The datasets are first created (as collective
operations), and then H5Dwrite hangs when the data are to be written. No error messages are printed; the processes just hang. I have used GDB on the hanging processes (all processes), and confirmed that it is actually
H5Dwrite that hangs.

The strange thing is that this does not always happen, sometimes it
works fine. To make it even stranger, it seems that the probability of
failure increases with increased problem size and number of processes
(or is that really strange?). These writes are in a time loop, and
sometimes a few steps finish before one write hangs.

I have also found out that if I set the transfer mode to
H5FD_MPIO_INDEPENDENT it seems that everything is working fine.

I have tried this on two computers, one workstation and one cluster. The
workstation uses OpenMPI with HDF5 1.8.4 and the cluster uses SGI's
MPT-MPI with HDF5 1.8.7. Based on the completely different MPI packages and systems, I think MPI and other system issues can be ruled out. The
remaining sources of error are then my code (probably) and HDF5 (not so
sure about that).

I have attached an example code that shows how I am doing the
HDF5-stuff. Unfortunately it is not runnable, but at least you can see
how I create and write to the dataset.

Thanks in advance for all help.

Best regards,
Håkon Strandenes
Hdf-forum is for HDF software users discussion.
