Mark,

The same code hangs on the customer machine but works fine on our clusters.
Could we see that behavior if some subset of the processes weren't participating in the I/O?
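
For what it's worth, here's a minimal sketch of the pattern I understand you
to be describing: every rank enters the collective H5Dwrite(), and ranks with
nothing to write first make empty selections with H5Sselect_none(). The
dataset name, sizes, and have_data flag below are made up for illustration,
not taken from our actual dump code.

#include <mpi.h>
#include <hdf5.h>

/* Hypothetical example: "/data", the sizes, and have_data are made up. */
void write_collective(hid_t file_id, int have_data, const double *buf,
                      hsize_t nlocal, hsize_t offset, hsize_t ntotal)
{
    hsize_t dims[1] = { ntotal };
    hid_t filespace = H5Screate_simple(1, dims, NULL);

    /* Dataset creation is collective: every rank must make this call. */
    hid_t dset = H5Dcreate2(file_id, "/data", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hid_t memspace;
    if (have_data) {
        hsize_t start[1] = { offset };
        hsize_t count[1] = { nlocal };
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL,
                            count, NULL);
        memspace = H5Screate_simple(1, count, NULL);
    } else {
        /* Ranks with nothing to write still call H5Dwrite() below,
           but with empty selections in both dataspaces. */
        H5Sselect_none(filespace);
        hsize_t dummy[1] = { 1 };
        memspace = H5Screate_simple(1, dummy, NULL);
        H5Sselect_none(memspace);
    }

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    /* Swap in H5FD_MPIO_INDEPENDENT here to test independent mode. */
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    /* Collective call: a rank that skips it (rather than selecting
       nothing) will hang the others. */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);
}

If that's the shape you had in mind, I'll audit our dump path to make sure
every rank really reaches the H5Dwrite(), and I'll also try switching the
transfer mode to independent as you suggest.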

Thanks,
Dave

On Oct 26, 2010, at 5:14 PM, Mark Howison wrote:

> Hi Dave,
> 
> One common cause of hangs with collective-mode parallel I/O in HDF5 is when only
> a subset of processes are participating in the I/O, but the other
> processes haven't made an empty selection (to say that they are not
> participating) using H5Sselect_none(). Also, have you tried
> experimenting with collective vs. independent mode?
> 
> Mark
> 
> On Tue, Oct 26, 2010 at 6:52 PM, Dave Wade-Stein <[email protected]> wrote:
>> We use hdf5 for parallel I/O in VORPAL, our laser plasma simulation code. 
>> For the most part it works fine, but on certain machines (e.g., early Cray 
>> and BG/P) and certain types of filesystems we've noticed that parallel I/O 
>> hangs. So we added a -id (individual dump) option that causes each MPI 
>> rank to dump its own hdf5 file; once the simulation is complete, we 
>> merge the individual dump files.
>> 
>> We have a customer for whom parallel I/O is hanging, and they are using -id 
>> as described above. We're trying to pinpoint why parallel I/O is not working 
>> on their system, which is a CentOS 5.5 cluster.
>> 
>> In the past we ourselves had problems with parallel I/O failing on ext3 
>> filesystems; reformatting as XFS made the problem go away. Our customer 
>> did the same, but the problem persists.
>> 
>> Anyone have any words of wisdom as to what other things could cause parallel 
>> I/O to hang?
>> 
>> Thanks for any help!
>> Dave
