On 09/01/2015 02:01 PM, Wolf Dapp wrote:
On 09/01/2015 05:43 PM, Scot Breitenfeld wrote:
I was also getting the same error with MOAB from ANL when we were
benchmarking small mesh reads with a large number of processors. When I
ran on 16384 processes, the job would terminate with:

Out of memory in file
/bgsys/source/srcV1R2M1.17463/comm/lib/dev/mpich2/src/mpi/romio/adio/ad_bg/ad_bg_rdcoll.c,
line 1073

A semi-discussion about the problem can be found here:

http://lists.mpich.org/pipermail/devel/2013-May/000154.html

We did not have time in the project to look into the problem any further.

Scot

Thanks for pointing out this discussion, Scot. It seems that not only
did you not have time to investigate the problem further, but neither
did IBM nor MPICH :)

I guess this indicates that, at heart, it's not an HDF5 problem but an
MPICH problem, and that there are some memory allocations that scale
with the number of ranks.

Though it seems your team hit the "invisible barrier" much later than we
did.

Hello!  I'm pleased to see another Blue Gene user.

MPI collective I/O works at Blue Gene scale -- most of the time. The exception appears to be when the distribution of data among processes is lumpy; e.g. everyone reads the exact same data, or some processes have more to write than others. In those cases, some internal memory allocations end up exhausting Blue Gene's memory.

You can limit the size of the intermediate buffer by setting the "cb_buffer_size" hint. Doing this splits the read or write into more rounds and so indirectly limits the total memory used. It's only a band-aid, though.
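For illustration, something along these lines gets that hint to ROMIO through HDF5's MPI-IO driver (parallel HDF5 assumed). The 4 MiB value is just an example, not a recommendation; you'd tune it for your runs:

#include <mpi.h>
#include <hdf5.h>

/* Sketch: pass the ROMIO "cb_buffer_size" hint to HDF5's MPI-IO driver.
 * The 4 MiB value below is only an example; tune it for your system. */
void open_with_small_cb_buffer(const char *filename)
{
    MPI_Info info;
    MPI_Info_create(&info);
    /* Hint values are strings; this caps the collective buffering buffer. */
    MPI_Info_set(info, "cb_buffer_size", "4194304");

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);

    hid_t file = H5Fopen(filename, H5F_ACC_RDONLY, fapl);

    /* ... collective reads and writes as usual ... */

    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Info_free(&info);
}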

The read-and-broadcast approach is the best for your workload, and the one I end up suggesting any time this comes up.
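A minimal sketch of that pattern, assuming a 1-d dataset of doubles named "/data" (both assumptions on my part, not anything from your code): rank 0 opens the file with the serial driver, reads the whole dataset, and broadcasts the size and then the buffer.

#include <mpi.h>
#include <hdf5.h>
#include <stdlib.h>

/* Sketch: read-and-broadcast.  Rank 0 reads the dataset with the serial
 * HDF5 driver; every other rank receives the data via MPI_Bcast. */
double *read_and_bcast(const char *filename, hsize_t *npoints_out)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    long long npoints = 0;
    double *buf = NULL;

    if (rank == 0) {
        hid_t file  = H5Fopen(filename, H5F_ACC_RDONLY, H5P_DEFAULT);
        hid_t dset  = H5Dopen2(file, "/data", H5P_DEFAULT);
        hid_t space = H5Dget_space(dset);
        npoints = (long long) H5Sget_simple_extent_npoints(space);

        buf = malloc(npoints * sizeof(double));
        H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

        H5Sclose(space);
        H5Dclose(dset);
        H5Fclose(file);
    }

    /* Tell everyone how big the dataset is, then ship the data itself.
     * For datasets with more than INT_MAX elements, broadcast in chunks
     * instead of casting the count to int. */
    MPI_Bcast(&npoints, 1, MPI_LONG_LONG, 0, MPI_COMM_WORLD);
    if (rank != 0)
        buf = malloc(npoints * sizeof(double));
    MPI_Bcast(buf, (int) npoints, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    *npoints_out = (hsize_t) npoints;
    return buf;
}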

Why don't we do this inside the MPI-IO library? Glad you asked! It turns out that, for a lot of reasons (file views, etypes, filetypes, and the fact that different datatypes may have identical type maps yet there's no good way to compare types in that sense), answering "did you all want to read the same data?" is actually kind of challenging in the MPI-IO library.

It's easier to detect identical reads in HDF5, because one need only look at the (hyperslab) selection: determining "you are all asking for the entire dataset" or "you are all asking for one row of this 3-d variable" requires only comparing two N-d arrays. That comparison is likely expensive at scale, though, so "easier" does not necessarily mean "good idea" -- I don't think we'd want this turned on for every access.
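Just to illustrate what such a check might look like (this is not something HDF5 does today), here is a toy that only compares the bounding box of each rank's selection with two reductions; identical bounding boxes don't prove identical selections in general, so take it as a sketch of the idea, nothing more:

#include <mpi.h>
#include <hdf5.h>
#include <string.h>

/* Sketch: did every rank select the same bounding box in this dataspace?
 * Element-wise min and max of the bounds across ranks agree exactly when
 * every rank supplied the same bounds. */
int all_ranks_same_bounds(hid_t space_id, MPI_Comm comm)
{
    hsize_t start[H5S_MAX_RANK], end[H5S_MAX_RANK];
    H5Sget_select_bounds(space_id, start, end);

    int ndims = H5Sget_simple_extent_ndims(space_id);

    unsigned long long mine[2 * H5S_MAX_RANK];
    unsigned long long gmin[2 * H5S_MAX_RANK], gmax[2 * H5S_MAX_RANK];
    for (int i = 0; i < ndims; i++) {
        mine[i]         = (unsigned long long) start[i];
        mine[ndims + i] = (unsigned long long) end[i];
    }

    MPI_Allreduce(mine, gmin, 2 * ndims, MPI_UNSIGNED_LONG_LONG, MPI_MIN, comm);
    MPI_Allreduce(mine, gmax, 2 * ndims, MPI_UNSIGNED_LONG_LONG, MPI_MAX, comm);

    return memcmp(gmin, gmax, 2 * ndims * sizeof(unsigned long long)) == 0;
}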

So that leaves the application, which indeed knows everyone is reading the same data. It sort of sounds like passing the buck, and perhaps it is, but not for lack of effort from the other layers of the software stack.

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
