Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?
Mark Allen via users writes:

> At least for the topic of why romio fails with HDF5, I believe this is the
> fix we need (it has to do with how romio processes the MPI datatypes in its
> flatten routine). I made a different fix a long time ago in SMPI for that,
> then somewhat more recently it got re-broken and I had to re-fix it. So
> the below takes a little more aggressive approach, not totally redesigning
> the flatten function, but taking over how the array size counter is handled.
> https://github.com/open-mpi/ompi/pull/3975
>
> Mark Allen

Thanks. (As it happens, the system we're struggling with is an IBM one.)

In the meantime I've hacked in romio from mpich-4.3b1 without really understanding what I'm doing; I think it needs some tidying up on both the mpich and ompi sides. That passed make check in testpar, assuming the complaints from testpflush are the expected ones. (I've not previously had access to a filesystem with flock to run this.)

Perhaps it's time to update romio anyway. It may only be relevant to Lustre, but I guess that's what most people have.
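[Editorial note: for anyone trying to work out which MPI-IO implementation and hints are actually in effect on a given filesystem, a small diagnostic along the following lines can help. This is only a sketch using standard MPI-IO calls; the file name "probe.dat" is a placeholder, and the hint set printed will differ between ROMIO and OMPIO, which is what makes it useful for telling them apart.]

/* inspect_io_hints.c - print the MPI-IO hints attached to an open file.
 * ROMIO and OMPIO expose different hint sets, so the output gives a quick
 * indication of which io component Open MPI selected for this path.
 * Build (sketch): mpicc inspect_io_hints.c -o inspect_io_hints
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info_used;
    int rank, nkeys;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* "probe.dat" is just a placeholder path on the filesystem of interest */
    MPI_File_open(MPI_COMM_WORLD, "probe.dat",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);

    MPI_File_get_info(fh, &info_used);
    MPI_Info_get_nkeys(info_used, &nkeys);

    if (rank == 0) {
        for (int i = 0; i < nkeys; i++) {
            char key[MPI_MAX_INFO_KEY], value[MPI_MAX_INFO_VAL];
            int flag;
            MPI_Info_get_nthkey(info_used, i, key);
            MPI_Info_get(info_used, key, MPI_MAX_INFO_VAL - 1, value, &flag);
            if (flag)
                printf("%s = %s\n", key, value);
        }
    }

    MPI_Info_free(&info_used);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}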
Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?
Hi Mark,

Thanks so much for this - yes, applying that pull request against ompi 4.0.5 allows hdf5 1.10.7's parallel tests to pass on our Lustre filesystem.

I'll certainly be applying it on our local clusters!

Best wishes,

Mark

On Tue, 1 Dec 2020, Mark Allen via users wrote:

> At least for the topic of why romio fails with HDF5, I believe this is the
> fix we need (it has to do with how romio processes the MPI datatypes in its
> flatten routine). I made a different fix a long time ago in SMPI for that,
> then somewhat more recently it got re-broken and I had to re-fix it. So
> the below takes a little more aggressive approach, not totally redesigning
> the flatten function, but taking over how the array size counter is handled.
> https://github.com/open-mpi/ompi/pull/3975
>
> Mark Allen
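[Editorial note: the code path being discussed is ROMIO's flattening of non-contiguous MPI datatypes during collective I/O. For anyone wanting to exercise that path outside of the HDF5 test suite, a collective write with a subarray filetype is the simplest trigger. The sketch below is illustrative only (file name and array sizes are arbitrary), not a reproducer for the original HDF5 failure.]

/* flatten_probe.c - collective write with a subarray filetype, the sort of
 * non-contiguous MPI datatype that ROMIO has to flatten internally.
 */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank owns one column block of a small 2-D global array. */
    int gsizes[2] = { 64, 64 * nprocs };
    int lsizes[2] = { 64, 64 };
    int starts[2] = { 0, 64 * rank };

    MPI_Datatype filetype;
    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    double *buf = malloc((size_t)lsizes[0] * lsizes[1] * sizeof(double));
    for (int i = 0; i < lsizes[0] * lsizes[1]; i++)
        buf[i] = rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "flatten_probe.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, buf, lsizes[0] * lsizes[1], MPI_DOUBLE,
                       MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Type_free(&filetype);
    free(buf);
    MPI_Finalize();
    return 0;
}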
Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?
Just a point to consider. OMPI does _not_ want to get into the mode of modifying imported software packages. That is a black hole of effort we simply cannot afford.

The correct thing to do would be to flag Rob Latham on that PR and ask that he upstream the fix into ROMIO so we can absorb it. We shouldn't be committing such things directly into OMPI itself. It's called "working with the community", as opposed to taking a point-solution approach :-)

> On Dec 2, 2020, at 8:46 AM, Mark Dixon via users wrote:
>
> Hi Mark,
>
> Thanks so much for this - yes, applying that pull request against ompi 4.0.5
> allows hdf5 1.10.7's parallel tests to pass on our Lustre filesystem.
>
> I'll certainly be applying it on our local clusters!
>
> Best wishes,
>
> Mark
>
> On Tue, 1 Dec 2020, Mark Allen via users wrote:
>
>> At least for the topic of why romio fails with HDF5, I believe this is the
>> fix we need (it has to do with how romio processes the MPI datatypes in its
>> flatten routine). I made a different fix a long time ago in SMPI for that,
>> then somewhat more recently it got re-broken and I had to re-fix it. So
>> the below takes a little more aggressive approach, not totally redesigning
>> the flatten function, but taking over how the array size counter is handled.
>> https://github.com/open-mpi/ompi/pull/3975
>> Mark Allen
[OMPI users] Parallel HDF5 low performance
Hi,

I'm using an old (but required by the codes) version of hdf5 (1.8.12) in parallel mode in two Fortran applications. It relies on MPI-IO. The storage is NFS-mounted on the nodes of a small cluster.

With OpenMPI 1.7 it runs fine, but with the more recent OpenMPI 3.1 or 4.0.5 the I/O is 10x to 100x slower. Are there fundamental changes to MPI-IO in these newer releases of OpenMPI, and is there a way to get back to the I/O performance we had with this parallel HDF5 release?

Thanks for your advice,

Patrick
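[Editorial note: one experiment worth trying is to pass MPI-IO hints into HDF5 through the MPI-IO file access property list, for example to switch off data sieving for writes on NFS. The hint names below (romio_ds_write, romio_cb_write, cb_buffer_size) are ROMIO hints, so they only take effect if the romio io component is the one handling the file; OMPIO has its own tuning parameters. The sketch is in C, but the Fortran interface h5pset_fapl_mpio_f takes the same communicator/info pair. This is a tuning experiment under those assumptions, not a guaranteed fix.]

/* hdf5_nfs_hints.c - pass ROMIO hints to parallel HDF5 via the file access
 * property list. Hints not recognised by the active io component are
 * silently ignored.
 */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_ds_write", "disable");   /* no data sieving on writes */
    MPI_Info_set(info, "romio_cb_write", "enable");    /* keep collective buffering */
    MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB collective buffer */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);

    hid_t file = H5Fcreate("output.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    /* ... create datasets and do collective writes as usual ... */
    H5Fclose(file);

    H5Pclose(fapl);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}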