Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-12-07 Thread Dave Love via users
Ralph Castain via users  writes:

> Just a point to consider. OMPI does _not_ want to get in the mode of
> modifying imported software packages. That is a black hole of effort we
> simply cannot afford.

It's already done that, even in flatten.c.  Otherwise updating to the
current version would be trivial.  I'll eventually make suggestions for
some changes in MPICH for standalone builds if I can verify that they
don't break things outside of OMPI.

Meanwhile we don't have a recent version that will even pass the tests
recommended here, and we've long been asking about MPI-IO on Lustre.  We
should probably move to some sort of MPICH for MPI-IO on what is likely
the most common parallel filesystem, as well as for RMA on the most
common fabric.

> The correct thing to do would be to flag Rob Latham on that PR and ask
> that he upstream the fix into ROMIO so we can absorb it. We shouldn't
> be committing such things directly into OMPI itself.

It's already fixed differently in mpich, but the simple patch is useful
if there's nothing else broken.  I approve of sending fixes to MPICH,
but that will only do any good if OMPI's version gets updated from
there, which doesn't seem to happen.

> It's called "working with the community" as opposed to taking a
> point-solution approach :-)

The community has already done work to fix this properly.  It's a pity
that will be wasted.  This bit of the community is grateful for the
patch, which is reasonable to carry in packaging for now, unlike a whole
new romio.


Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-12-02 Thread Ralph Castain via users
Just a point to consider. OMPI does _not_ want to get in the mode of modifying 
imported software packages. That is a black hole of effort we simply cannot 
afford.

The correct thing to do would be to flag Rob Latham on that PR and ask that he 
upstream the fix into ROMIO so we can absorb it. We shouldn't be committing 
such things directly into OMPI itself.

It's called "working with the community" as opposed to taking a point-solution 
approach :-)


> On Dec 2, 2020, at 8:46 AM, Mark Dixon via users  
> wrote:
> 
> Hi Mark,
> 
> Thanks so much for this - yes, applying that pull request against ompi 4.0.5 
> allows hdf5 1.10.7's parallel tests to pass on our Lustre filesystem.
> 
> I'll certainly be applying it on our local clusters!
> 
> Best wishes,
> 
> Mark
> 
> On Tue, 1 Dec 2020, Mark Allen via users wrote:
> 
>> At least for the topic of why romio fails with HDF5, I believe this is the 
>> fix we need (has to do with how romio processes the MPI datatypes in its 
>> flatten routine).  I made a different fix a long time ago in SMPI for that, 
>> then somewhat more recently it was re-broken and I had to re-fix it.  So 
>> the below takes a little more aggressive approach, not totally redesigning 
>> the flatten function, but taking over how the array size counter is handled. 
>> https://github.com/open-mpi/ompi/pull/3975
>> Mark Allen
>>  




Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-12-02 Thread Mark Dixon via users

Hi Mark,

Thanks so much for this - yes, applying that pull request against ompi 
4.0.5 allows hdf5 1.10.7's parallel tests to pass on our Lustre 
filesystem.


I'll certainly be applying it on our local clusters!
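
(For anyone else wanting to try it, roughly what "applying that pull request" 
amounts to -- a sketch only, since the patch may need rebasing onto the 4.0.5 
release tree:)

  # Sketch: fetch the PR as a patch and apply it to an unpacked openmpi-4.0.5
  # source tree before building (it may not apply cleanly; adjust as needed).
  curl -LO https://github.com/open-mpi/ompi/pull/3975.patch
  cd openmpi-4.0.5
  patch -p1 < ../3975.patch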

Best wishes,

Mark

On Tue, 1 Dec 2020, Mark Allen via users wrote:

At least for the topic of why romio fails with HDF5, I believe this is 
the fix we need (has to do with how romio processes the MPI datatypes in 
its flatten routine).  I made a different fix a long time ago in SMPI 
for that, then somewhat more recently it was re-broken and I had to 
re-fix it.  So the below takes a little more aggressive approach, not 
totally redesigning the flatten function, but taking over how the array 
size counter is handled. https://github.com/open-mpi/ompi/pull/3975


Mark Allen
 




Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-12-02 Thread Dave Love via users
Mark Allen via users  writes:

> At least for the topic of why romio fails with HDF5, I believe this is the
> fix we need (has to do with how romio processes the MPI datatypes in its
> flatten routine).  I made a different fix a long time ago in SMPI for that,
> then somewhat more recently it was re-broken and I had to re-fix it.  So
> the below takes a little more aggressive approach, not totally redesigning
> the flatten function, but taking over how the array size counter is handled.
> https://github.com/open-mpi/ompi/pull/3975
>
> Mark Allen

Thanks.  (As it happens, the system we're struggling on is an IBM one.)

In the meantime I've hacked in romio from mpich-3.4b1 without really
understanding what I'm doing; I think it needs some tidying up on both
the mpich and ompi sides.  That passed make check in testpar, assuming
the complaints from testpflush are the expected ones.  (I've not had
access to a filesystem with flock to run this previously.)
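
(For the record, the checks were driven roughly like this -- a sketch with
illustrative paths, using HDF5_PARAPREFIX to put the parallel test files on
the Lustre scratch area:)

  # Sketch: run the HDF5 parallel test suite against the rebuilt Open MPI,
  # with the test files placed on Lustre (paths illustrative).
  cd hdf5-1.10.7/testpar
  HDF5_PARAPREFIX=/lustre/scratch/$USER make check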

Perhaps it's time to update romio anyway.  It may only be relevant to
lustre, but I guess that's what most people have.


[OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-12-01 Thread Mark Allen via users
At least for the topic of why romio fails with HDF5, I believe this is the fix 
we need (has to do with how romio processes the MPI datatypes in its flatten 
routine).  I made a different fix a long time ago in SMPI for that, then 
somewhat more recently it was re-broken and I had to re-fix it.  So the below 
takes a little more aggressive approach, not totally redesigning the flatten 
function, but taking over how the array size counter is handled.

https://github.com/open-mpi/ompi/pull/3975

Mark Allen



Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-30 Thread Mark Dixon via users

On Fri, 27 Nov 2020, Dave Love wrote:
...
It's less dramatic in the case I ran, but there's clearly something 
badly wrong which needs profiling.  It's probably useful to know how 
many ranks that's with, and whether it's the default striping.  (I 
assume with default ompio fs parameters.)


Hi Dave,

It was run the way hdf5's "make check" runs it - that's 6 ranks. I didn't 
do anything interesting with striping so, unless t_bigio changed it, it'd 
have a width of 1.


...

I can have a look with the current or older romio, unless someone else
is going to; we should sort this.


If you were willing, that would be brilliant, thanks :)


My concern is that openmpi 3.x is near, or at, end of life.


'Twas ever thus, but if it works?


Evidently it wouldn't fit the definition of "works" for some users, 
otherwise there wouldn't have been a version 4!


I just didn't want Lustre MPI-IO support to be forgotten about, 
considering the 4.x series is 2 years old now.


All the best,

Mark


Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-30 Thread Dave Love via users
As a check of mpiP, I ran HDF5 testpar/t_bigio under it.  This was on
one node with four ranks (interactively) on lustre with its default of
one 1MB stripe, ompi-4.0.5 + ucx-1.9, hdf5-1.10.7, MCA defaults.
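
(For anyone wanting to reproduce the profile: a minimal sketch of the sort of
mpiP invocation used, assuming a preloadable libmpiP build; the path is
illustrative and the report is written when the job finalizes:)

  # Sketch: profile the HDF5 test under mpiP without relinking by preloading
  # the library; a *.mpiP report appears in the working directory.
  LD_PRELOAD=/path/to/libmpiP.so mpirun -np 4 ./t_bigio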

I don't know how useful it is, but here's the summary:

romio:

  @--- Aggregate Time (top twenty, descending, milliseconds) ---
  Call               Site       Time    App%    MPI%   Count    COV
  File_write_at_all    26   2.58e+04   47.50   50.24      16   0.00
  File_read_at_all     14   2.42e+04   44.47   47.03      16   0.00
  File_set_view        29        515    0.95    1.00      16   0.09
  File_set_view         3        382    0.70    0.74      16   0.00

ompio:

  @--- Aggregate Time (top twenty, descending, milliseconds) ---
  Call               Site       Time    App%    MPI%   Count    COV
  File_read_at_all     14   3.32e+06   82.83   82.90      16   0.00
  File_write_at_all    26   6.72e+05   16.77   16.78      16   0.02
  File_set_view        11   1.14e+04    0.28    0.28      16   0.91
  File_set_view        29        340    0.01    0.01      16   0.35

with call sites

   ID Lev File/Address    Line Parent_Funct     MPI_Call
   11   0 H5FDmpio.c      1651 H5FD_mpio_write  File_set_view
   14   0 H5FDmpio.c      1436 H5FD_mpio_read   File_read_at_all
   26   0 H5FDmpio.c      1636 H5FD_mpio_write  File_write_at_all

I also looked at the romio hang in testphdf5.  In the absence of a
parallel debugger, strace and kill show an endless loop of read(...,"",0)
under this:

  [login2:115045] [ 2] .../mca_io_romio321.so(ADIOI_LUSTRE_ReadContig+0xa8)[0x20003d1cab88]
  [login2:115045] [ 3] .../mca_io_romio321.so(ADIOI_GEN_ReadStrided+0x528)[0x20003d1e4f08]
  [login2:115045] [ 4] .../mca_io_romio321.so(ADIOI_GEN_ReadStridedColl+0x1084)[0x20003d1e4514]
  [login2:115045] [ 5] .../mca_io_romio321.so(MPIOI_File_read_all+0x124)[0x20003d1c37c4]
  [login2:115045] [ 6] .../mca_io_romio321.so(mca_io_romio_dist_MPI_File_read_at_all+0x34)[0x20003d1c41d4]
  [login2:115045] [ 7] .../mca_io_romio321.so(mca_io_romio321_file_read_at_all+0x3c)[0x20003d1bdabc]
  [login2:115045] [ 8] .../libmpi.so.40(PMPI_File_read_at_all+0x13c)[0x2078de4c]
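
(For reference, a rough sketch of that inspection, using the pid from the
trace above and assuming Open MPI's default signal handling is in place:)

  # Sketch: attach to the spinning rank, confirm the read() loop, then force
  # a backtrace out of the signal handler (default opal handling assumed).
  strace -p 115045        # shows repeated read(..., "", 0) = 0
  kill -SEGV 115045       # handler prints the frames shown above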


Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-27 Thread Dave Love via users
Mark Dixon via users  writes:

> But remember that IMB-IO doesn't cover everything.

I don't know what useful operations it omits, but it was the obvious
thing to run, that should show up pathology, with simple things first.
It does at least run, which was the first concern.

> For example, hdf5's
> t_bigio parallel test appears to be a pathological case and OMPIO is 2
> orders of magnitude slower on a Lustre filesystem:
>
> - OMPI's default MPI-IO implementation on Lustre (ROMIO): 21 seconds
> - OMPI's alternative MPI-IO implementation on Lustre (OMPIO): 2554 seconds

It's less dramatic in the case I ran, but there's clearly something
badly wrong which needs profiling.  It's probably useful to know how
many ranks that's with, and whether it's the default striping.  (I
assume with default ompio fs parameters.)

> End users seem to have the choice of:
>
> - use openmpi 4.x and have some things broken (romio)
> - use openmpi 4.x and have some things slow (ompio)
> - use openmpi 3.x and everything works

I can have a look with the current or older romio, unless someone else
is going to; we should sort this.

> My concern is that openmpi 3.x is near, or at, end of life.

'Twas ever thus, but if it works?

[Posted in case it's useful, rather than discussing more locally.]


Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-26 Thread Mark Dixon via users

Hi Edgar,

Thank you so much for your reply. Having run a number of Lustre systems 
over the years, I fully sympathise with your characterisation of Lustre as 
being very unforgiving!


Best wishes,

Mark

On Thu, 26 Nov 2020, Gabriel, Edgar wrote:

I will have a look at the t_bigio tests on Lustre with ompio.  We had 
from collaborators some reports about the performance problems similar 
to the one that you mentioned here (which was the reason we were 
hesitant to make ompio the default on Lustre), but part of the problem 
is that we were not able to reproduce it reliably on the systems that we 
had access to, which makes debugging and fixing the issue very 
difficult. Lustre is a very unforgiving file system, if you get 
something wrong with the settings, the performance is not just a bit 
off, but often orders of magnitude (as in your measurements).


Thanks!
Edgar

-Original Message-
From: users  On Behalf Of Mark Dixon via users
Sent: Thursday, November 26, 2020 9:38 AM
To: Dave Love via users 
Cc: Mark Dixon ; Dave Love 

Subject: Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

On Wed, 25 Nov 2020, Dave Love via users wrote:


The perf test says romio performs a bit better.  Also -- from overall
time -- it's faster on IMB-IO (which I haven't looked at in detail,
and ran with suboptimal striping).


I take that back.  I can't reproduce a significant difference for
total IMB-IO runtime, with both run in parallel on 16 ranks, using
either the system default of a single 1MB stripe or using eight
stripes.  I haven't teased out figures for different operations yet.
That must have been done elsewhere, but I've never seen figures.


But remember that IMB-IO doesn't cover everything. For example, hdf5's t_bigio 
parallel test appears to be a pathological case and OMPIO is 2 orders of 
magnitude slower on a Lustre filesystem:

- OMPI's default MPI-IO implementation on Lustre (ROMIO): 21 seconds
- OMPI's alternative MPI-IO implementation on Lustre (OMPIO): 2554 seconds

End users seem to have the choice of:

- use openmpi 4.x and have some things broken (romio)
- use openmpi 4.x and have some things slow (ompio)
- use openmpi 3.x and everything works

My concern is that openmpi 3.x is near, or at, end of life.

Mark


t_bigio runs on centos 7, gcc 4.8.5, ppc64le, openmpi 4.0.5, hdf5 1.10.7, 
Lustre 2.12.5:

[login testpar]$ time mpirun -np 6 ./t_bigio

Testing Dataset1 write by ROW

Testing Dataset2 write by COL

Testing Dataset3 write select ALL proc 0, NONE others

Testing Dataset4 write point selection

Read Testing Dataset1 by COL

Read Testing Dataset2 by ROW

Read Testing Dataset3 read select ALL proc 0, NONE others

Read Testing Dataset4 with Point selection
***Express test mode on.  Several tests are skipped

real    0m21.141s
user    2m0.318s
sys     0m3.289s


[login testpar]$ export OMPI_MCA_io=ompio
[login testpar]$ time mpirun -np 6 ./t_bigio

Testing Dataset1 write by ROW

Testing Dataset2 write by COL

Testing Dataset3 write select ALL proc 0, NONE others

Testing Dataset4 write point selection

Read Testing Dataset1 by COL

Read Testing Dataset2 by ROW

Read Testing Dataset3 read select ALL proc 0, NONE others

Read Testing Dataset4 with Point selection
***Express test mode on.  Several tests are skipped

real    42m34.103s
user    213m22.925s
sys     8m6.742s




Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-26 Thread Gabriel, Edgar via users
I will have a look at the t_bigio tests on Lustre with ompio.  We had from 
collaborators some reports about the performance problems similar to the one 
that you mentioned here (which was the reason we were hesitant to make ompio 
the default on Lustre), but part of the problem is that we were not able to 
reproduce it reliably on the systems that we had access to, which makes 
debugging and fixing the issue very difficult. Lustre is a very unforgiving 
file system, if you get something wrong with the settings, the performance is 
not just a bit off,  but often orders of magnitude (as in your measurements).

Thanks!
Edgar

-Original Message-
From: users  On Behalf Of Mark Dixon via users
Sent: Thursday, November 26, 2020 9:38 AM
To: Dave Love via users 
Cc: Mark Dixon ; Dave Love 

Subject: Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

On Wed, 25 Nov 2020, Dave Love via users wrote:

>> The perf test says romio performs a bit better.  Also -- from overall 
>> time -- it's faster on IMB-IO (which I haven't looked at in detail, 
>> and ran with suboptimal striping).
>
> I take that back.  I can't reproduce a significant difference for 
> total IMB-IO runtime, with both run in parallel on 16 ranks, using 
> either the system default of a single 1MB stripe or using eight 
> stripes.  I haven't teased out figures for different operations yet.  
> That must have been done elsewhere, but I've never seen figures.

But remember that IMB-IO doesn't cover everything. For example, hdf5's t_bigio 
parallel test appears to be a pathological case and OMPIO is 2 orders of 
magnitude slower on a Lustre filesystem:

- OMPI's default MPI-IO implementation on Lustre (ROMIO): 21 seconds
- OMPI's alternative MPI-IO implementation on Lustre (OMPIO): 2554 seconds

End users seem to have the choice of:

- use openmpi 4.x and have some things broken (romio)
- use openmpi 4.x and have some things slow (ompio)
- use openmpi 3.x and everything works

My concern is that openmpi 3.x is near, or at, end of life.

Mark


t_bigio runs on centos 7, gcc 4.8.5, ppc64le, openmpi 4.0.5, hdf5 1.10.7, 
Lustre 2.12.5:

[login testpar]$ time mpirun -np 6 ./t_bigio

Testing Dataset1 write by ROW

Testing Dataset2 write by COL

Testing Dataset3 write select ALL proc 0, NONE others

Testing Dataset4 write point selection

Read Testing Dataset1 by COL

Read Testing Dataset2 by ROW

Read Testing Dataset3 read select ALL proc 0, NONE others

Read Testing Dataset4 with Point selection
***Express test mode on.  Several tests are skipped

real    0m21.141s
user    2m0.318s
sys     0m3.289s


[login testpar]$ export OMPI_MCA_io=ompio
[login testpar]$ time mpirun -np 6 ./t_bigio

Testing Dataset1 write by ROW

Testing Dataset2 write by COL

Testing Dataset3 write select ALL proc 0, NONE others

Testing Dataset4 write point selection

Read Testing Dataset1 by COL

Read Testing Dataset2 by ROW

Read Testing Dataset3 read select ALL proc 0, NONE others

Read Testing Dataset4 with Point selection
***Express test mode on.  Several tests are skipped

real    42m34.103s
user    213m22.925s
sys     8m6.742s



Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-26 Thread Mark Dixon via users

On Wed, 25 Nov 2020, Dave Love via users wrote:


The perf test says romio performs a bit better.  Also -- from overall
time -- it's faster on IMB-IO (which I haven't looked at in detail, and
ran with suboptimal striping).


I take that back.  I can't reproduce a significant difference for total
IMB-IO runtime, with both run in parallel on 16 ranks, using either the
system default of a single 1MB stripe or using eight stripes.  I haven't
teased out figures for different operations yet.  That must have been
done elsewhere, but I've never seen figures.


But remember that IMB-IO doesn't cover everything. For example, hdf5's 
t_bigio parallel test appears to be a pathological case and OMPIO is 2 
orders of magnitude slower on a Lustre filesystem:


- OMPI's default MPI-IO implementation on Lustre (ROMIO): 21 seconds
- OMPI's alternative MPI-IO implementation on Lustre (OMPIO): 2554 seconds

End users seem to have the choice of:

- use openmpi 4.x and have some things broken (romio)
- use openmpi 4.x and have some things slow (ompio)
- use openmpi 3.x and everything works

My concern is that openmpi 3.x is near, or at, end of life.

Mark


t_bigio runs on centos 7, gcc 4.8.5, ppc64le, openmpi 4.0.5, hdf5 1.10.7, 
Lustre 2.12.5:

[login testpar]$ time mpirun -np 6 ./t_bigio

Testing Dataset1 write by ROW

Testing Dataset2 write by COL

Testing Dataset3 write select ALL proc 0, NONE others

Testing Dataset4 write point selection

Read Testing Dataset1 by COL

Read Testing Dataset2 by ROW

Read Testing Dataset3 read select ALL proc 0, NONE others

Read Testing Dataset4 with Point selection
***Express test mode on.  Several tests are skipped

real    0m21.141s
user    2m0.318s
sys     0m3.289s


[login testpar]$ export OMPI_MCA_io=ompio
[login testpar]$ time mpirun -np 6 ./t_bigio

Testing Dataset1 write by ROW

Testing Dataset2 write by COL

Testing Dataset3 write select ALL proc 0, NONE others

Testing Dataset4 write point selection

Read Testing Dataset1 by COL

Read Testing Dataset2 by ROW

Read Testing Dataset3 read select ALL proc 0, NONE others

Read Testing Dataset4 with Point selection
***Express test mode on.  Several tests are skipped

real    42m34.103s
user    213m22.925s
sys     8m6.742s



Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-25 Thread Dave Love via users
I wrote: 

> The perf test says romio performs a bit better.  Also -- from overall
> time -- it's faster on IMB-IO (which I haven't looked at in detail, and
> ran with suboptimal striping).

I take that back.  I can't reproduce a significant difference for total
IMB-IO runtime, with both run in parallel on 16 ranks, using either the
system default of a single 1MB stripe or using eight stripes.  I haven't
teased out figures for different operations yet.  That must have been
done elsewhere, but I've never seen figures.
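
(For completeness, the stripe layouts for those runs were set per directory in
the usual way; a sketch with an illustrative path:)

  # Sketch: check the directory default, then create an eight-stripe test
  # directory for the wider runs (path illustrative).
  lfs getstripe -d /lustre/scratch/$USER
  mkdir -p /lustre/scratch/$USER/imb-8stripe
  lfs setstripe -c 8 /lustre/scratch/$USER/imb-8stripe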


Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-23 Thread Howard Pritchard via users
HI All,

I opened a new issue to track the coll_perf failure in case it's not related
to the HDF5 problem reported earlier.

https://github.com/open-mpi/ompi/issues/8246

Howard


On Mon, 23 Nov 2020 at 12:14, Dave Love via users <users@lists.open-mpi.org> wrote:

> Mark Dixon via users  writes:
>
> > Surely I cannot be the only one who cares about using a recent openmpi
> > with hdf5 on lustre?
>
> I generally have similar concerns.  I dug out the romio tests, assuming
> something more basic is useful.  I ran them with ompi 4.0.5+ucx on
> Mark's lustre system (similar to a few nodes of Summit, apart from the
> filesystem, but with quad-rail IB which doesn't give the bandwidth I
> expected).
>
> The perf test says romio performs a bit better.  Also -- from overall
> time -- it's faster on IMB-IO (which I haven't looked at in detail, and
> ran with suboptimal striping).
>
>   Test: perf
>   romio321
>   Access size per process = 4194304 bytes, ntimes = 5
>   Write bandwidth without file sync = 19317.372354 Mbytes/sec
>   Read bandwidth without prior file sync = 35033.325451 Mbytes/sec
>   Write bandwidth including file sync = 1081.096713 Mbytes/sec
>   Read bandwidth after file sync = 47135.349155 Mbytes/sec
>   ompio
>   Access size per process = 4194304 bytes, ntimes = 5
>   Write bandwidth without file sync = 18442.698536 Mbytes/sec
>   Read bandwidth without prior file sync = 31958.198676 Mbytes/sec
>   Write bandwidth including file sync = 1081.058583 Mbytes/sec
>   Read bandwidth after file sync = 31506.854710 Mbytes/sec
>
> However, romio coll_perf fails as follows, and ompio runs.  Isn't there
> mpi-io regression testing?
>
>   [gpu025:89063:0:89063] Caught signal 11 (Segmentation fault: address not
> mapped to object at address 0x1fffbc10)
>    backtrace (tid:  89063) 
>0 0x0005453c ucs_debug_print_backtrace()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucs/debug/debug.c:656
>1 0x00041b04 ucp_rndv_pack_data()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1335
>2 0x0001c814 uct_self_ep_am_bcopy()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:278
>3 0x0003f7ac uct_ep_am_bcopy()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/api/uct.h:2561
>4 0x0003f7ac ucp_do_am_bcopy_multi()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/proto/proto_am.inl:79
>5 0x0003f7ac ucp_rndv_progress_am_bcopy()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1352
>6 0x00041cb8 ucp_request_try_send()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:223
>7 0x00041cb8 ucp_request_send()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:258
>8 0x00041cb8 ucp_rndv_rtr_handler()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1754
>9 0x0001c984 uct_iface_invoke_am()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/base/uct_iface.h:635
>   10 0x0001c984 uct_self_iface_sendrecv_am()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:149
>   11 0x0001c984 uct_self_ep_am_short()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:262
>   12 0x0002ee30 uct_ep_am_short()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/api/uct.h:2549
>   13 0x0002ee30 ucp_do_am_single()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/proto/proto_am.c:68
>   14 0x00042908 ucp_proto_progress_rndv_rtr()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:172
>   15 0x0003f4c4 ucp_request_try_send()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:223
>   16 0x0003f4c4 ucp_request_send()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:258
>   17 0x0003f4c4 ucp_rndv_req_send_rtr()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:423
>   18 0x00045214 ucp_rndv_matched()
> /tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1262
>   19 0x00046158 ucp_rndv_process_rts()
> /tmp/**

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-23 Thread Dave Love via users
Mark Dixon via users  writes:

> Surely I cannot be the only one who cares about using a recent openmpi
> with hdf5 on lustre?

I generally have similar concerns.  I dug out the romio tests, assuming
something more basic is useful.  I ran them with ompi 4.0.5+ucx on
Mark's lustre system (similar to a few nodes of Summit, apart from the
filesystem, but with quad-rail IB which doesn't give the bandwidth I
expected).

The perf test says romio performs a bit better.  Also -- from overall
time -- it's faster on IMB-IO (which I haven't looked at in detail, and
ran with suboptimal striping).

  Test: perf
  romio321
  Access size per process = 4194304 bytes, ntimes = 5
  Write bandwidth without file sync = 19317.372354 Mbytes/sec
  Read bandwidth without prior file sync = 35033.325451 Mbytes/sec
  Write bandwidth including file sync = 1081.096713 Mbytes/sec
  Read bandwidth after file sync = 47135.349155 Mbytes/sec
  ompio
  Access size per process = 4194304 bytes, ntimes = 5
  Write bandwidth without file sync = 18442.698536 Mbytes/sec
  Read bandwidth without prior file sync = 31958.198676 Mbytes/sec
  Write bandwidth including file sync = 1081.058583 Mbytes/sec
  Read bandwidth after file sync = 31506.854710 Mbytes/sec
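
(For reference, a sketch of how the figures above can be reproduced from the
ROMIO test sources; the rank count and the -fname path are illustrative, and
the io component is switched with the usual MCA parameter:)

  # Sketch: build ROMIO's perf test and run it under each io component
  # against a Lustre path (illustrative).
  mpicc perf.c -o perf
  mpirun -np 16 --mca io romio321 ./perf -fname /lustre/scratch/$USER/perf.out
  mpirun -np 16 --mca io ompio    ./perf -fname /lustre/scratch/$USER/perf.out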

However, romio coll_perf fails as follows, and ompio runs.  Isn't there
mpi-io regression testing?

  [gpu025:89063:0:89063] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x1fffbc10)
   backtrace (tid:  89063) 
   0 0x0005453c ucs_debug_print_backtrace()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucs/debug/debug.c:656
   1 0x00041b04 ucp_rndv_pack_data()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1335
   2 0x0001c814 uct_self_ep_am_bcopy()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:278
   3 0x0003f7ac uct_ep_am_bcopy()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/api/uct.h:2561
   4 0x0003f7ac ucp_do_am_bcopy_multi()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/proto/proto_am.inl:79
   5 0x0003f7ac ucp_rndv_progress_am_bcopy()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1352
   6 0x00041cb8 ucp_request_try_send()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:223
   7 0x00041cb8 ucp_request_send()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:258
   8 0x00041cb8 ucp_rndv_rtr_handler()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1754
   9 0x0001c984 uct_iface_invoke_am()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/base/uct_iface.h:635
  10 0x0001c984 uct_self_iface_sendrecv_am()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:149
  11 0x0001c984 uct_self_ep_am_short()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/sm/self/self.c:262
  12 0x0002ee30 uct_ep_am_short()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/uct/api/uct.h:2549
  13 0x0002ee30 ucp_do_am_single()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/proto/proto_am.c:68
  14 0x00042908 ucp_proto_progress_rndv_rtr()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:172
  15 0x0003f4c4 ucp_request_try_send()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:223
  16 0x0003f4c4 ucp_request_send()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/core/ucp_request.inl:258
  17 0x0003f4c4 ucp_rndv_req_send_rtr()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:423
  18 0x00045214 ucp_rndv_matched()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1262
  19 0x00046158 ucp_rndv_process_rts()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1280
  20 0x00046268 ucp_rndv_rts_handler()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcrae2g33b6el/spack-src/src/ucp/tag/rndv.c:1304
  21 0x0001c984 uct_iface_invoke_am()  
/tmp/***/spack-stage/spack-stage-ucx-1.9.0-wqtizxmjw66cklwpuq3zcr

Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-17 Thread Mark Dixon via users

Hi Edgar,

Pity, that would have been nice! But thanks for looking.

Checking through the ompi github issues, I now realise I logged exactly 
the same issue over a year ago (completely forgot - I've moved jobs since 
then), including a script to reproduce the issue on a Lustre system. 
Unfortunately there's been no movement:


https://github.com/open-mpi/ompi/issues/6871

If it helps anyone, I can confirm that hdf5 parallel tests pass with 
openmpi 3.1.6, but not in 4.0.5.


Surely I cannot be the only one who cares about using a recent openmpi 
with hdf5 on lustre?


Mark

On Mon, 16 Nov 2020, Gabriel, Edgar wrote:

hm, I think this sounds like a different issue, somebody who is more 
invested in the ROMIO Open MPI work should probably have a look.


Regarding compiling Open MPI with Lustre support for ROMIO, I cannot 
test it right now for various reasons, but if I recall correctly the 
trick was to provide the --with-lustre option twice, once inside of the 
"--with-io-romio-flags=" (along with the option that you provided), and 
once outside (for ompio).


Thanks
Edgar




Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-16 Thread Gabriel, Edgar via users
hm, I think this sounds like a different issue, somebody who is more invested 
in the ROMIO Open MPI work should probably have a look.

Regarding compiling Open MPI with Lustre support for ROMIO, I cannot test it 
right now for various reasons, but if I recall correctly the trick was to 
provide the --with-lustre option twice, once inside of the 
"--with-io-romio-flags=" (along with the option that you provided), and once 
outside (for ompio).
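
Something along these lines, i.e. (untested, as I said; other configure 
options as per your usual build):

  # Untested sketch: request Lustre once for ROMIO (inside the romio flags,
  # alongside the file-system option) and once at the top level for ompio.
  ./configure --with-lustre \
      --with-io-romio-flags="--with-file-system=lustre+ufs --with-lustre"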

Thanks
Edgar

-Original Message-
From: Mark Dixon  
Sent: Monday, November 16, 2020 8:19 AM
To: Gabriel, Edgar via users 
Cc: Gabriel, Edgar 
Subject: Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

Hi Edgar,

Thanks for this - good to know that ompio is an option, despite the reference 
to potential performance issues.

I'm using openmpi 4.0.5 with ucx 1.9.0 and see the hdf5 1.10.7 test "testphdf5" 
timeout (with the timeout set to an hour) using romio. Is it a known issue 
there, please?

When it times out, the last few lines to be printed are these:

Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)

The other thing I note is that openmpi doesn't configure romio's lustre driver, 
even when given "--with-lustre". Regardless, I see the same result whether or 
not I add "--with-io-romio-flags=--with-file-system=lustre+ufs"

Cheers,

Mark

On Mon, 16 Nov 2020, Gabriel, Edgar via users wrote:

> this is in theory still correct, the default MPI I/O library used by 
> Open MPI on Lustre file systems is ROMIO in all release versions. That 
> being said, ompio does have support for Lustre as well starting from 
> the
> 2.1 series, so you can use that as well. The main reason that we did 
> not switch to ompio for Lustre as the default MPI I/O library is a 
> performance issue that can arise under certain circumstances.
>
> Which version of Open MPI are you using? There was a bug fix in the 
> Open MPI to ROMIO integration layer sometime in the 4.0 series that 
> fixed a datatype problem, which caused some problems in the HDF5 
> tests. You might be hitting that problem.
>
> Thanks
> Edgar
>
> -Original Message-
> From: users  On Behalf Of Mark Dixon 
> via users
> Sent: Monday, November 16, 2020 4:32 AM
> To: users@lists.open-mpi.org
> Cc: Mark Dixon 
> Subject: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?
>
> Hi all,
>
> I'm confused about how openmpi supports mpi-io on Lustre these days, 
> and am hoping that someone can help.
>
> Back in the openmpi 2.0.0 release notes, it said that OMPIO is the 
> default MPI-IO implementation on everything apart from Lustre, where 
> ROMIO is used. Those release notes are pretty old, but it still 
> appears to be true.
>
> However, I cannot get HDF5 1.10.7 to pass its MPI-IO tests unless I 
> tell openmpi to use OMPIO (OMPI_MCA_io=ompio) and tell UCX not to 
> print warning messages (UCX_LOG_LEVEL=ERROR).
>
> Can I just check: are we still supposed to be using ROMIO?
>
> Thanks,
>
> Mark
>


Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-16 Thread Mark Dixon via users

Hi Edgar,

Thanks for this - good to know that ompio is an option, despite the 
reference to potential performance issues.


I'm using openmpi 4.0.5 with ucx 1.9.0 and see the hdf5 1.10.7 test 
"testphdf5" timeout (with the timeout set to an hour) using romio. Is it a 
known issue there, please?


When it times out, the last few lines to be printed are these:

Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)

The other thing I note is that openmpi doesn't configure romio's lustre 
driver, even when given "--with-lustre". Regardless, I see the same result 
whether or not I add "--with-io-romio-flags=--with-file-system=lustre+ufs"


Cheers,

Mark

On Mon, 16 Nov 2020, Gabriel, Edgar via users wrote:

this is in theory still correct, the default MPI I/O library used by 
Open MPI on Lustre file systems is ROMIO in all release versions. That 
being said, ompio does have support for Lustre as well starting from the 
2.1 series, so you can use that as well. The main reason that we did not 
switch to ompio for Lustre as the default MPI I/O library is a 
performance issue that can arise under certain circumstances.


Which version of Open MPI are you using? There was a bug fix in the Open 
MPI to ROMIO integration layer sometime in the 4.0 series that fixed a 
datatype problem, which caused some problems in the HDF5 tests. You 
might be hitting that problem.


Thanks
Edgar

-Original Message-
From: users  On Behalf Of Mark Dixon via users
Sent: Monday, November 16, 2020 4:32 AM
To: users@lists.open-mpi.org
Cc: Mark Dixon 
Subject: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

Hi all,

I'm confused about how openmpi supports mpi-io on Lustre these days, and 
am hoping that someone can help.


Back in the openmpi 2.0.0 release notes, it said that OMPIO is the 
default MPI-IO implementation on everything apart from Lustre, where 
ROMIO is used. Those release notes are pretty old, but it still appears 
to be true.


However, I cannot get HDF5 1.10.7 to pass its MPI-IO tests unless I tell 
openmpi to use OMPIO (OMPI_MCA_io=ompio) and tell UCX not to print 
warning messages (UCX_LOG_LEVEL=ERROR).


Can I just check: are we still supposed to be using ROMIO?

Thanks,

Mark



Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-16 Thread Gabriel, Edgar via users
this is in theory still correct, the default MPI I/O library used by Open MPI 
on Lustre file systems is ROMIO in all release versions. That being said, ompio 
does have support for Lustre as well starting from the 2.1 series, so you can 
use that as well. The main reason that we did not switch to ompio for Lustre as 
the default MPI I/O library is a performance issue that can arise under certain 
circumstances.

Which version of Open MPI are you using? There was a bug fix in the Open MPI to 
ROMIO integration layer sometime in the 4.0 series that fixed a datatype 
problem, which caused some problems in the HDF5 tests. You might be hitting 
that problem.
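
As a quick sanity check, something like the following lists the io components 
actually built into an installation (a sketch; the exact output format varies 
between versions):

  # Sketch: list the MPI-IO components this Open MPI build provides
  # (expect to see both ompio and romio321 in a 4.0.x build).
  ompi_info | grep "MCA io"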

Thanks
Edgar

-Original Message-
From: users  On Behalf Of Mark Dixon via users
Sent: Monday, November 16, 2020 4:32 AM
To: users@lists.open-mpi.org
Cc: Mark Dixon 
Subject: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

Hi all,

I'm confused about how openmpi supports mpi-io on Lustre these days, and am 
hoping that someone can help.

Back in the openmpi 2.0.0 release notes, it said that OMPIO is the default 
MPI-IO implementation on everything apart from Lustre, where ROMIO is used. 
Those release notes are pretty old, but it still appears to be true.

However, I cannot get HDF5 1.10.7 to pass its MPI-IO tests unless I tell 
openmpi to use OMPIO (OMPI_MCA_io=ompio) and tell UCX not to print warning 
messages (UCX_LOG_LEVEL=ERROR).

Can I just check: are we still supposed to be using ROMIO?

Thanks,

Mark


[OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-16 Thread Mark Dixon via users

Hi all,

I'm confused about how openmpi supports mpi-io on Lustre these days, and 
am hoping that someone can help.


Back in the openmpi 2.0.0 release notes, it said that OMPIO is the default 
MPI-IO implementation on everything apart from Lustre, where ROMIO is 
used. Those release notes are pretty old, but it still appears to be true.


However, I cannot get HDF5 1.10.7 to pass its MPI-IO tests unless I tell 
openmpi to use OMPIO (OMPI_MCA_io=ompio) and tell UCX not to print warning 
messages (UCX_LOG_LEVEL=ERROR).
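
(Concretely, the only combination I've found that gets the tests through looks 
like this; a sketch, run from hdf5's testpar directory:)

  # Sketch of the workaround described above: force ompio and quieten UCX
  # (the io selection can equally be passed as "--mca io ompio" to mpirun).
  export OMPI_MCA_io=ompio
  export UCX_LOG_LEVEL=ERROR
  make check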


Can I just check: are we still supposed to be using ROMIO?

Thanks,

Mark