[OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-16 Thread Mark Dixon via users

Hi all,

I'm confused about how Open MPI supports MPI-IO on Lustre these days, and 
am hoping that someone can help.


Back in the Open MPI 2.0.0 release notes, it said that OMPIO is the default 
MPI-IO implementation on everything apart from Lustre, where ROMIO is 
used. Those release notes are pretty old, but this still appears to be true.


However, I cannot get HDF5 1.10.7 to pass its MPI-IO tests unless I tell 
Open MPI to use OMPIO (OMPI_MCA_io=ompio) and tell UCX not to print warning 
messages (UCX_LOG_LEVEL=ERROR).
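
For concreteness, the only way I can get the tests through is with 
something like this (a sketch of my setup; the test binary name is just a 
placeholder for HDF5's parallel tests):

  # force Open MPI's OMPIO component and silence UCX warnings
  export OMPI_MCA_io=ompio
  export UCX_LOG_LEVEL=ERROR
  mpirun -np 4 ./a_parallel_hdf5_test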


Can I just check: are we still supposed to be using ROMIO?

Thanks,

Mark


Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-16 Thread Gabriel, Edgar via users
This is in theory still correct: the default MPI I/O library used by Open MPI 
on Lustre file systems is ROMIO in all release versions. That being said, OMPIO 
has supported Lustre as well since the 2.1 series, so you can use it too. The 
main reason we did not switch to OMPIO as the default MPI I/O library on Lustre 
is a performance issue that can arise under certain circumstances.
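
For reference, OMPIO can be requested through the usual MCA mechanisms, and 
ompi_info will show which I/O components your build actually contains 
(./your_app is of course a placeholder):

  # select ompio for a single run...
  mpirun --mca io ompio -np 4 ./your_app

  # ...or via the environment, as you did
  export OMPI_MCA_io=ompio

  # list the io components compiled into this installation
  ompi_info | grep "MCA io"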

Which version of Open MPI are you using? Sometime in the 4.0 series, a fix 
went into the Open MPI-to-ROMIO integration layer for a datatype problem that 
caused failures in the HDF5 tests. You might be hitting that problem.

Thanks
Edgar


Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-16 Thread Mark Dixon via users

Hi Edgar,

Thanks for this - good to know that OMPIO is an option, despite the 
potential performance issues you mention.


I'm using Open MPI 4.0.5 with UCX 1.9.0 and see the HDF5 1.10.7 test 
"testphdf5" time out (with the timeout set to an hour) when using ROMIO. Is 
this a known issue there, please?
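
For reference, this is roughly how I run it (I believe the ROMIO component 
in the 4.0 series is called romio321, but that name is my assumption):

  # force ROMIO rather than OMPIO for the HDF5 parallel tests
  export OMPI_MCA_io=romio321
  mpirun -np 6 ./testphdf5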


When it times out, the last few lines to be printed are these:

Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)
Testing  -- multi-chunk collective chunk io (cchunk3)

The other thing I note is that Open MPI doesn't configure ROMIO's Lustre 
driver, even when given "--with-lustre". Regardless, I see the same result 
whether or not I add "--with-io-romio-flags=--with-file-system=lustre+ufs".
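
For the record, the configure invocation I'm testing with looks like this 
(the install prefix is a placeholder):

  ./configure --prefix=/opt/openmpi-4.0.5 \
      --with-lustre \
      --with-io-romio-flags=--with-file-system=lustre+ufs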


Cheers,

Mark


Re: [OMPI users] MPI-IO on Lustre - OMPIO or ROMIO?

2020-11-16 Thread Gabriel, Edgar via users
Hm, this sounds like a different issue; somebody who is more invested in the 
ROMIO side of Open MPI should probably have a look.

Regarding compiling Open MPI with Lustre support for ROMIO, I cannot test it 
right now for various reasons, but if I recall correctly the trick was to 
provide the --with-lustre option twice: once inside "--with-io-romio-flags=" 
(along with the option that you provided), and once outside (for OMPIO).
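
In other words, something along these lines (untested, from memory):

  ./configure --with-lustre \
      --with-io-romio-flags="--with-lustre --with-file-system=lustre+ufs"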

Thanks
Edgar
