Hi Bart,

Thanks for your recommendations!  We had already tried this:

export OMPI_MCA_osc='^ucx'
export OMPI_MCA_pml='^ucx'

and unfortunately this increased the CPU time of our benchmark code (GPAW) by about 30% compared to the same compute node without an Omni-Path adapter. So this doesn't appear to be a viable solution.
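(In case it helps to pinpoint the slowdown: presumably one can check which pml/mtl Open MPI actually selects once UCX is excluded, e.g. with

$ ompi_info | grep -E 'MCA (pml|mtl|btl)'
$ mpirun -n 2 --mca pml_base_verbose 100 --mca mtl_base_verbose 100 a.out

but I'm not an MPI expert, so please take these commands as a rough sketch rather than a verified recipe.)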

We had also tried to rebuild with:

$ eb --filter-deps=UCX OpenMPI-4.0.5-GCC-10.2.0.eb --force

but then the job error log files had some warnings:

--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default.  The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.

  Local host:              d063
  Local adapter:           hfi1_0
  Local port:              1

--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them).  This is most certainly not what you wanted.  Check your
cables, subnet manager configuration, etc.  The openib BTL will be
ignored for this job.

  Local host: d063
--------------------------------------------------------------------------
[d063.nifl.fysik.dtu.dk:23605] 55 more processes have sent help message help-mpi-btl-openib.txt / ib port not selected
[d063.nifl.fysik.dtu.dk:23605] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[d063.nifl.fysik.dtu.dk:23605] 55 more processes have sent help message help-mpi-btl-openib.txt / no active ports found

These warnings did sound rather bad, so we didn't pursue this approach any further.
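(If we were to retry that build, I guess the openib warnings could be silenced by also excluding that BTL at runtime, e.g.

export OMPI_MCA_btl='^openib'

or, per the message itself, by setting btl_openib_allow_ib to true. But I don't know whether either would leave us with a usable fast transport, so this is just a guess on my part.)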

Do you have any other ideas about OMPI_* variables that we could try? Since I'm not an MPI expert, complete commands and variables would be appreciated :-)

I would like to remind you that we're running AlmaLinux 8.5 with new versions of libfabric etc. from the BaseOS. On CentOS 7.9 we never had any problems with Omni-Path adapters.

Thanks,
Ole

On 12/3/21 15:08, Bart Oldeman wrote:
Hi Ole,

we found that UCX isn't very useful or performant on OmniPath, so if your build isn't used on both InfiniBand and OmniPath you can compile OpenMPI using "eb --filter-deps=UCX ...". Open MPI works well there either using libpsm2 directly (via the "cm" pml and the "psm2" mtl) or via libfabric (the same "cm" pml with the "ofi" mtl).

We use the same Open MPI binaries on multiple clusters but set this on OmniPath:
OMPI_MCA_btl='^openib'
OMPI_MCA_osc='^ucx'
OMPI_MCA_pml='^ucx'
to disable UCX and openib at runtime. If you include UCX in EB's OpenMPI it will not compile in "openib", so the first of those three would not be needed.
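If you prefer to select the Omni-Path path explicitly instead of just excluding components, something along these lines should work (untested on your particular setup, so treat it as a sketch):
OMPI_MCA_pml=cm
OMPI_MCA_mtl=psm2
or, to go through libfabric instead:
OMPI_MCA_mtl=ofi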

Regards,
Bart

On Fri, 3 Dec 2021 at 07:29, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:

    Hi Åke,

    On 12/3/21 08:27, Åke Sandgren wrote:
     >> On 02-12-2021 14:18, Åke Sandgren wrote:
     >>> On 12/2/21 2:06 PM, Ole Holm Nielsen wrote:
     >>>> These are updated observations of running OpenMPI codes with an
     >>>> Omni-Path network fabric on AlmaLinux 8.5:
     >>>>
     >>>> Using the foss-2021b toolchain and OpenMPI/4.1.1-GCC-11.2.0 my
     >>>> trivial MPI test code works correctly:
     >>>>
     >>>> $ ml OpenMPI
     >>>> $ ml
     >>>>
     >>>> Currently Loaded Modules:
     >>>>     1) GCCcore/11.2.0                    9) hwloc/2.5.0-GCCcore-11.2.0
     >>>>     2) zlib/1.2.11-GCCcore-11.2.0       10) OpenSSL/1.1
     >>>>     3) binutils/2.37-GCCcore-11.2.0     11) libevent/2.1.12-GCCcore-11.2.0
     >>>>     4) GCC/11.2.0                       12) UCX/1.11.2-GCCcore-11.2.0
     >>>>     5) numactl/2.0.14-GCCcore-11.2.0    13) libfabric/1.13.2-GCCcore-11.2.0
     >>>>     6) XZ/5.2.5-GCCcore-11.2.0          14) PMIx/4.1.0-GCCcore-11.2.0
     >>>>     7) libxml2/2.9.10-GCCcore-11.2.0    15) OpenMPI/4.1.1-GCC-11.2.0
     >>>>     8) libpciaccess/0.16-GCCcore-11.2.0
     >>>>
     >>>> $ mpicc mpi_test.c
     >>>> $ mpirun -n 2 a.out
     >>>>
     >>>> (null): There are 2 processes
     >>>>
     >>>> (null): Rank  1:  d008
     >>>>
     >>>> (null): Rank  0:  d008
     >>>>
     >>>>
     >>>> I also tried the OpenMPI/4.1.0-GCC-10.2.0 module, but this still
     >>>> gives the error messages:
     >>>>
     >>>> $ ml OpenMPI/4.1.0-GCC-10.2.0
     >>>> $ ml
     >>>>
     >>>> Currently Loaded Modules:
     >>>>     1) GCCcore/10.2.0                    8) libpciaccess/0.16-GCCcore-10.2.0
     >>>>     2) zlib/1.2.11-GCCcore-10.2.0        9) hwloc/2.2.0-GCCcore-10.2.0
     >>>>     3) binutils/2.35-GCCcore-10.2.0     10) libevent/2.1.12-GCCcore-10.2.0
     >>>>     4) GCC/10.2.0                       11) UCX/1.9.0-GCCcore-10.2.0
     >>>>     5) numactl/2.0.13-GCCcore-10.2.0    12) libfabric/1.11.0-GCCcore-10.2.0
     >>>>     6) XZ/5.2.5-GCCcore-10.2.0          13) PMIx/3.1.5-GCCcore-10.2.0
     >>>>     7) libxml2/2.9.10-GCCcore-10.2.0    14) OpenMPI/4.1.0-GCC-10.2.0
     >>>>
     >>>> $ mpicc mpi_test.c
     >>>> $ mpirun -n 2 a.out
     >>>> [1638449983.577933] [d008:910356:0]       ib_iface.c:966  UCX  ERROR
     >>>> ibv_create_cq(cqe=4096) failed: Operation not supported
     >>>> [1638449983.577827] [d008:910355:0]       ib_iface.c:966  UCX  ERROR
     >>>> ibv_create_cq(cqe=4096) failed: Operation not supported
     >>>> [d008.nifl.fysik.dtu.dk:910355] pml_ucx.c:273  Error: Failed to create
     >>>> UCP worker
     >>>> [d008.nifl.fysik.dtu.dk:910356] pml_ucx.c:273  Error: Failed to create
     >>>> UCP worker
     >>>>
     >>>> (null): There are 2 processes
     >>>>
     >>>> (null): Rank  0:  d008
     >>>>
     >>>> (null): Rank  1:  d008
     >>>>
     >>>> Conclusion: The foss-2021b toolchain with OpenMPI/4.1.1-GCC-11.2.0
     >>>> seems to be required on systems with an Omni-Path network fabric on
     >>>> AlmaLinux 8.5.  Perhaps the newer UCX/1.11.2-GCCcore-11.2.0 is really
     >>>> what's needed, compared to UCX/1.9.0-GCCcore-10.2.0 from foss-2020b.
     >>>>
     >>>> Does anyone have comments on this?
     >>>
     >>> UCX is the problem here in combination with libfabric, I think.
     >>> Write a hook that upgrades the version of UCX to 1.11-something if
     >>> it's < 1.11-ish, or just that specific version if you have
     >>> older-and-working versions.
     >>
     >> You are right that the nodes with Omni-Path have different libfabric
     >> packages which come from the EL8.5 BaseOS as well as the latest
     >> Cornelis/Intel Omni-Path drivers:
     >>
     >> $ rpm -qa | grep libfabric
     >> libfabric-verbs-1.10.0-2.x86_64
     >> libfabric-1.12.1-1.el8.x86_64
     >> libfabric-devel-1.12.1-1.el8.x86_64
     >> libfabric-psm2-1.10.0-2.x86_64
     >>
     >> The 1.12 packages are from EL8.5, and 1.10 packages are from Cornelis.
     >>
     >> Regarding UCX, I was first using the trusted foss-2020b toolchain
     >> which includes UCX/1.9.0-GCCcore-10.2.0. I guess that we shouldn't
     >> mess with the toolchains?
     >>
     >> The foss-2021b toolchain includes the newer UCX 1.11, which seems to
     >> solve this particular problem.
     >>
     >> Can we make any best practices recommendations from these
     >> observations?
     >
     > I didn't check properly, but UCX does not depend on libfabric, OpenMPI
     > does, so I'd write a hook that replaces libfabric < 1.12 with at least
     > 1.12.1.
     > Sometimes you just have to mess with the toolchains, and this looks
     > like one of those situations.
     >
     > Or, as a test, build your own OpenMPI-4.1.0 or 4.0.5 (that 2020b uses)
     > with an updated libfabric and check if that fixes the problem. And if
     > it does, write a hook that replaces libfabric. See the framework/contrib
     > for examples; I did that for UCX, so there is code there to show you
     > how.
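     > A rough sketch of how that could be wired up (the hooks file name here
     > is just a placeholder, and I haven't tested this exact command):
     >
     > $ eb --hooks=$HOME/libfabric_hook.py OpenMPI-4.0.5-GCC-10.2.0.eb --force
     >
     > where libfabric_hook.py defines a parse_hook() that bumps the libfabric
     > dependency version, along the lines of the examples under contrib/hooks
     > in easybuild-framework.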

    I don't feel qualified to mess around with modifying EB toolchains...

    The foss-2021b toolchain including OpenMPI/4.1.1-GCC-11.2.0 seems to
    solve the present problem.  Do you think there are any disadvantages to
    asking users to go for foss-2021b?  Of course we may need several
    modules to be upgraded from foss-2020b to foss-2021b.

    Another possibility may be the coming driver upgrade from Cornelis
    Networks to support the Omni-Path fabric on EL 8.4 and EL 8.5.  I'm
    definitely going to check this when it becomes available.

    Thanks,
    Ole



--
Dr. Bart E. Oldeman | bart.olde...@mcgill.ca | bart.olde...@calculquebec.ca
Scientific Computing Analyst / Analyste en calcul scientifique
McGill HPC Centre / Centre de Calcul Haute Performance de McGill | http://www.hpc.mcgill.ca
Calcul Québec | http://www.calculquebec.ca
Compute/Calcul Canada | http://www.computecanada.ca
Tel/Tél: 514-396-8926 | Fax/Télécopieur: 514-396-8934
