Hi Bart,
Thanks for your recommendations! We had already tried this:
export OMPI_MCA_osc='^ucx'
export OMPI_MCA_pml='^ucx'
and unfortunately this increased the CPU time of our benchmark code (GPAW)
by about 30% compared to the same compute node without an Omni-Path
adapter. So this doesn't appear to be a viable solution.
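(If it helps with diagnosing, I can rerun with verbose transport selection,
e.g. something like
$ mpirun --mca pml_base_verbose 10 --mca mtl_base_verbose 10 -n 2 a.out
to see which pml/mtl actually gets selected with and without the '^ucx'
exclusions - just let me know.)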
We had also tried to rebuild with:
$ eb --filter-deps=UCX OpenMPI-4.0.5-GCC-10.2.0.eb --force
but then the job error log files had some warnings:
--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default. The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.
Local host: d063
Local adapter: hfi1_0
Local port: 1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: d063
--------------------------------------------------------------------------
[d063.nifl.fysik.dtu.dk:23605] 55 more processes have sent help message
help-mpi-btl-openib.txt / ib port not selected
[d063.nifl.fysik.dtu.dk:23605] Set MCA parameter "orte_base_help_aggregate" to
0 to see all help / error messages
[d063.nifl.fysik.dtu.dk:23605] 55 more processes have sent help message
help-mpi-btl-openib.txt / no active ports found
These warnings did sound rather bad, so we didn't pursue this approach any
further.
Do you have any other ideas about OMPI_* variables that we could try?
Since I'm not an MPI expert, complete commands and variables would be
appreciated :-)
I would like to remind you that we're running AlmaLinux 8.5 with new
versions of libfabric etc. from the BaseOS. On CentOS 7.9 we never had
any problems with Omni-Path adapters.
Thanks,
Ole
On 12/3/21 15:08, Bart Oldeman wrote:
Hi Ole,
we found that UCX isn't very useful or performant on OmniPath, so if your
compiled OpenMPI isn't used on both InfiniBand and OmniPath you can compile
OpenMPI using "eb --filter-deps=UCX ..."
Open MPI works well there either using libpsm2 directly (using the "cm"
pml and "psm2" mtl), or via libfabric (using the same "cm" pml and the
"ofi" mtl).
We use the same Open MPI binaries on multiple clusters but set this on
OmniPath:
OMPI_MCA_btl='^openib'
OMPI_MCA_osc='^ucx'
OMPI_MCA_pml='^ucx'
to disable UCX and openib at runtime. If you include UCX in EB's OpenMPI,
it will not compile in "openib", so the first of those three would not be
needed.
Regards,
Bart
On Fri, 3 Dec 2021 at 07:29, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
Hi Åke,
On 12/3/21 08:27, Åke Sandgren wrote:
>> On 02-12-2021 14:18, Åke Sandgren wrote:
>>> On 12/2/21 2:06 PM, Ole Holm Nielsen wrote:
>>>> These are updated observations of running OpenMPI codes with an
>>>> Omni-Path network fabric on AlmaLinux 8.5::
>>>>
>>>> Using the foss-2021b toolchain and OpenMPI/4.1.1-GCC-11.2.0 my trivial
>>>> MPI test code works correctly:
>>>>
>>>> $ ml OpenMPI
>>>> $ ml
>>>>
>>>> Currently Loaded Modules:
>>>>    1) GCCcore/11.2.0                    9) hwloc/2.5.0-GCCcore-11.2.0
>>>>    2) zlib/1.2.11-GCCcore-11.2.0       10) OpenSSL/1.1
>>>>    3) binutils/2.37-GCCcore-11.2.0     11) libevent/2.1.12-GCCcore-11.2.0
>>>>    4) GCC/11.2.0                       12) UCX/1.11.2-GCCcore-11.2.0
>>>>    5) numactl/2.0.14-GCCcore-11.2.0    13) libfabric/1.13.2-GCCcore-11.2.0
>>>>    6) XZ/5.2.5-GCCcore-11.2.0          14) PMIx/4.1.0-GCCcore-11.2.0
>>>>    7) libxml2/2.9.10-GCCcore-11.2.0    15) OpenMPI/4.1.1-GCC-11.2.0
>>>>    8) libpciaccess/0.16-GCCcore-11.2.0
>>>>
>>>> $ mpicc mpi_test.c
>>>> $ mpirun -n 2 a.out
>>>>
>>>> (null): There are 2 processes
>>>>
>>>> (null): Rank 1: d008
>>>>
>>>> (null): Rank 0: d008
>>>>
>>>>
>>>> I also tried the OpenMPI/4.1.0-GCC-10.2.0 module, but this still gives
>>>> the error messages:
>>>>
>>>> $ ml OpenMPI/4.1.0-GCC-10.2.0
>>>> $ ml
>>>>
>>>> Currently Loaded Modules:
>>>>    1) GCCcore/10.2.0                    8) libpciaccess/0.16-GCCcore-10.2.0
>>>>    2) zlib/1.2.11-GCCcore-10.2.0        9) hwloc/2.2.0-GCCcore-10.2.0
>>>>    3) binutils/2.35-GCCcore-10.2.0     10) libevent/2.1.12-GCCcore-10.2.0
>>>>    4) GCC/10.2.0                       11) UCX/1.9.0-GCCcore-10.2.0
>>>>    5) numactl/2.0.13-GCCcore-10.2.0    12) libfabric/1.11.0-GCCcore-10.2.0
>>>>    6) XZ/5.2.5-GCCcore-10.2.0          13) PMIx/3.1.5-GCCcore-10.2.0
>>>>    7) libxml2/2.9.10-GCCcore-10.2.0    14) OpenMPI/4.1.0-GCC-10.2.0
>>>>
>>>> $ mpicc mpi_test.c
>>>> $ mpirun -n 2 a.out
>>>> [1638449983.577933] [d008:910356:0] ib_iface.c:966 UCX ERROR
>>>> ibv_create_cq(cqe=4096) failed: Operation not supported
>>>> [1638449983.577827] [d008:910355:0] ib_iface.c:966 UCX ERROR
>>>> ibv_create_cq(cqe=4096) failed: Operation not supported
>>>> [d008.nifl.fysik.dtu.dk:910355] pml_ucx.c:273 Error: Failed to create
>>>> UCP worker
>>>> [d008.nifl.fysik.dtu.dk:910356] pml_ucx.c:273 Error: Failed to create
>>>> UCP worker
>>>>
>>>> (null): There are 2 processes
>>>>
>>>> (null): Rank 0: d008
>>>>
>>>> (null): Rank 1: d008
>>>>
>>>> Conclusion: The foss-2021b toolchain with OpenMPI/4.1.1-GCC-11.2.0 seems
>>>> to be required on systems with an Omni-Path network fabric on AlmaLinux
>>>> 8.5. Perhaps the newer UCX/1.11.2-GCCcore-11.2.0 is really what's
>>>> needed, compared to UCX/1.9.0-GCCcore-10.2.0 from foss-2020b.
>>>>
>>>> Does anyone have comments on this?
>>>
>>> UCX is the problem here in combination with libfabric I think. Write a
>>> hook that upgrades the version of UCX to 1.11-something if it's <
>>> 1.11-ish, or just that specific version if you have older-and-working
>>> versions.
>>
>> You are right that the nodes with Omni-Path have different libfabric
>> packages which come from the EL8.5 BaseOS as well as the latest
>> Cornelis/Intel Omni-Path drivers:
>>
>> $ rpm -qa | grep libfabric
>> libfabric-verbs-1.10.0-2.x86_64
>> libfabric-1.12.1-1.el8.x86_64
>> libfabric-devel-1.12.1-1.el8.x86_64
>> libfabric-psm2-1.10.0-2.x86_64
>>
>> The 1.12 packages are from EL8.5, and 1.10 packages are from Cornelis.
>>
>> Regarding UCX, I was first using the trusted foss-2020b toolchain which
>> includes UCX/1.9.0-GCCcore-10.2.0. I guess that we shouldn't mess with
>> the toolchains?
>>
>> The foss-2021b toolchain includes the newer UCX 1.11, which seems to
>> solve this particular problem.
>>
>> Can we make any best practices recommendations from these observations?
>
> I didn't check properly, but UCX does not depend on libfabric, OpenMPI
> does, so I'd write a hook that replaces libfabric < 1.12 with at least
> 1.12.1.
> Sometimes you just have to mess with the toolchains, and this looks like
> one of those situations.
>
> Or as a test build your own OpenMPI-4.1.0 or 4.0.5 (that 2020b uses)
> with an updated libfabric and check if that fixes the problem. And if it
> does, write a hook that replaces libfabric. See the framework/contrib
> for examples, I did that for UCX so there is code there to show you how.
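Just to make sure I understand the manual test you suggest: would it be
something along these lines (completely untested, and I'm not sure a newer
libfabric easyconfig even exists for GCCcore-10.2.0 yet)?
$ eb --copy-ec OpenMPI-4.0.5-GCC-10.2.0.eb .
(edit the libfabric dependency version in the local copy)
$ eb ./OpenMPI-4.0.5-GCC-10.2.0.eb --robot --force
And a hook would then be passed to EasyBuild with something like
"eb --hooks=my_hooks.py ..." (file name made up)?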
I don't feel qualified to mess around with modifying EB toolchains...
The foss-2021b toolchain including OpenMPI/4.1.1-GCC-11.2.0 seems to solve
the present problem. Do you think there are any disadvantages with asking
users to go for foss-2021b? Of course we may need several modules to be
upgraded from foss-2020b to foss-2021b.
Another possibility may be the coming driver upgrade from Cornelis
Networks to support the Omni-Path fabric on EL 8.4 and EL 8.5. I'm
definitely going to check this when it becomes available.
Thanks,
Ole
--
Dr. Bart E. Oldeman | bart.olde...@mcgill.ca | bart.olde...@calculquebec.ca
Scientific Computing Analyst / Analyste en calcul scientifique
McGill HPC Centre / Centre de Calcul Haute Performance de McGill | http://www.hpc.mcgill.ca
Calcul Québec | http://www.calculquebec.ca
Compute/Calcul Canada | http://www.computecanada.ca
Tel/Tél: 514-396-8926 | Fax/Télécopieur: 514-396-8934