Sorry for sending this to the mailing list instead of personally to Ole! Jakob
--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark


> On 7 Dec 2021, at 13.13, Jakob Schiøtz <schi...@fysik.dtu.dk> wrote:
>
> Hi Ole,
>
> Two possibilities:
>
> 1)
>
> Try setting these three environment variables:
>
> export OMPI_MCA_osc='^ucx'
> export OMPI_MCA_pml='^ucx'
> export OMPI_MCA_btl='^openib'
>
> 2)
>
> Try building with
>
> eb --filter-deps=UCX OpenMPI-4.0.5-GCC-10.2.0.eb --force
>
> and then setting
>
> export OMPI_MCA_btl=^openib
>
> Best regards,
>
> Jakob
>
>
>> On 6 Dec 2021, at 15.04, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
>>
>> Hi Bart,
>>
>> Thanks for your recommendations! We had already tried this:
>>
>> export OMPI_MCA_osc='^ucx'
>> export OMPI_MCA_pml='^ucx'
>>
>> and unfortunately this increased the CPU time of our benchmark code (GPAW)
>> by about 30% compared to the same compute node without an Omni-Path adapter.
>> So this doesn't appear to be a viable solution.
>>
>> We had also tried to rebuild with:
>>
>> $ eb --filter-deps=UCX OpenMPI-4.0.5-GCC-10.2.0.eb --force
>>
>> but then the job error log files had some warnings:
>>
>>> --------------------------------------------------------------------------
>>> By default, for Open MPI 4.0 and later, infiniband ports on a device
>>> are not used by default. The intent is to use UCX for these devices.
>>> You can override this policy by setting the btl_openib_allow_ib MCA
>>> parameter to true.
>>> Local host: d063
>>> Local adapter: hfi1_0
>>> Local port: 1
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> WARNING: There is at least non-excluded one OpenFabrics device found,
>>> but there are no active ports detected (or Open MPI was unable to use
>>> them). This is most certainly not what you wanted. Check your
>>> cables, subnet manager configuration, etc. The openib BTL will be
>>> ignored for this job.
>>> Local host: d063
>>> --------------------------------------------------------------------------
>>> [d063.nifl.fysik.dtu.dk:23605] 55 more processes have sent help message
>>> help-mpi-btl-openib.txt / ib port not selected
>>> [d063.nifl.fysik.dtu.dk:23605] Set MCA parameter "orte_base_help_aggregate"
>>> to 0 to see all help / error messages
>>> [d063.nifl.fysik.dtu.dk:23605] 55 more processes have sent help message
>>> help-mpi-btl-openib.txt / no active ports found
>>
>> These warnings did sound rather bad, so we didn't pursue this approach any
>> further.
>>
>> Do you have any other ideas about OMPI_* variables that we could try? Since
>> I'm not an MPI expert, complete commands and variables would be appreciated
>> :-)
>>
>> I would like to remind you that we're running AlmaLinux 8.5 with new
>> versions of libfabric etc. from the BaseOS. On CentOS 7.9 we never had any
>> problems with Omni-Path adapters.
>>
>> Thanks,
>> Ole
>>
>> On 12/3/21 15:08, Bart Oldeman wrote:
>>> Hi Ole,
>>> we found that UCX isn't very useful nor performant on OmniPath, so if your
>>> compiled OpenMPI isn't used on both InfiniBand and OmniPath you can compile
>>> OpenMPI using "eb --filter-deps=UCX ..."
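As a rough sketch of option 1 quoted above (illustrative only, not taken from the thread): when the same software stack is shared between Omni-Path and non-Omni-Path nodes, the MCA variables can be set conditionally, for example by testing for an hfi1 device in sysfs. The sysfs path check and the snippet below are assumptions; the adapter name hfi1_0 comes from the "Local adapter: hfi1_0" warning above.

# Hypothetical job-script snippet: disable UCX and the openib BTL at runtime,
# but only on nodes that expose an Omni-Path HFI.
if [ -d /sys/class/infiniband/hfi1_0 ]; then
    export OMPI_MCA_osc='^ucx'
    export OMPI_MCA_pml='^ucx'
    export OMPI_MCA_btl='^openib'
fi
mpirun -n 2 ./a.out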
>>> Open MPI works well there either using libpsm2 directly (using the "cm" pml
>>> and "psm2" mtl), or via libfabric (using the same "cm" pml and the "ofi" mtl).
>>> We use the same Open MPI binaries on multiple clusters but set this on
>>> OmniPath:
>>> OMPI_MCA_btl='^openib'
>>> OMPI_MCA_osc='^ucx'
>>> OMPI_MCA_pml='^ucx'
>>> to disable UCX and openib at runtime. If you include UCX in EB's OpenMPI it
>>> will not compile in "openib", so the first one of those three would not be
>>> needed.
>>> Regards,
>>> Bart
>>>
>>> On Fri, 3 Dec 2021 at 07:29, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
>>> Hi Åke,
>>> On 12/3/21 08:27, Åke Sandgren wrote:
>>>>> On 02-12-2021 14:18, Åke Sandgren wrote:
>>>>>> On 12/2/21 2:06 PM, Ole Holm Nielsen wrote:
>>>>>>> These are updated observations of running OpenMPI codes with an
>>>>>>> Omni-Path network fabric on AlmaLinux 8.5:
>>>>>>>
>>>>>>> Using the foss-2021b toolchain and OpenMPI/4.1.1-GCC-11.2.0 my trivial
>>>>>>> MPI test code works correctly:
>>>>>>>
>>>>>>> $ ml OpenMPI
>>>>>>> $ ml
>>>>>>>
>>>>>>> Currently Loaded Modules:
>>>>>>>  1) GCCcore/11.2.0
>>>>>>>  2) zlib/1.2.11-GCCcore-11.2.0
>>>>>>>  3) binutils/2.37-GCCcore-11.2.0
>>>>>>>  4) GCC/11.2.0
>>>>>>>  5) numactl/2.0.14-GCCcore-11.2.0
>>>>>>>  6) XZ/5.2.5-GCCcore-11.2.0
>>>>>>>  7) libxml2/2.9.10-GCCcore-11.2.0
>>>>>>>  8) libpciaccess/0.16-GCCcore-11.2.0
>>>>>>>  9) hwloc/2.5.0-GCCcore-11.2.0
>>>>>>> 10) OpenSSL/1.1
>>>>>>> 11) libevent/2.1.12-GCCcore-11.2.0
>>>>>>> 12) UCX/1.11.2-GCCcore-11.2.0
>>>>>>> 13) libfabric/1.13.2-GCCcore-11.2.0
>>>>>>> 14) PMIx/4.1.0-GCCcore-11.2.0
>>>>>>> 15) OpenMPI/4.1.1-GCC-11.2.0
>>>>>>>
>>>>>>> $ mpicc mpi_test.c
>>>>>>> $ mpirun -n 2 a.out
>>>>>>>
>>>>>>> (null): There are 2 processes
>>>>>>> (null): Rank 1: d008
>>>>>>> (null): Rank 0: d008
>>>>>>>
>>>>>>> I also tried the OpenMPI/4.1.0-GCC-10.2.0 module, but this still gives
>>>>>>> the error messages:
>>>>>>>
>>>>>>> $ ml OpenMPI/4.1.0-GCC-10.2.0
>>>>>>> $ ml
>>>>>>>
>>>>>>> Currently Loaded Modules:
>>>>>>>  1) GCCcore/10.2.0
>>>>>>>  2) zlib/1.2.11-GCCcore-10.2.0
>>>>>>>  3) binutils/2.35-GCCcore-10.2.0
>>>>>>>  4) GCC/10.2.0
>>>>>>>  5) numactl/2.0.13-GCCcore-10.2.0
>>>>>>>  6) XZ/5.2.5-GCCcore-10.2.0
>>>>>>>  7) libxml2/2.9.10-GCCcore-10.2.0
>>>>>>>  8) libpciaccess/0.16-GCCcore-10.2.0
>>>>>>>  9) hwloc/2.2.0-GCCcore-10.2.0
>>>>>>> 10) libevent/2.1.12-GCCcore-10.2.0
>>>>>>> 11) UCX/1.9.0-GCCcore-10.2.0
>>>>>>> 12) libfabric/1.11.0-GCCcore-10.2.0
>>>>>>> 13) PMIx/3.1.5-GCCcore-10.2.0
>>>>>>> 14) OpenMPI/4.1.0-GCC-10.2.0
>>>>>>>
>>>>>>> $ mpicc mpi_test.c
>>>>>>> $ mpirun -n 2 a.out
>>>>>>> [1638449983.577933] [d008:910356:0] ib_iface.c:966 UCX ERROR
>>>>>>> ibv_create_cq(cqe=4096) failed: Operation not supported
>>>>>>> [1638449983.577827] [d008:910355:0] ib_iface.c:966 UCX ERROR
>>>>>>> ibv_create_cq(cqe=4096) failed: Operation not supported
>>>>>>> [d008.nifl.fysik.dtu.dk:910355] pml_ucx.c:273 Error: Failed to create
>>>>>>> UCP worker
>>>>>>> [d008.nifl.fysik.dtu.dk:910356] pml_ucx.c:273 Error: Failed to create
>>>>>>> UCP worker
>>>>>>>
>>>>>>> (null): There are 2 processes
>>>>>>> (null): Rank 0: d008
>>>>>>> (null): Rank 1: d008
>>>>>>>
>>>>>>> Conclusion: The foss-2021b toolchain with OpenMPI/4.1.1-GCC-11.2.0 seems
>>>>>>> to be required on systems with an Omni-Path network fabric on AlmaLinux
>>>>>>> 8.5.
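A quick way to exercise the "cm" pml with the "psm2" or "ofi" mtl that Bart describes above is to request the components explicitly on the mpirun command line; this is an illustrative aside, not from the thread, and assumes the trivial test binary a.out built above:

$ ompi_info | grep -i -e " pml:" -e " mtl:"          # list the pml/mtl components built into this Open MPI
$ mpirun --mca pml cm --mca mtl psm2 -n 2 ./a.out    # libpsm2 directly
$ mpirun --mca pml cm --mca mtl ofi -n 2 ./a.out     # via libfabric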
>>>>>>> Perhaps the newer UCX/1.11.2-GCCcore-11.2.0 is really what's needed,
>>>>>>> compared to UCX/1.9.0-GCCcore-10.2.0 from foss-2020b.
>>>>>>>
>>>>>>> Does anyone have comments on this?
>>>>>>
>>>>>> UCX is the problem here in combination with libfabric, I think. Write a
>>>>>> hook that upgrades the version of UCX to 1.11-something if it's <
>>>>>> 1.11-ish, or just that specific version if you have older-and-working
>>>>>> versions.
>>>>>
>>>>> You are right that the nodes with Omni-Path have different libfabric
>>>>> packages which come from the EL8.5 BaseOS as well as the latest
>>>>> Cornelis/Intel Omni-Path drivers:
>>>>>
>>>>> $ rpm -qa | grep libfabric
>>>>> libfabric-verbs-1.10.0-2.x86_64
>>>>> libfabric-1.12.1-1.el8.x86_64
>>>>> libfabric-devel-1.12.1-1.el8.x86_64
>>>>> libfabric-psm2-1.10.0-2.x86_64
>>>>>
>>>>> The 1.12 packages are from EL8.5, and the 1.10 packages are from Cornelis.
>>>>>
>>>>> Regarding UCX, I was first using the trusted foss-2020b toolchain which
>>>>> includes UCX/1.9.0-GCCcore-10.2.0. I guess that we shouldn't mess with
>>>>> the toolchains?
>>>>>
>>>>> The foss-2021b toolchain includes the newer UCX 1.11, which seems to
>>>>> solve this particular problem.
>>>>>
>>>>> Can we make any best practices recommendations from these observations?
>>>>
>>>> I didn't check properly, but UCX does not depend on libfabric, OpenMPI
>>>> does, so I'd write a hook that replaces libfabric < 1.12 with at least
>>>> 1.12.1.
>>>> Sometimes you just have to mess with the toolchains, and this looks like
>>>> one of those situations.
>>>>
>>>> Or, as a test, build your own OpenMPI-4.1.0 or 4.0.5 (which 2020b uses)
>>>> with an updated libfabric and check if that fixes the problem. And if it
>>>> does, write a hook that replaces libfabric. See the framework/contrib for
>>>> examples; I did that for UCX, so there is code there to show you how.
>>>
>>> I don't feel qualified to mess around with modifying EB toolchains...
>>> The foss-2021b toolchain including OpenMPI/4.1.1-GCC-11.2.0 seems to solve
>>> the present problem. Do you think there are any disadvantages to asking
>>> users to go for foss-2021b? Of course we may need several modules to be
>>> upgraded from foss-2020b to foss-2021b.
>>> Another possibility may be the coming driver upgrade from Cornelis
>>> Networks to support the Omni-Path fabric on EL 8.4 and EL 8.5. I'm
>>> definitely going to check this when it becomes available.
>>> Thanks,
>>> Ole
>>> --
>>> Dr. Bart E. Oldeman | bart.olde...@mcgill.ca | bart.olde...@calculquebec.ca
>>> Scientific Computing Analyst / Analyste en calcul scientifique
>>> McGill HPC Centre / Centre de Calcul Haute Performance de McGill | http://www.hpc.mcgill.ca
>>> Calcul Québec | http://www.calculquebec.ca
>>> Compute/Calcul Canada | http://www.computecanada.ca
>>> Tel/Tél: 514-396-8926 | Fax/Télécopieur: 514-396-8934
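For completeness, a hedged sketch of the "test build with an updated libfabric" approach Åke suggests above, using only standard eb options. The copy and sed steps, and the exact dependency spelling ('libfabric', '1.11.0'), are assumptions about the easyconfig contents; --robot will only resolve the bumped dependency if a matching libfabric 1.12.1 easyconfig for GCCcore-10.2.0 exists or is added alongside.

$ eb --search OpenMPI-4.0.5-GCC-10.2.0.eb      # print where the shipped easyconfig lives
$ cp <path printed above>/OpenMPI-4.0.5-GCC-10.2.0.eb .
$ # bump the libfabric dependency in the local copy
$ sed -i "s/('libfabric', '1.11.0')/('libfabric', '1.12.1')/" OpenMPI-4.0.5-GCC-10.2.0.eb
$ eb ./OpenMPI-4.0.5-GCC-10.2.0.eb --robot --force
$ # then rerun the GPAW benchmark and compare against the ~30% regression reported above

If the rebuilt module fixes the problem, the same version bump can be automated with a hook, along the lines of the contrib examples in the framework that Åke mentions.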