Sorry for sending this to the mailing list instead of personally to Ole!

Jakob

--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark




> On 7 Dec 2021, at 13.13, Jakob Schiøtz <schi...@fysik.dtu.dk> wrote:
> 
> 
> 
> Hi Ole,
> 
> Two options:
> 
> 1)
> 
> Try setting these three environment variables:
> 
> export OMPI_MCA_osc='^ucx'
> export OMPI_MCA_pml='^ucx'
> export OMPI_MCA_btl='^openib'
> 
> 
> 2)
> 
> Try building with
> 
> eb --filter-deps=UCX OpenMPI-4.0.5-GCC-10.2.0.eb --force
> 
> and then setting
> 
> export OMPI_MCA_btl=^openib
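> 
> In either case you can check which components Open MPI actually selects at
> run time; here is a minimal sketch (pml_base_verbose is a standard Open MPI
> MCA verbosity parameter, and a.out is assumed to be the compiled mpi_test.c
> from elsewhere in this thread):
> 
> export OMPI_MCA_osc='^ucx'
> export OMPI_MCA_pml='^ucx'
> export OMPI_MCA_btl='^openib'
> mpirun -n 2 --mca pml_base_verbose 10 a.out
> 
> With UCX excluded, the selection messages should mention the "cm" pml rather
> than "ucx".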
> 
> Best regards,
> 
> Jakob
> 
> 
> 
> 
>> On 6 Dec 2021, at 15.04, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
>> 
>> Hi Bart,
>> 
>> Thanks for your recommendations!  We had already tried this:
>> 
>> export OMPI_MCA_osc='^ucx'
>> export OMPI_MCA_pml='^ucx'
>> 
>> and unfortunately this increased the CPU time of our benchmark code (GPAW) 
>> by about 30% compared to the same compute node without an Omni-Path adapter. 
>>  So this doesn't appear to be a viable solution.
>> 
>> We had also tried to rebuild with:
>> 
>> $ eb --filter-deps=UCX OpenMPI-4.0.5-GCC-10.2.0.eb --force
>> 
>> but then the job error log files had some warnings:
>> 
>>> --------------------------------------------------------------------------
>>> By default, for Open MPI 4.0 and later, infiniband ports on a device
>>> are not used by default.  The intent is to use UCX for these devices.
>>> You can override this policy by setting the btl_openib_allow_ib MCA
>>> parameter to true.
>>> Local host:              d063
>>> Local adapter:           hfi1_0
>>> Local port:              1
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> WARNING: There is at least non-excluded one OpenFabrics device found,
>>> but there are no active ports detected (or Open MPI was unable to use
>>> them).  This is most certainly not what you wanted.  Check your
>>> cables, subnet manager configuration, etc.  The openib BTL will be
>>> ignored for this job.
>>> Local host: d063
>>> --------------------------------------------------------------------------
>>> [d063.nifl.fysik.dtu.dk:23605] 55 more processes have sent help message 
>>> help-mpi-btl-openib.txt / ib port not selected
>>> [d063.nifl.fysik.dtu.dk:23605] Set MCA parameter "orte_base_help_aggregate" 
>>> to 0 to see all help / error messages
>>> [d063.nifl.fysik.dtu.dk:23605] 55 more processes have sent help message 
>>> help-mpi-btl-openib.txt / no active ports found
>> 
>> These warnings did sound rather bad, so we didn't pursue this approach any 
>> further.
>> 
>> Do you have any other ideas about OMPI_* variables that we could try? Since 
>> I'm not an MPI expert, complete commands and variables would be appreciated 
>> :-)
>> 
>> I would like to remind you that we're running AlmaLinux 8.5 with new 
>> versions of libfabric etc. from the BaseOS.  On CentOS 7.9 we never had any 
>> problems with Omni-Path adapters.
>> 
>> Thanks,
>> Ole
>> 
>> On 12/3/21 15:08, Bart Oldeman wrote:
>>> Hi Ole,
>>> we found that UCX isn't very useful or performant on OmniPath, so if your 
>>> compiled Open MPI isn't used on both InfiniBand and OmniPath you can compile 
>>> OpenMPI using "eb --filter-deps=UCX ..."
>>> Open MPI works well there either using libpsm2 directly (using the "cm" pml 
>>> and "psm2" mtl), or via libfabric (using the same "cm" pml and the "ofi" 
>>> mtl).
>>> We use the same Open MPI binaries on multiple clusters but set this on 
>>> OmniPath:
>>> OMPI_MCA_btl='^openib'
>>> OMPI_MCA_osc='^ucx'
>>> OMPI_MCA_pml='^ucx'
>>> to disable UCX and openib at runtime. If you include UCX in EB's OpenMPI it 
>>> will not compile in "openib" so the first one of those three would not be 
>>> needed.
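>>> As a quick check of what a given build actually contains (a sketch using
>>> standard Open MPI tools and MCA parameters, not anything specific to this
>>> thread):
>>> ompi_info | grep -i btl     # lists the BTL components compiled into the build
>>> export OMPI_MCA_pml=cm
>>> export OMPI_MCA_mtl=psm2    # or "ofi" to go through libfabric instead
>>> If "openib" does not show up in the ompi_info output, the OMPI_MCA_btl
>>> setting above is indeed redundant.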
>>> Regards,
>>> Bart
>>> On Fri, 3 Dec 2021 at 07:29, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
>>>   Hi Åke,
>>>   On 12/3/21 08:27, Åke Sandgren wrote:
>>>>> On 02-12-2021 14:18, Åke Sandgren wrote:
>>>>>> On 12/2/21 2:06 PM, Ole Holm Nielsen wrote:
>>>>>>> These are updated observations of running OpenMPI codes with an
>>>>>>> Omni-Path network fabric on AlmaLinux 8.5:
>>>>>>> 
>>>>>>> Using the foss-2021b toolchain and OpenMPI/4.1.1-GCC-11.2.0 my trivial
>>>>>>> MPI test code works correctly:
>>>>>>> 
>>>>>>> $ ml OpenMPI
>>>>>>> $ ml
>>>>>>> 
>>>>>>> Currently Loaded Modules:
>>>>>>>    1) GCCcore/11.2.0                   9) hwloc/2.5.0-GCCcore-11.2.0
>>>>>>>    2) zlib/1.2.11-GCCcore-11.2.0      10) OpenSSL/1.1
>>>>>>>    3) binutils/2.37-GCCcore-11.2.0    11) libevent/2.1.12-GCCcore-11.2.0
>>>>>>>    4) GCC/11.2.0                      12) UCX/1.11.2-GCCcore-11.2.0
>>>>>>>    5) numactl/2.0.14-GCCcore-11.2.0   13) libfabric/1.13.2-GCCcore-11.2.0
>>>>>>>    6) XZ/5.2.5-GCCcore-11.2.0         14) PMIx/4.1.0-GCCcore-11.2.0
>>>>>>>    7) libxml2/2.9.10-GCCcore-11.2.0   15) OpenMPI/4.1.1-GCC-11.2.0
>>>>>>>    8) libpciaccess/0.16-GCCcore-11.2.0
>>>>>>> 
>>>>>>> $ mpicc mpi_test.c
>>>>>>> $ mpirun -n 2 a.out
>>>>>>> 
>>>>>>> (null): There are 2 processes
>>>>>>> 
>>>>>>> (null): Rank  1:  d008
>>>>>>> 
>>>>>>> (null): Rank  0:  d008
>>>>>>> 
>>>>>>> 
>>>>>>> I also tried the OpenMPI/4.1.0-GCC-10.2.0 module, but this still gives
>>>>>>> the error messages:
>>>>>>> 
>>>>>>> $ ml OpenMPI/4.1.0-GCC-10.2.0
>>>>>>> $ ml
>>>>>>> 
>>>>>>> Currently Loaded Modules:
>>>>>>>    1) GCCcore/10.2.0                   8) libpciaccess/0.16-GCCcore-10.2.0
>>>>>>>    2) zlib/1.2.11-GCCcore-10.2.0       9) hwloc/2.2.0-GCCcore-10.2.0
>>>>>>>    3) binutils/2.35-GCCcore-10.2.0    10) libevent/2.1.12-GCCcore-10.2.0
>>>>>>>    4) GCC/10.2.0                      11) UCX/1.9.0-GCCcore-10.2.0
>>>>>>>    5) numactl/2.0.13-GCCcore-10.2.0   12) libfabric/1.11.0-GCCcore-10.2.0
>>>>>>>    6) XZ/5.2.5-GCCcore-10.2.0         13) PMIx/3.1.5-GCCcore-10.2.0
>>>>>>>    7) libxml2/2.9.10-GCCcore-10.2.0   14) OpenMPI/4.1.0-GCC-10.2.0
>>>>>>> 
>>>>>>> $ mpicc mpi_test.c
>>>>>>> $ mpirun -n 2 a.out
>>>>>>> [1638449983.577933] [d008:910356:0]       ib_iface.c:966  UCX  ERROR
>>>>>>> ibv_create_cq(cqe=4096) failed: Operation not supported
>>>>>>> [1638449983.577827] [d008:910355:0]       ib_iface.c:966  UCX  ERROR
>>>>>>> ibv_create_cq(cqe=4096) failed: Operation not supported
>>>>>>> [d008.nifl.fysik.dtu.dk:910355] pml_ucx.c:273  Error: Failed to create
>>>>>>> UCP worker
>>>>>>> [d008.nifl.fysik.dtu.dk:910356] pml_ucx.c:273  Error: Failed to create
>>>>>>> UCP worker
>>>>>>> 
>>>>>>> (null): There are 2 processes
>>>>>>> 
>>>>>>> (null): Rank  0:  d008
>>>>>>> 
>>>>>>> (null): Rank  1:  d008
>>>>>>> 
>>>>>>> Conclusion: The foss-2021b toolchain with OpenMPI/4.1.1-GCC-11.2.0 seems
>>>>>>> to be required on systems with an Omni-Path network fabric on AlmaLinux
>>>>>>> 8.5.  Perhaps the newer UCX/1.11.2-GCCcore-11.2.0 is really what's
>>>>>>> needed, compared to UCX/1.9.0-GCCcore-10.2.0 from foss-2020b.
>>>>>>> 
>>>>>>> Does anyone have comments on this?
>>>>>> 
>>>>>> UCX is the problem here in combination with libfabric I think. Write a
>>>>>> hook that upgrades the version of UCX to 1.11-something if it's <
>>>>>> 1.11-ish, or just that specific version if you have older-and-working
>>>>>> versions.
>>>>> 
>>>>> You are right that the nodes with Omni-Path have different libfabric
>>>>> packages which come from the EL8.5 BaseOS as well as the latest
>>>>> Cornelis/Intel Omni-Path drivers:
>>>>> 
>>>>> $ rpm -qa | grep libfabric
>>>>> libfabric-verbs-1.10.0-2.x86_64
>>>>> libfabric-1.12.1-1.el8.x86_64
>>>>> libfabric-devel-1.12.1-1.el8.x86_64
>>>>> libfabric-psm2-1.10.0-2.x86_64
>>>>> 
>>>>> The 1.12 packages are from EL8.5, and 1.10 packages are from Cornelis.
>>>>> 
>>>>> Regarding UCX, I was first using the trusted foss-2020b toolchain which
>>>>> includes UCX/1.9.0-GCCcore-10.2.0. I guess that we shouldn't mess with
>>>>> the toolchains?
>>>>> 
>>>>> The foss-2021b toolchain includes the newer UCX 1.11, which seems to
>>>>> solve this particular problem.
>>>>> 
>>>>> Can we make any best practices recommendations from these observations?
>>>> 
>>>> I didn't check properly, but UCX does not depend on libfabric, OpenMPI
>>>> does, so I'd write a hook that replaces libfabric < 1.12 with at least
>>>> 1.12.1.
>>>> Sometimes you just have to mess with the toolchains, and this looks like
>>>> one of those situations.
>>>> 
>>>> Or, as a test, build your own OpenMPI-4.1.0 or 4.0.5 (that 2020b uses)
>>>> with an updated libfabric and check if that fixes the problem. And if it
>>>> does, write a hook that replaces libfabric. See the framework/contrib
>>>> for examples, I did that for UCX so there is code there to show you how.
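>>>> A minimal sketch of that test (the easyconfig names are assumptions, so
>>>> adjust them to what your repository actually provides):
>>>> # build a newer libfabric for the same GCCcore, then rebuild Open MPI
>>>> # from a local copy of its easyconfig with the libfabric dependency
>>>> # bumped to ('libfabric', '1.12.1'):
>>>> eb libfabric-1.12.1-GCCcore-10.2.0.eb --robot
>>>> eb ./OpenMPI-4.0.5-GCC-10.2.0.eb --robot --force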
>>>   I don't feel qualified to mess around with modifying EB toolchains...
>>>   The foss-2021b toolchain including OpenMPI/4.1.1-GCC-11.2.0 seems to solve
>>>   the present problem.  Do you think there are any disadvantages with asking
>>>   users to go for foss-2021b?  Of course we may need several modules to be
>>>   upgraded from foss-2020b to foss-2021b.
>>>   Another possibility may be the coming driver upgrade from Cornelis
>>>   Networks to support the Omni-Path fabric on EL 8.4 and EL 8.5.  I'm
>>>   definitely going to check this when it becomes available.
>>>   Thanks,
>>>   Ole
>>> -- 
>>> Dr. Bart E. Oldeman | bart.olde...@mcgill.ca | bart.olde...@calculquebec.ca
>>> Scientific Computing Analyst / Analyste en calcul scientifique
>>> McGill HPC Centre / Centre de Calcul Haute Performance de McGill | http://www.hpc.mcgill.ca
>>> Calcul Québec | http://www.calculquebec.ca
>>> Compute/Calcul Canada | http://www.computecanada.ca
>>> Tel/Tél: 514-396-8926 | Fax/Télécopieur: 514-396-8934
>> 
> 
