Hello,
"Jeff Squyres (jsquyres)" writes:
> I'd recommend against using Open MPI v3.1.0 -- it's quite old. If you
> have to use Open MPI v3.1.x, I'd at least suggest using v3.1.6, which
> has all the rolled-up bug fixes on the v3.1.x series.
>
> That being said, Open MPI v4.1.2 is the most current. Open MPI v4.1.2 does
> restrict which versions of UCX it uses because there are bugs in the older
> versions of UCX. I am not intimately familiar with UCX -- you'll need to ask
> Nvidia for support there -- but I was under the impression that it's just a
> user-level library, and you could certainly install your own copy of UCX to
> use with your compilation of Open MPI. I.e., you're not restricted to
> whatever UCX is installed in the cluster system-default locations.
I followed your advice and compiled my own UCX (1.11.2) and Open MPI
v4.1.1, but the latency and bandwidth numbers are now much worse than the
previous ones. Something is clearly wrong, but I am not sure how to debug
it.
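One possible first step is to check which MCA components the new build
actually contains, since a build without the UCX PML, or one that still
carries MXM-based components, could explain both symptoms. The following is
only a minimal Python sketch. It assumes that `ompi_info` prints component
lines in the usual "MCA <framework>: <component> (...)" form, and that the
MXM-based components are the `yalla` PML and the `mxm` MTL. The function
names are made up for illustration and are not part of Open MPI.

```python
import re
import shutil
import subprocess
from collections import defaultdict

# Assumed line shape, e.g.:
#   "  MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.1.1)"
_MCA_LINE = re.compile(r"^\s*MCA\s+(\w+):\s+(\w+)\s+\(")


def mca_components(ompi_info_text):
    """Map each MCA framework name to the sorted list of its components."""
    found = defaultdict(set)
    for line in ompi_info_text.splitlines():
        match = _MCA_LINE.match(line)
        if match:
            found[match.group(1)].add(match.group(2))
    return {framework: sorted(names) for framework, names in found.items()}


def check_build(ompi_info_text):
    """Return (has_ucx_pml, mxm_related) for an ompi_info dump.

    mxm_related lists any MXM-based components found (assumption: these
    are the 'yalla' PML and the 'mxm' MTL).
    """
    comps = mca_components(ompi_info_text)
    has_ucx_pml = "ucx" in comps.get("pml", [])
    mxm_related = [
        f"{framework}:{name}"
        for framework, name in (("pml", "yalla"), ("mtl", "mxm"))
        if name in comps.get(framework, [])
    ]
    return has_ucx_pml, mxm_related


def read_ompi_info():
    """Run the ompi_info found on PATH; return None if it is not installed."""
    exe = shutil.which("ompi_info")
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `check_build(read_ompi_info())` with the new installation first in
PATH would show whether the `ucx` PML was built at all. If it was, the next
thing to look at is whether it is actually selected at run time.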
> I don't know why you're getting MXM-specific error messages; those don't
> appear to be coming from Open MPI (especially since you configured Open MPI
> with --without-mxm). If you can upgrade to Open MPI v4.1.2 and the latest
> UCX, see if you are still getting those MXM error messages.
In this latest attempt, yes, the MXM error messages are still there.
Cheers,
--
Ángel de Vicente
Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/