On 02/09/21 at 00:16 +0200, Lucas Nussbaum wrote:
> reopen 979041
> notfixed 979041 4.1.0-7
> thanks
> 
> Hi,
> 
> I ran into this problem with 4.1.0-7. The ofi BTL was disabled, but not
> the ofi MTL. In some cases, both need to be disabled.
> 
> You need something like:
> mtl = ^ofi
> in addition to:
> btl = ^ofi
> 
> (I ran into this with https://github.com/LLNL/mpiGraph and
> libhwloc-contrib-plugins, and CUDA installed -- let me know if you need
> help reproducing)

Thinking about this again, I wonder whether, rather than disabling
OFI/libfabric in OpenMPI, we should disable the EFA provider in
libfabric, since it seems to be the only one causing the error on
fork().

Alternatively, fork() support has been added to the libfabric EFA
provider (https://github.com/ofiwg/libfabric/issues/6332), but
backporting that change is unlikely to be accepted in a stable update.

Given that OFI was not really disabled in OpenMPI anyway (only the ofi
BTL was, not the ofi MTL), keeping it enabled and disabling EFA in
libfabric is actually a minor change.
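
As a rough, untested sketch of what that could look like (where exactly
this would go in the libfabric packaging is an assumption on my part):
libfabric's configure script takes per-provider switches, so the build
would pass something like:

    ./configure --disable-efa

To check the idea without rebuilding, the provider can also be excluded
at run time with the FI_PROVIDER environment variable, where a leading
^ excludes the listed providers. The mpirun arguments below are only
placeholders:

    FI_PROVIDER='^efa' mpirun -np 2 ./mpiGraph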

For context, EFA (Elastic Fabric Adapter) is an AWS-specific provider;
see https://www.hpcworkshops.com/07-efa/00-efa-basics.html

Lucas
