On 02/09/21 at 00:16 +0200, Lucas Nussbaum wrote:
> reopen 979041
> notfixed 979041 4.1.0-7
> thanks
>
> Hi,
>
> I ran into this problem with 4.1.0-7. The ofi BTL was disabled, but not
> the ofi MTL. In some cases, both need to be disabled.
>
> You need something like:
> mtl = ^ofi
> in addition to:
> btl = ^ofi
>
> (I ran into this with https://github.com/LLNL/mpiGraph and
> libhwloc-contrib-plugins, and CUDA installed -- let me know if you need
> help reproducing)

Thinking about this again, I wonder whether, instead of disabling
OFI/libfabric in OpenMPI, we should disable the EFA provider in
libfabric, since it seems to be the only one causing the error on
fork().

Alternatively, fork() support has been added to the libfabric EFA
provider (https://github.com/ofiwg/libfabric/issues/6332), but that
change is unlikely to be accepted in a stable update.

Given that OFI was not really disabled in OpenMPI anyway, keeping it
enabled and disabling EFA in libfabric is actually a minor change.

For context, EFA is an AWS-specific provider; see
https://www.hpcworkshops.com/07-efa/00-efa-basics.html

Lucas