Andrej, what is your mpirun command line? Is mpirun invoked from a batch allocation?
To get more debug info, you can run

    mpirun --mca ess_base_verbose 10 --mca pmix_base_verbose 10 ...

Cheers,

Gilles

On Mon, Feb 1, 2021 at 10:27 PM Andrej Prsa via devel <devel@lists.open-mpi.org> wrote:
>
> Hi Gilles,
>
> > I invite you to do some cleanup
> > sudo rm -rf /usr/local/lib/openmpi /usr/local/lib/pmix
> > and then
> > sudo make install
> > and try again.
>
> Good catch! Alright, I deleted /usr/local/lib/openmpi and
> /usr/local/lib/pmix, then I rebuilt (make clean; make) and installed
> pmix from the latest master (should I use 3.1.6 instead?), and rebuilt
> (make clean; make) and installed the debug-enabled version of openmpi.
> Now I'm getting this:
>
> [terra:199344] [[43961,0],0] ORTE_ERROR_LOG: Not found in file
> ess_hnp_module.c at line 320
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> opal_pmix_base_select failed
> --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
>
> > if the issue persists, please post the output of the following commands
> > $ env | grep ^OPAL_
> > $ env | grep ^PMIX_
>
> I don't have any env variables defined.
>
> Cheers,
> Andrej
>
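
The checks asked for in this thread (batch allocation, OPAL_/PMIX_ environment variables, verbose mpirun) could be gathered in one small script. This is only a minimal sketch, assuming a POSIX shell. The function names are made up for illustration, and the job-ID variables are the ones set by Slurm, PBS/Torque and LSF, which is an assumption about the schedulers of interest rather than an exhaustive list.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of Open MPI.

# Report whether a resource manager's job-ID variable is set.
# Assumption: Slurm, PBS/Torque and LSF are the schedulers of interest.
detect_batch() {
    found=no
    for var in SLURM_JOB_ID PBS_JOBID LSB_JOBID; do
        # Indirect lookup: expands to e.g. val=${SLURM_JOB_ID:-}
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            found=yes
            echo "batch allocation detected: $var=$val"
        fi
    done
    if [ "$found" = no ]; then
        echo "no batch allocation detected"
    fi
}

# List OPAL_ and PMIX_ variables, as requested earlier in the thread.
show_mpi_env() {
    env | grep -E '^(OPAL|PMIX)_' || echo "no OPAL_ or PMIX_ variables set"
}

detect_batch
show_mpi_env

# If arguments are given, hand them to mpirun with the verbose flags
# quoted in this message; with no arguments, mpirun is not invoked.
if [ "$#" -gt 0 ]; then
    mpirun --mca ess_base_verbose 10 --mca pmix_base_verbose 10 "$@"
fi
```

Saved under a hypothetical name such as diag.sh, it would be invoked with the usual mpirun arguments, e.g. `sh diag.sh -np 2 ./your_program`, and the full output posted to the list.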