Andrej,

What is your mpirun command line?
Is mpirun invoked from inside a batch allocation?

To get more debug info, you can run mpirun with these verbosity options:

mpirun --mca ess_base_verbose 10 --mca pmix_base_verbose 10 ...
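
To check the batch allocation question quickly, something like the sketch below might help. It assumes only Slurm, PBS/Torque and LSF matter here and looks at the job-ID variables those schedulers export; the function name batch_allocation is made up for illustration and is not part of Open MPI.

```shell
#!/bin/sh
# Report whether this shell looks like it is running inside a batch allocation.
# Assumption: only Slurm, PBS/Torque and LSF are checked, using the job-ID
# environment variables those schedulers set for a running job.
batch_allocation() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "slurm:${SLURM_JOB_ID}"
    elif [ -n "${PBS_JOBID:-}" ]; then
        echo "pbs:${PBS_JOBID}"
    elif [ -n "${LSB_JOBID:-}" ]; then
        echo "lsf:${LSB_JOBID}"
    else
        # None of the checked schedulers has set a job ID.
        echo "none"
    fi
}

batch_allocation
```

It prints "none" when none of those variables is set. Run it in the same shell you launch mpirun from, and append 2>&1 | tee mpirun-debug.log to the mpirun command above if you want to keep the verbose output in a file.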


Cheers,

Gilles

On Mon, Feb 1, 2021 at 10:27 PM Andrej Prsa via devel
<devel@lists.open-mpi.org> wrote:
>
> Hi Gilles,
>
> > I invite you to do some cleanup
> > sudo rm -rf /usr/local/lib/openmpi /usr/local/lib/pmix
> > and then
> > sudo make install
> > and try again.
>
> Good catch! Alright, I deleted /usr/local/lib/openmpi and
> /usr/local/lib/pmix, then I rebuilt (make clean; make) and installed
> pmix from the latest master (should I use 3.1.6 instead?), and rebuilt
> (make clean; make) and installed the debug-enabled version of openmpi.
> Now I'm getting this:
>
> [terra:199344] [[43961,0],0] ORTE_ERROR_LOG: Not found in file
> ess_hnp_module.c at line 320
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>    opal_pmix_base_select failed
>    --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
>
> > if the issue persists, please post the output of the following commands
> > $ env | grep ^OPAL_
> > $ env | grep ^PMIX_
>
> I don't have any env variables defined.
>
> Cheers,
> Andrej
>
