Hi Pritchard, thank you for replying.

Adding the parameter you suggested didn't change anything. Could this be
related to the fact that I'm running v1.10.0rc7? It's a custom version,
though we didn't modify any spml- or sm-related code.

2016-11-15 14:12 GMT+01:00 Pritchard Jr., Howard <howa...@lanl.gov>:

> HI Gianmario,
>
> Probably something went wrong at the spml layer.
> Could you also add --mca spml_base_verbose 10
> to the job launch line?
>
> Howard
>
> --
> Howard Pritchard
> HPC-DES
> Los Alamos National Laboratory
>
>
> From: devel <devel-boun...@lists.open-mpi.org> on behalf of Gianmario
> Pozzi <pozzigma...@gmail.com>
> Reply-To: Open MPI Developers <devel@lists.open-mpi.org>
> Date: Tuesday, November 15, 2016 at 5:32 AM
> To: "devel@lists.open-mpi.org" <devel@lists.open-mpi.org>
> Subject: [OMPI devel] Failure while loading shmem module
>
> Hi everybody,
>
> I'm trying to run a sample program on two 16-core machines connected with
> IB (command: mpirun -np 20 -host *localhost*,*remotehost* --mca
> shmem_base_verbose 10 --mca btl self,sm,openib test).
>
> This command fails saying:
>
> [cn18:72296] mca: base: components_register: registering shmem components
> [cn18:72296] mca: base: components_open: opening shmem components
> [cn18:72296] shmem: base: runtime_query: Auto-selecting shmem components
> [cn18:72296] shmem: base: runtime_query: (shmem) No component selected!
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
>
> I dove into the code and found that the loop in that function is never
> entered, which apparently means that no suitable component was ever
> found.
>
> Please note that a sample Hello World application using shared memory
> runs perfectly. Excluding sm from the command line doesn't solve the problem.
>
> Any hints? Has any of you ever experienced something similar?
>
> Thank you.
> --
> *Gianmario Pozzi*
> *M.Sc. @ Politecnico di Milano*
>
>
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>



-- 
*Gianmario Pozzi*
*M.Sc. @ Politecnico di Milano*