Hi Pritchard, thank you for replying.

Adding the parameter you suggested changed nothing. Could this be because
I'm running v1.10.0rc7? It's a custom build, although we didn't modify any
spml- or sm-related code.

2016-11-15 14:12 GMT+01:00 Pritchard Jr., Howard <[email protected]>:

> Hi Gianmario,
>
> Probably something went wrong at the spml layer.
> Could you also add --mca spml_base_verbose 10
> to the job launch line?
>
> Howard
>
> --
> Howard Pritchard
> HPC-DES
> Los Alamos National Laboratory
>
>
> From: devel <[email protected]> on behalf of Gianmario
> Pozzi <[email protected]>
> Reply-To: Open MPI Developers <[email protected]>
> Date: Tuesday, November 15, 2016 at 5:32 AM
> To: "[email protected]" <[email protected]>
> Subject: [OMPI devel] Failure while loading shmem module
>
> Hi everybody,
>
> I'm trying to run a sample program on two 16-core machines connected with
> IB (command: mpirun -np 20 -host localhost,remotehost --mca
> shmem_base_verbose 10 --mca btl self,sm,openib test).
>
> This command fails with the following output:
>
> [cn18:72296] mca: base: components_register: registering shmem components
> [cn18:72296] mca: base: components_open: opening shmem components
> [cn18:72296] shmem: base: runtime_query: Auto-selecting shmem components
> [cn18:72296] shmem: base: runtime_query: (shmem) No component selected!
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
>
> I dug into the code and found that the loop in that function is never
> entered, which apparently means that no suitable component was found
> at all.
>
> Please note that a sample Hello World application using shared memory
> runs perfectly. Excluding sm from the command line doesn't solve the problem.
>
> Any hints? Has any of you ever experienced something similar?
>
> Thank you.
> --
> *Gianmario Pozzi*
> *M.Sc. @ Politecnico di Milano*
>
>
> _______________________________________________
> devel mailing list
> [email protected]
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>



-- 
*Gianmario Pozzi*
*M.Sc. @ Politecnico di Milano*
