Mellanox / LANL --

Can you guys look into this?

Thanks.



On Sep 15, 2014, at 12:38 PM, Håkon Bugge <hakon.bu...@gmail.com> wrote:

> From time to time, I need to run Open MPI apps using the openib 
> btl on a single node, where port 1 on the HCA is connected to port 2 on the 
> same HCA.
> 
> Using a vintage 1.5.4, my command line would read:
> 
> mpiexec --mca btl self,openib --mca btl_openib_cpc_include oob \
>   -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:1 ./a.out  : \
>   -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:2 ./a.out
> 
> 
> Now, I had a need for a newer Open MPI, and compiled and installed version 
> 1.8.2. Now the problems began ;-) Apparently, the old (and in my opinion 
> nice) "oob" connection management method has disappeared. However, by 
> modifying the command line to:
> 
> mpiexec --mca btl self,openib --mca btl_openib_cpc_include udcm \
>   -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:1 ./a.out : \
>   -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:2 ./a.out
> 
> 
> I get tons of:
> 
> connect/btl_openib_connect_udcm.c:1390:udcm_find_endpoint] could not find 
> endpoint with port: 1, lid: 4608, msg_type: 100
> 
> Interestingly, the lid here is the lid for Port 2 (when port numbers start at 
> 1). I do suspect that the printout above counts ports from zero.
> 
> Anyway, must I go back to an older Open MPI that supports "oob", or do I have a 
> flaw in my command line?
> 
> 
> Thanks, Håkon
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15829.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
