Mellanox / LANL -- Can you guys look into this?
Thanks.

On Sep 15, 2014, at 12:38 PM, Håkon Bugge <hakon.bu...@gmail.com> wrote:

> From time to time, I have a need for running Open MPI apps using the openib
> btl on a single node, where port 1 on the HCA is connected to port 2 on the
> same HCA.
>
> Using a vintage 1.5.4, my command line would read:
>
> mpiexec --mca btl self,openib --mca btl_openib_cpc_include oob \
>   -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:1 ./a.out : \
>   -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:2 ./a.out
>
> Now, I had a need for a newer Open MPI, and compiled and installed version
> 1.8.2. Now the problems began ;-) Apparently, the old (and in my opinion
> nice) "oob" connection management method has disappeared. However, by
> modifying the command line to:
>
> mpiexec --mca btl self,openib --mca btl_openib_cpc_include udcm \
>   -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:1 ./a.out : \
>   -np 1 /usr/bin/env OMPI_MCA_btl_openib_if_include=mlx4_0:2 ./a.out
>
> I get tons of:
>
> connect/btl_openib_connect_udcm.c:1390:udcm_find_endpoint] could not find endpoint with port: 1, lid: 4608, msg_type: 100
>
> Interestingly, the lid here is the lid for port 2 (when port numbers start at
> 1). I do suspect that the printout above counts ports from zero.
>
> Anyway, must I go back to an older Open MPI supporting "oob", or do I have a
> flaw in my command line?
>
> Thanks, Håkon
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15829.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/