This happens because we do not currently have a way to detect connectivity without allocating ompi_proc_t's for every rank in the window. I added the osc_rdma_btls MCA variable to act as a short-circuit that avoids the costly connectivity lookup. By default the value is ugni,openib. You can set it to the empty string to force it to check connectivity.
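To make the short-circuit concrete, here is a minimal sketch of a local check against a comma-separated list like osc_rdma_btls. The helpers name_in_list and pick_rdma_btl are hypothetical and are not Open MPI's code. The sketch assumes the caller already knows the names of the btls loaded on this node, and that a NULL return means "fall back to the costly connectivity check". An empty list always produces NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Open MPI code): returns 1 if `name` is a complete
 * entry in the comma-separated list `list`, e.g. "ugni,openib", else 0. */
static int name_in_list(const char *list, const char *name)
{
    assert(list != NULL && name != NULL);
    size_t name_len = strlen(name);
    const char *p = list;

    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        /* Require an exact match so "open" does not match "openib". */
        if (len > 0 && len == name_len && strncmp(p, name, len) == 0) {
            return 1;
        }
        if (comma == NULL) {
            return 0;
        }
        p = comma + 1;
    }
}

/* Hypothetical short-circuit: return the first locally loaded btl whose name
 * appears in `osc_rdma_btls`, or NULL to tell the caller to run the full
 * connectivity check (always the case when the list is empty). */
static const char *pick_rdma_btl(const char *osc_rdma_btls,
                                 const char *const *loaded, int nloaded)
{
    if (osc_rdma_btls == NULL || osc_rdma_btls[0] == '\0') {
        return NULL;
    }
    for (int i = 0; i < nloaded; ++i) {
        if (name_in_list(osc_rdma_btls, loaded[i])) {
            return loaded[i];
        }
    }
    return NULL;
}
```

On a cluster like the one in the report, the node with the IB card would get "openib" back while the node without it would get NULL. The decision is purely local, so nothing in this sketch prevents two nodes from taking different paths.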
This will be in 2.x once the mlx5 fix is in. I can update the check to do an allreduce to ensure that all processes in the window select the same btl. I do not, however, want to change the default value of osc_rdma_btls, since it is there to ensure performance and to reduce the memory footprint on heterogeneous clusters.

-Nathan

On Sun, Nov 15, 2015 at 10:34:45AM +0900, Gilles Gouaillardet wrote:
> Howard,
>
> there is no rdma osc component in v2.x, so I doubt the issue occurs here.
> I will double check this anyway on Monday.
>
> Cheers,
>
> Gilles
>
> On Sunday, November 15, 2015, Howard <hpprit...@gmail.com> wrote:
> > Hi Gilles,
> >
> > Could you check whether you also see this problem with v2.x?
> >
> > Thanks,
> >
> > Howard
> >
> > Sent from my iPhone
> >
> > > On 10.11.2015 at 19:57, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> > > >
> > > > Nathan,
> > > >
> > > > a simple MPI_Win_create test (ibm/onesided/c_create) hangs on my non-uniform cluster.
> > > >
> > > > One node has an IB card, but the other one does not.
> > > > The node with the IB card selects the rdma osc module, but the other node selects the pt2pt module,
> > > > and then it hangs because the two ends do not try to initialize the same module.
> > > >
> > > > If I understand correctly, the rdma osc component is selected if at least one RDMA-capable btl is initialized.
> > > > IMHO, the logic should be:
> > > > the rdma osc component is selected for a given communicator only if all the btls involved in this communicator
> > > > (maybe except the self btl) are RDMA capable.
> > > >
> > > > Can you please have a look at this?
> > > >
> > > > Cheers,
> > > >
> > > > Gilles
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/11/18356.php
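The fix discussed in this thread combines two ideas: the rule from the report, that a rank should consider the rdma osc component only if every btl it uses to reach the other ranks is RDMA capable, and an allreduce so that every process in the window reaches the same decision. The sketch below simulates both in plain C so it runs without MPI. In an MPI program the reduction would be a single MPI_Allreduce call with MPI_LAND on an int flag over the window's communicator. The names local_rdma_ok, allreduce_land, choose_osc, and the osc_choice enum are hypothetical, not Open MPI's code.

```c
#include <assert.h>

/* Hypothetical component choice; Open MPI's real selection logic differs. */
enum osc_choice { OSC_PT2PT = 0, OSC_RDMA = 1 };

/* Rule from the report: this rank may use the rdma osc component only if
 * every btl it uses to reach its peers is RDMA capable. The caller is
 * assumed to have left the self btl out of peer_btl_is_rdma. */
static int local_rdma_ok(const int *peer_btl_is_rdma, int npeers)
{
    for (int i = 0; i < npeers; ++i) {
        if (!peer_btl_is_rdma[i]) {
            return 0;
        }
    }
    return 1;
}

/* Simulated stand-in for
 *   MPI_Allreduce(&local, &agreed, 1, MPI_INT, MPI_LAND, comm);
 * flags[r] is rank r's local answer; every rank would receive the result. */
static int allreduce_land(const int *flags, int nprocs)
{
    assert(nprocs > 0);
    int agreed = 1;
    for (int r = 0; r < nprocs; ++r) {
        agreed = agreed && flags[r];
    }
    return agreed;
}

/* Every rank derives its choice from the agreed flag, never from its local
 * flag alone, so no two ranks can pick different components. */
static enum osc_choice choose_osc(int agreed)
{
    return agreed ? OSC_RDMA : OSC_PT2PT;
}
```

With per-rank answers {1, 0}, as on the IB and non-IB nodes in the report, allreduce_land returns 0 and both ranks fall back to pt2pt instead of initializing mismatched modules. The cost is one collective at window creation, which is the trade-off implied by doing the check with an allreduce.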