Took me a while to track this down, but it is now fixed. It was a combination of several minor errors.

Thanks
Ralph

On Aug 27, 2014, at 4:07 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

> Folks,
>
> the intercomm_create test case from the ibm test suite can hang under
> some configurations.
>
> basically, it will spawn n tasks in a first communicator, and then n
> tasks in a second communicator.
>
> when i run from node0 :
> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2 ./intercomm_create
>
> the second spawn will hang.
> a simple workaround is to use 3 hosts :
> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3 ./intercomm_create
>
> the second spawn creates the task on node2.
> for some reason i cannot fully understand, pmix believes the orteds of
> node1 and node2 are involved in the allgather.
> since node1 is not involved whatsoever, the program hangs
> /* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
> returns jdata with jdata->map->num_nodes = 2 */
>
> Cheers,
>
> Gilles
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15732.php