Took me a while to track this down, but it is now fixed - a combination of
several minor errors.

Thanks
Ralph

On Aug 27, 2014, at 4:07 AM, Gilles Gouaillardet 
<gilles.gouaillar...@iferc.org> wrote:

> Folks,
> 
> the intercomm_create test case from the ibm test suite can hang under
> some configurations.
> 
> basically, it will spawn n tasks in a first communicator, and then n
> tasks in a second communicator.
> 
> when i run from node0 :
> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2 ./intercomm_create
> 
> the second spawn will hang.
> a simple workaround is to use 3 hosts :
> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3 ./intercomm_create
> 
> the second spawn creates the task on node2.
> for some reason i cannot fully understand, pmix believes the orteds on
> node1 and node2 are both involved in the allgather.
> since node1 is not involved whatsoever, the program hangs
> /* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
> returns jdata with jdata->map->num_nodes = 2 */
> 
> Cheers,
> 
> Gilles
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15732.php
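
The hang Gilles describes can be illustrated with a toy model. This is only a sketch and not Open MPI code: the function names are hypothetical, and it assumes the allgather waits for every daemon in a participant list built from a job map with num_nodes = 2, while only the daemon on node2 actually contributes.

```python
# Toy model of the reported hang; this is NOT Open MPI code.
# All names below (missing_daemons, allgather_completes) are hypothetical.


def missing_daemons(expected, contributed):
    """Return the daemons the allgather is still waiting on, sorted by name."""
    return sorted(set(expected) - set(contributed))


def allgather_completes(expected, contributed):
    """An allgather finishes only once every expected daemon has contributed."""
    return not missing_daemons(expected, contributed)


if __name__ == "__main__":
    # Assumption: the participant list comes from a job map with num_nodes = 2.
    expected = ["node1", "node2"]
    # The second spawn places its task on node2, so only that daemon contributes.
    contributed = ["node2"]

    print("completes:", allgather_completes(expected, contributed))
    print("waiting on:", missing_daemons(expected, contributed))
```

In this model the collective never completes because node1 is expected but never contributes, which matches the symptom in the report. It does not say anything about the actual fix.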
