Re: [OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-28 Thread Gilles Gouaillardet
Thanks Ralph !

Cheers,

Gilles

On 2014/08/28 4:52, Ralph Castain wrote:
> Took me a while to track this down, but it is now fixed - combination of
> several minor errors
>
> Thanks
> Ralph
>
> On Aug 27, 2014, at 4:07 AM, Gilles Gouaillardet wrote:
>
>> Folks,
>>
>> the intercomm_create test case from the ibm test suite can hang under
>> some configurations.
>>
>> basically, it will spawn n tasks in a first communicator, and then n
>> tasks in a second communicator.
>>
>> when i run from node0 :
>> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2 ./intercomm_create
>>
>> the second spawn will hang.
>> a simple workaround is to use 3 hosts :
>> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3 ./intercomm_create
>>
>> the second spawn creates the task on node2.
>> for reasons i cannot fully understand, pmix believes the orteds on
>> node1 and node2 are both involved in the allgather.
>> since node1 is not involved at all, the program hangs
>> /* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
>> returns jdata with jdata->map->num_nodes = 2 */
>>
>> Cheers,
>>
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15732.php
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15743.php



Re: [OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-27 Thread Ralph Castain
Took me a while to track this down, but it is now fixed - combination of several
minor errors

Thanks
Ralph

On Aug 27, 2014, at 4:07 AM, Gilles Gouaillardet wrote:

> Folks,
> 
> the intercomm_create test case from the ibm test suite can hang under
> some configurations.
> 
> basically, it will spawn n tasks in a first communicator, and then n
> tasks in a second communicator.
> 
> when i run from node0 :
> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2 ./intercomm_create
> 
> the second spawn will hang.
> a simple workaround is to use 3 hosts :
> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3 ./intercomm_create
> 
> the second spawn creates the task on node2.
> for reasons i cannot fully understand, pmix believes the orteds on
> node1 and node2 are both involved in the allgather.
> since node1 is not involved at all, the program hangs
> /* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
> returns jdata with jdata->map->num_nodes = 2 */
> 
> Cheers,
> 
> Gilles



[OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-27 Thread Gilles Gouaillardet
Folks,

the intercomm_create test case from the ibm test suite can hang under
some configurations.

basically, it will spawn n tasks in a first communicator, and then n
tasks in a second communicator.

when i run from node0 :
mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2 ./intercomm_create

the second spawn will hang.
a simple workaround is to use 3 hosts :
mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3 ./intercomm_create

the second spawn creates the task on node2.
for reasons i cannot fully understand, pmix believes the orteds on
node1 and node2 are both involved in the allgather.
since node1 is not involved at all, the program hangs
/* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
returns jdata with jdata->map->num_nodes = 2 */
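
To make the suspected failure mode concrete, here is a minimal sketch in plain C. It is not ORTE or PMIx code: the type allgather_sig_t and the functions missing_contributions and allgather_completes are hypothetical names invented for illustration, and the integer ids 1 and 2 simply stand in for node1 and node2. The only assumption is that a daemon-level allgather finishes once every daemon in its expected participant list has contributed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, NOT ORTE code: the set of daemons (by node id)
 * that a collective expects to hear from before it can complete. */
typedef struct {
    const int *expected;   /* node ids the allgather waits on */
    size_t     num_expected;
} allgather_sig_t;

/* Return true if id appears in ids[0..n). */
static bool contains(const int *ids, size_t n, int id)
{
    for (size_t i = 0; i < n; i++) {
        if (ids[i] == id) {
            return true;
        }
    }
    return false;
}

/* Count the expected daemons that never contribute.
 * Any nonzero result means the collective waits forever. */
static size_t missing_contributions(const allgather_sig_t *sig,
                                    const int *contributors,
                                    size_t num_contributors)
{
    assert(sig != NULL);
    size_t missing = 0;
    for (size_t i = 0; i < sig->num_expected; i++) {
        if (!contains(contributors, num_contributors, sig->expected[i])) {
            missing++;
        }
    }
    return missing;
}

/* The allgather completes only when nobody in the expected set is missing. */
static bool allgather_completes(const allgather_sig_t *sig,
                                const int *contributors,
                                size_t num_contributors)
{
    return missing_contributions(sig, contributors, num_contributors) == 0;
}
```

With the expected list {1, 2} (as a map reporting num_nodes = 2 would suggest) and only daemon 2 contributing, missing_contributions returns 1 and allgather_completes returns false, which is consistent with the hang described above. If the expected list were just {2}, the same call would complete.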

Cheers,

Gilles