Ralph,

You are right, this was definitely not the right fix (at least with 4
nodes or more).

I finally understood what is going wrong here: to put it simply, the
recursive doubling allgather is not implemented with
MPI_Recv(..., peer, ...)-like functions but with
MPI_Recv(..., MPI_ANY_SOURCE, ...)-like functions,
and that makes things slightly more complicated.
Right now:
- with two nodes: if node 1 is late, it gets stuck in the allgather
- with four nodes: if node 0 arrives first, then nodes 2 and 3, while
node 1 is still late, node 0 will likely leave the allgather even
though it did not receive anything from node 1
- and so on
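
For reference, below is a minimal sketch of the recursive doubling
allgather pattern in plain MPI (this is not the actual grpcomm/rcd code,
and it assumes a power-of-two number of ranks). With the peer named
explicitly in the receive, each round is matched against the right
partner; with MPI_ANY_SOURCE, a message another rank sends for a later
round can satisfy an earlier round's receive, so a rank can run through
its rounds without ever hearing from its real partner, which is how
node 0 above can leave the allgather without anything from node 1:

#include <mpi.h>

/* recursive doubling allgather of one int per rank;
 * size is assumed to be a power of two */
void rd_allgather(int myval, int *result, int rank, int size, MPI_Comm comm)
{
    int d;
    result[rank] = myval;
    for (d = 1; d < size; d <<= 1) {
        int peer      = rank ^ d;           /* partner for this round  */
        int my_base   = rank & ~(d - 1);    /* block I already hold    */
        int peer_base = peer & ~(d - 1);    /* block the partner holds */
        /* the peer is named explicitly here; receiving from
         * MPI_ANY_SOURCE instead is what lets a message that belongs
         * to a later round be matched too early */
        MPI_Sendrecv(&result[my_base],   d, MPI_INT, peer, 0,
                     &result[peer_base], d, MPI_INT, peer, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}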

I think I can fix that now.

Cheers,

Gilles

On 2014/09/11 23:47, Ralph Castain wrote:
> Yeah, that's not the right fix, I'm afraid. I've made the direct component 
> the default again until I have time to dig into this deeper.
>
> On Sep 11, 2014, at 4:02 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Ralph,
>>
>> the root cause is that when the second orted/mpirun runs rcd_finalize_coll,
>> it does not invoke pmix_server_release,
>> because allgather_stub was not previously invoked, since the fence
>> had not yet been entered.
>> /* in rcd_finalize_coll, coll->cbfunc is NULL */
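
Purely as an illustration of that situation (this is not the ORTE code;
apart from cbfunc, allgather_stub, rcd_finalize_coll and
pmix_server_release, which are quoted from the description above, every
name below is made up): the completion handler can only release the
waiting processes if a callback was registered when the fence was entered
locally, and if the remote side finishes first the callback is still NULL,
so the release is silently skipped.

#include <stdio.h>

typedef void (*release_fn_t)(void *cbdata);

typedef struct {
    release_fn_t cbfunc;   /* set by the local allgather_stub when the
                              fence is entered; NULL before that */
    void        *cbdata;
} coll_tracker_t;

/* stands in for rcd_finalize_coll */
static void finalize_coll(coll_tracker_t *coll)
{
    if (NULL == coll->cbfunc) {
        /* remote contributions arrived before the local fence: nothing
         * is released, and the late local process blocks forever once
         * it finally enters the fence */
        printf("no callback registered, release skipped\n");
        return;
    }
    coll->cbfunc(coll->cbdata);    /* would end up in pmix_server_release */
}

static void release(void *cbdata)
{
    printf("released %s\n", (const char *)cbdata);
}

int main(void)
{
    coll_tracker_t early = { NULL, NULL };       /* fence not entered yet */
    coll_tracker_t ready = { release, "job 1" }; /* fence already entered */
    finalize_coll(&early);   /* the hang described above */
    finalize_coll(&ready);   /* the normal path */
    return 0;
}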
>>
>> the attached patch is likely not the right fix; it was only very lightly
>> tested, but so far it works for me ...
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/09/11 16:11, Gilles Gouaillardet wrote:
>>> Ralph,
>>>
>>> things got worse indeed :-(
>>>
>>> Now a simple hello world involving two hosts hangs in MPI_Init.
>>> There is still a race condition: if task a calls the fence long after task b,
>>> then task b will never leave the fence.
>>>
>>> I'll try to debug this ...
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2014/09/11 2:36, Ralph Castain wrote:
>>>> I think I now have this fixed - let me know what you see.
>>>>
>>>>
>>>> On Sep 9, 2014, at 6:15 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> Yeah, that's not the correct fix. The right way to fix it is for all 
>>>>> three components to have their own RML tag, and for each of them to 
>>>>> establish a persistent receive. They then can use the signature to tell 
>>>>> which collective the incoming message belongs to.
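
A sketch of what that demultiplexing could look like, with made-up types
and names (this is not the actual ORTE RML/grpcomm code): each component
posts one persistent receive on its own tag, and the receive callback uses
the signature carried in the message (reduced here to the list of jobids
involved) to find the collective the payload belongs to.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct coll_s {
    struct coll_s *next;
    uint32_t      *sig;      /* signature: jobids of the participating jobs */
    size_t         sig_len;
    /* per-collective allgather state would live here */
} coll_t;

static coll_t *active_colls = NULL;

/* called from the component's single persistent receive: find the
 * collective matching the received signature, creating it on first sight */
static coll_t *lookup_coll(const uint32_t *sig, size_t sig_len)
{
    coll_t *c;
    for (c = active_colls; NULL != c; c = c->next) {
        if (c->sig_len == sig_len &&
            0 == memcmp(c->sig, sig, sig_len * sizeof(uint32_t))) {
            return c;
        }
    }
    c = calloc(1, sizeof(*c));
    c->sig = malloc(sig_len * sizeof(uint32_t));
    memcpy(c->sig, sig, sig_len * sizeof(uint32_t));
    c->sig_len = sig_len;
    c->next = active_colls;
    active_colls = c;
    return c;
}

Because the receive is persistent and posted only once per component, a
second receive on the same tag never gets posted, which is exactly what
triggers the "TWO RECEIVES WITH SAME PEER ... AND TAG" abort reported
below; the two finalize-time allgathers (parent job and spawned job) are
then kept apart by their signatures rather than by their tags.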
>>>>>
>>>>> I'll fix it, but it won't be until tomorrow I'm afraid as today is shot.
>>>>>
>>>>>
>>>>> On Sep 9, 2014, at 3:10 AM, Gilles Gouaillardet 
>>>>> <gilles.gouaillar...@iferc.org> wrote:
>>>>>
>>>>>> Folks,
>>>>>>
>>>>>> Since r32672 (trunk), grpcomm/rcd is the default module.
>>>>>> The attached spawn.c test program is a trimmed version of the
>>>>>> spawn_with_env_vars.c test case from the IBM test suite.
>>>>>>
>>>>>> When invoked on two nodes:
>>>>>> - the program hangs with -np 2
>>>>>> - the program can crash with -np > 2; the error message is
>>>>>> [node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1]
>>>>>> AND TAG -33 - ABORTING
>>>>>>
>>>>>> Here is my full command line (from node0):
>>>>>>
>>>>>> mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca
>>>>>> coll ^ml ./spawn
>>>>>>
>>>>>> a simple workaround is to add the following extra parameter to the
>>>>>> mpirun command line :
>>>>>> --mca grpcomm_rcd_priority 0
>>>>>>
>>>>>> My understanding is that the race condition occurs when all the
>>>>>> processes call MPI_Finalize():
>>>>>> internally, the pmix module has mpirun/orted issue two ALLGATHERs
>>>>>> involving mpirun and the orted
>>>>>> (one for job 1, aka the parent, and one for job 2, aka the spawned tasks).
>>>>>> The error message is very explicit: this is not (currently) supported.
>>>>>>
>>>>>> I wrote the attached rml.patch, which is really a workaround and not a
>>>>>> fix:
>>>>>> with it, each job invokes its ALLGATHER with a different tag
>>>>>> /* that works for a limited number of jobs only */
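
The attached rml.patch is not reproduced here; purely to illustrate the
stated idea of one tag per job (all names below are hypothetical, this is
not the patch), deriving the allgather tag from the job could look like:

/* hypothetical illustration, not the attached rml.patch */
#define ALLGATHER_TAG_BASE 100   /* hypothetical base of a reserved range */

static int allgather_tag_for_job(unsigned int local_jobid)
{
    /* distinct tags only while local_jobid stays within the reserved
     * range -- hence "works for a limited number of jobs only" */
    return ALLGATHER_TAG_BASE + (int)local_jobid;
}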
>>>>>>
>>>>>> I did not commit this patch since it is not a fix; could someone
>>>>>> (Ralph ?) please review the issue and comment?
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> <spawn.c><rml.patch>
>> <rml2.patch>
