Re: [OMPI devel] pmix: race condition in dynamic/intercomm_create from the ibm test suite

2014-08-25 Thread Ralph Castain
And that was indeed the problem - fixed, and now the trunk runs clean thru my 
MTT.

Thanks again!
Ralph

On Aug 25, 2014, at 7:38 AM, Ralph Castain  wrote:

> Yeah, that was going to be my first place to look once I finished breakfast 
> :-)
> 
> Thanks!
> Ralph
> 
> On Aug 25, 2014, at 7:32 AM, Gilles Gouaillardet 
>  wrote:
> 
>> Thanks for the explanation
>> 
>> In orte_dt_compare_sig(...) memcmp did not multiply value1->sz by 
>> sizeof(opal_identifier_t).
>> 
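[Editor's note: to spell out the suspected bug, memcmp takes its length in bytes, so passing the identifier count alone compares only a prefix of the signature array, which can make two distinct collectives compare equal. Below is a minimal sketch of the corrected comparison; the struct layout and names are illustrative assumptions, not the actual ORTE source.]

/* Illustrative sketch only -- struct layout and names are assumptions,
 * not the actual ORTE code. */
#include <stdint.h>
#include <string.h>

typedef uint64_t opal_identifier_t;   /* stand-in for the real typedef */

typedef struct {
    opal_identifier_t *signature;     /* proc names involved in the collective */
    size_t sz;                        /* number of identifiers, NOT bytes      */
} coll_sig_t;

static int compare_sig(const coll_sig_t *value1, const coll_sig_t *value2)
{
    if (value1->sz != value2->sz) {
        return (value1->sz < value2->sz) ? -1 : 1;
    }
    /* Buggy form:  memcmp(a, b, value1->sz) compares only value1->sz BYTES,
     * i.e. a prefix of the identifier array.
     * Fixed form:  scale the element count by the element size. */
    return memcmp(value1->signature, value2->signature,
                  value1->sz * sizeof(opal_identifier_t));
}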
>> Being AFK, I could not test, but that looks like a good suspect.
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> Ralph Castain  wrote:
>>> Each collective is given a "signature" that is just the array of names for 
>>> all procs involved in the collective. Thus, even though task 0 is involved 
>>> in both of the disconnect barriers, the two collectives should be running 
>>> in isolation from each other.
>>> 
>>> The "tags" are just receive callbacks and have no meaning other than to 
>>> associate a particular callback to a given send/recv pair. It is the 
>>> signature that counts as the daemons are using that to keep the various 
>>> collectives separated.
>>> 
>>> I'll have to take a look at why task 2 is leaving early. The key will be to 
>>> look at that signature to ensure we aren't getting it confused.
>>> 
>>> On Aug 25, 2014, at 1:59 AM, Gilles Gouaillardet 
>>>  wrote:
>>> 
 Folks,
 
 when I run
 mpirun -np 1 ./intercomm_create
 from the ibm test suite, one of three things happens:
 - it succeeds
 - it hangs
 - mpirun crashes (SIGSEGV) soon after writing the following message:
 ORTE_ERROR_LOG: Not found in file
 ../../../src/ompi-trunk/orte/orted/pmix/pmix_server.c at line 566
 
 here is what happens:
 
 first, the test program itself:
 task 0 spawns task 1: the intercommunicator is ab_inter on task 0 and
 parent on task 1
 then task 0 spawns task 2: the intercommunicator is ac_inter on task 0 and
 parent on task 2
 then several operations (merge, barrier, ...)
 and then, without any synchronization (see the sketch below):
 - task 0 calls MPI_Comm_disconnect(ab_inter) and then
 MPI_Comm_disconnect(ac_inter)
 - task 1 and task 2 call MPI_Comm_disconnect(parent)
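[Editor's note: to make the failing pattern concrete, here is a condensed sketch of the communicator sequence described above. It is not the actual ibm test source; spawning the same binary, using MPI_COMM_SELF as the spawning communicator, and eliding the merge/barrier section are assumptions for illustration.]

/* Condensed sketch of the pattern described above -- not the actual
 * ibm/dynamic/intercomm_create source.  Error checks omitted. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm ab_inter, ac_inter, parent;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* task 0: spawn task 1, then task 2, each as a singleton child */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &ab_inter, MPI_ERRCODES_IGNORE);
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &ac_inter, MPI_ERRCODES_IGNORE);

        /* ... merges, barriers, etc. ... */

        /* no synchronization before the disconnects */
        MPI_Comm_disconnect(&ab_inter);
        MPI_Comm_disconnect(&ac_inter);
    } else {
        /* tasks 1 and 2: the intercommunicator to task 0 is "parent" */

        /* ... merges, barriers, etc. ... */

        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}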
 
 I applied the attached pmix_debug.patch and ran
 mpirun -np 1 --mca pmix_base_verbose 90 ./intercomm_create
 
 basically, tasks 0 and 1 execute a native fence and, in parallel, tasks 0
 and 2 execute a native fence.
 they both use the *same* tags on different though overlapping sets of tasks.
 bottom line: task 2 leaves its fence *before* task 0 has entered it
 (it seems task 1 told task 2 it is ok to leave the fence)
 
 a simple workaround is to call MPI_Barrier before calling
 MPI_Comm_disconnect
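[Editor's note: as an illustration of that workaround (sketch only; the helper name is made up, and "comm" stands for ab_inter/ac_inter on task 0 and parent on tasks 1 and 2):]

#include <mpi.h>

/* Workaround sketch: synchronize over the (inter)communicator before
 * disconnecting, so no process can leave the pmix fence early. */
static void disconnect_with_barrier(MPI_Comm *comm)
{
    MPI_Barrier(*comm);          /* explicit synchronization first       */
    MPI_Comm_disconnect(comm);   /* then the disconnect (and its fence)  */
}

[The comm_disconnect.patch mentioned just below presumably does the equivalent inside MPI_Comm_disconnect itself, ahead of the pmix fence.]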
 
 at this stage, I doubt it is even possible to get this working at the
 pmix level, so the fix might be to have MPI_Comm_disconnect invoke
 MPI_Barrier; the attached comm_disconnect.patch always calls the barrier
 before (indirectly) invoking pmix
 
 could you please comment on this issue?
 
 Cheers,
 
 Gilles
 
 here are the relevant logs:
 
 [soleil:00650] [[8110,3],0] pmix:native executing fence on 2 procs
 [[8110,1],0] and [[8110,3],0]
 [soleil:00650] [[8110,3],0]
 [../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493] 
 post
 send to server
 [soleil:00650] [[8110,3],0] posting recv on tag 5
 [soleil:00650] [[8110,3],0] usock:send_nb: already connected to server -
 queueing for send
 [soleil:00650] [[8110,3],0] usock:send_handler called to send to server
 [soleil:00650] [[8110,3],0] usock:send_handler SENDING TO SERVER
 [soleil:00647] [[8110,2],0] pmix:native executing fence on 2 procs
 [[8110,1],0] and [[8110,2],0]
 [soleil:00647] [[8110,2],0]
 [../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493] 
 post
 send to server
 [soleil:00647] [[8110,2],0] posting recv on tag 5
 [soleil:00647] [[8110,2],0] usock:send_nb: already connected to server -
 queueing for send
 [soleil:00647] [[8110,2],0] usock:send_handler called to send to server
 [soleil:00647] [[8110,2],0] usock:send_handler SENDING TO SERVER
 [soleil:00650] [[8110,3],0] usock:recv:handler called
 [soleil:00650] [[8110,3],0] usock:recv:handler CONNECTED
 [soleil:00650] [[8110,3],0] usock:recv:handler allocate new recv msg
 [soleil:00650] usock:recv:handler read hdr
 [soleil:00650] [[8110,3],0] usock:recv:handler allocate data region of
 size 14
 [soleil:00650] [[8110,3],0] RECVD COMPLETE MESSAGE FROM SERVER OF 14
 BYTES FOR TAG 5
 [soleil:00650] [[8110,3],0]
 [../../../../../../src/ompi-trunk/opal/mca/pmix/native/usock_sendrecv.c:415]
 post msg
 [soleil:00650] [[8110,3],0] message received 14 bytes for tag 5
 [soleil:00650] [[8110,3],0] checking msg on tag 5 for tag 5
 [soleil:00650] [[8110,3],0] pmix:native recv callback activated with 14
 bytes
 [soleil:00650] [[8110,3],0] pmix:native fence released on 2 procs
 [[8110,1],0] and [[8110,3],0]
