Re: [OMPI devel] pmix: race condition in dynamic/intercomm_create from the ibm test suite
And that was indeed the problem - fixed, and now the trunk runs clean thru my MTT. Thanks again!
Ralph

On Aug 25, 2014, at 7:38 AM, Ralph Castain wrote:

> Yeah, that was going to be my first place to look once I finished breakfast :-)
>
> Thanks!
> Ralph
>
> On Aug 25, 2014, at 7:32 AM, Gilles Gouaillardet wrote:
>
>> Thanks for the explanation
>>
>> In orte_dt_compare_sig(...) memcmp did not multiply value1->sz by
>> sizeof(opal_identifier_t).
>>
>> Being afk, I could not test, but that looks like a good suspect
>>
>> Cheers,
>>
>> Gilles
>>
>> Ralph Castain wrote:
>>
>>> Each collective is given a "signature" that is just the array of names for
>>> all procs involved in the collective. Thus, even though task 0 is involved
>>> in both of the disconnect barriers, the two collectives should be running
>>> in isolation from each other.
>>>
>>> The "tags" are just receive callbacks and have no meaning other than to
>>> associate a particular callback to a given send/recv pair. It is the
>>> signature that counts, as the daemons use it to keep the various
>>> collectives separated.
>>>
>>> I'll have to take a look at why task 2 is leaving early. The key will be
>>> to look at that signature to ensure we aren't getting it confused.
>>>
>>> On Aug 25, 2014, at 1:59 AM, Gilles Gouaillardet wrote:
>>>
>>>> Folks,
>>>>
>>>> when I run
>>>> mpirun -np 1 ./intercomm_create
>>>> from the ibm test suite, it either:
>>>> - succeeds
>>>> - hangs
>>>> - mpirun crashes (SIGSEGV) soon after writing the following message:
>>>> ORTE_ERROR_LOG: Not found in file
>>>> ../../../src/ompi-trunk/orte/orted/pmix/pmix_server.c at line 566
>>>>
>>>> here is what happens:
>>>>
>>>> first, the test program itself:
>>>> task 0 spawns task 1: the inter communicator is ab_inter on task 0 and
>>>> parent on task 1
>>>> then task 0 spawns task 2: the inter communicator is ac_inter on task 0
>>>> and parent on task 2
>>>> then several operations (merge, barrier, ...)
>>>> and then, without any synchronization:
>>>> - task 0 calls MPI_Comm_disconnect(ab_inter) and then
>>>> MPI_Comm_disconnect(ac_inter)
>>>> - tasks 1 and 2 call MPI_Comm_disconnect(parent)
>>>>
>>>> I applied the attached pmix_debug.patch and ran
>>>> mpirun -np 1 --mca pmix_base_verbose 90 ./intercomm_create
>>>>
>>>> basically, tasks 0 and 1 execute a native fence and, in parallel, tasks 0
>>>> and 2 execute a native fence.
>>>> they both use the *same* tags on different though overlapping sets of tasks.
>>>> bottom line: task 2 leaves the fence *before* task 0 entered the fence
>>>> (it seems task 1 told task 2 it is ok to leave the fence)
>>>>
>>>> a simple workaround is to call MPI_Barrier before calling
>>>> MPI_Comm_disconnect
>>>>
>>>> at this stage, I doubt it is even possible to get this working at the
>>>> pmix level, so the fix might be to have MPI_Comm_disconnect invoke
>>>> MPI_Barrier.
>>>> the attached comm_disconnect.patch always calls the barrier before
>>>> (indirectly) invoking pmix
>>>>
>>>> could you please comment on this issue?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> here are the relevant logs:
>>>>
>>>> [soleil:00650] [[8110,3],0] pmix:native executing fence on 2 procs
>>>> [[8110,1],0] and [[8110,3],0]
>>>> [soleil:00650] [[8110,3],0]
>>>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493]
>>>> post send to server
>>>> [soleil:00650] [[8110,3],0] posting recv on tag 5
>>>> [soleil:00650] [[8110,3],0] usock:send_nb: already connected to server -
>>>> queueing for send
>>>> [soleil:00650] [[8110,3],0] usock:send_handler called to send to server
>>>> [soleil:00650] [[8110,3],0] usock:send_handler SENDING TO SERVER
>>>> [soleil:00647] [[8110,2],0] pmix:native executing fence on 2 procs
>>>> [[8110,1],0] and [[8110,2],0]
>>>> [soleil:00647] [[8110,2],0]
>>>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493]
>>>> post send to server
>>>> [soleil:00647] [[8110,2],0] posting recv on tag 5
>>>> [soleil:00647] [[8110,2],0] usock:send_nb: already connected to server -
>>>> queueing for send
>>>> [soleil:00647] [[8110,2],0] usock:send_handler called to send to server
>>>> [soleil:00647] [[8110,2],0] usock:send_handler SENDING TO SERVER
>>>> [soleil:00650] [[8110,3],0] usock:recv:handler called
>>>> [soleil:00650] [[8110,3],0] usock:recv:handler CONNECTED
>>>> [soleil:00650] [[8110,3],0] usock:recv:handler allocate new recv msg
>>>> [soleil:00650] usock:recv:handler read hdr
>>>> [soleil:00650] [[8110,3],0] usock:recv:handler allocate data region of
>>>> size 14
>>>> [soleil:00650] [[8110,3],0] RECVD COMPLETE MESSAGE FROM SERVER OF 14
>>>> BYTES FOR TAG 5
>>>> [soleil:00650] [[8110,3],0]
>>>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/usock_sendrecv.c:415]
>>>> post msg
>>>> [soleil:00650] [[8110,3],0] message received 14 bytes for tag 5
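For reference, here is a minimal C sketch of the comparison bug identified in the exchange above. The thread only names orte_dt_compare_sig(), value1->sz, and opal_identifier_t; the struct layout and helper name below are illustrative assumptions rather than the exact ORTE source.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t opal_identifier_t;   /* packed process name (jobid/vpid) */

/* assumed shape of a collective signature: an array of proc names + count */
typedef struct {
    opal_identifier_t *signature;
    size_t sz;                        /* number of entries, not bytes */
} coll_sig_t;

static bool sig_equal(const coll_sig_t *value1, const coll_sig_t *value2)
{
    if (value1->sz != value2->sz) {
        return false;
    }
    /* The reported bug: passing value1->sz (an element count) to memcmp
     * as the BYTE count, i.e. memcmp(a, b, value1->sz). With 8-byte
     * identifiers, signatures that share their leading bytes then compare
     * equal: {[[8110,1],0], [[8110,2],0]} and {[[8110,1],0], [[8110,3],0]}
     * both start with [[8110,1],0], so a 2-byte compare cannot tell them
     * apart. The fix multiplies by the element size: */
    return 0 == memcmp(value1->signature, value2->signature,
                       value1->sz * sizeof(opal_identifier_t));
}

Merging the two overlapping fences that way would plausibly explain the early release seen in the logs: once any two participants reported, the fused collective looked complete.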
Re: [OMPI devel] pmix: race condition in dynamic/intercomm_create from the ibm test suite
Yeah, that was going to be my first place to look once I finished breakfast :-)

Thanks!
Ralph

On Aug 25, 2014, at 7:32 AM, Gilles Gouaillardet wrote:

> Thanks for the explanation
>
> In orte_dt_compare_sig(...) memcmp did not multiply value1->sz by
> sizeof(opal_identifier_t).
>
> Being afk, I could not test, but that looks like a good suspect
>
> Cheers,
>
> Gilles
>
> Ralph Castain wrote:
>
>> Each collective is given a "signature" that is just the array of names for
>> all procs involved in the collective. Thus, even though task 0 is involved
>> in both of the disconnect barriers, the two collectives should be running
>> in isolation from each other.
>>
>> The "tags" are just receive callbacks and have no meaning other than to
>> associate a particular callback to a given send/recv pair. It is the
>> signature that counts, as the daemons use it to keep the various
>> collectives separated.
>>
>> I'll have to take a look at why task 2 is leaving early. The key will be
>> to look at that signature to ensure we aren't getting it confused.
>>
>> On Aug 25, 2014, at 1:59 AM, Gilles Gouaillardet wrote:
>>
>>> Folks,
>>>
>>> when I run
>>> mpirun -np 1 ./intercomm_create
>>> from the ibm test suite, it either:
>>> - succeeds
>>> - hangs
>>> - mpirun crashes (SIGSEGV) soon after writing the following message:
>>> ORTE_ERROR_LOG: Not found in file
>>> ../../../src/ompi-trunk/orte/orted/pmix/pmix_server.c at line 566
>>>
>>> here is what happens:
>>>
>>> first, the test program itself:
>>> task 0 spawns task 1: the inter communicator is ab_inter on task 0 and
>>> parent on task 1
>>> then task 0 spawns task 2: the inter communicator is ac_inter on task 0
>>> and parent on task 2
>>> then several operations (merge, barrier, ...)
>>> and then, without any synchronization:
>>> - task 0 calls MPI_Comm_disconnect(ab_inter) and then
>>> MPI_Comm_disconnect(ac_inter)
>>> - tasks 1 and 2 call MPI_Comm_disconnect(parent)
>>>
>>> I applied the attached pmix_debug.patch and ran
>>> mpirun -np 1 --mca pmix_base_verbose 90 ./intercomm_create
>>>
>>> basically, tasks 0 and 1 execute a native fence and, in parallel, tasks 0
>>> and 2 execute a native fence.
>>> they both use the *same* tags on different though overlapping sets of tasks.
>>> bottom line: task 2 leaves the fence *before* task 0 entered the fence
>>> (it seems task 1 told task 2 it is ok to leave the fence)
>>>
>>> a simple workaround is to call MPI_Barrier before calling
>>> MPI_Comm_disconnect
>>>
>>> at this stage, I doubt it is even possible to get this working at the
>>> pmix level, so the fix might be to have MPI_Comm_disconnect invoke
>>> MPI_Barrier.
>>> the attached comm_disconnect.patch always calls the barrier before
>>> (indirectly) invoking pmix
>>>
>>> could you please comment on this issue?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> here are the relevant logs:
>>>
>>> [soleil:00650] [[8110,3],0] pmix:native executing fence on 2 procs
>>> [[8110,1],0] and [[8110,3],0]
>>> [soleil:00650] [[8110,3],0]
>>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493]
>>> post send to server
>>> [soleil:00650] [[8110,3],0] posting recv on tag 5
>>> [soleil:00650] [[8110,3],0] usock:send_nb: already connected to server -
>>> queueing for send
>>> [soleil:00650] [[8110,3],0] usock:send_handler called to send to server
>>> [soleil:00650] [[8110,3],0] usock:send_handler SENDING TO SERVER
>>> [soleil:00647] [[8110,2],0] pmix:native executing fence on 2 procs
>>> [[8110,1],0] and [[8110,2],0]
>>> [soleil:00647] [[8110,2],0]
>>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493]
>>> post send to server
>>> [soleil:00647] [[8110,2],0] posting recv on tag 5
>>> [soleil:00647] [[8110,2],0] usock:send_nb: already connected to server -
>>> queueing for send
>>> [soleil:00647] [[8110,2],0] usock:send_handler called to send to server
>>> [soleil:00647] [[8110,2],0] usock:send_handler SENDING TO SERVER
>>> [soleil:00650] [[8110,3],0] usock:recv:handler called
>>> [soleil:00650] [[8110,3],0] usock:recv:handler CONNECTED
>>> [soleil:00650] [[8110,3],0] usock:recv:handler allocate new recv msg
>>> [soleil:00650] usock:recv:handler read hdr
>>> [soleil:00650] [[8110,3],0] usock:recv:handler allocate data region of
>>> size 14
>>> [soleil:00650] [[8110,3],0] RECVD COMPLETE MESSAGE FROM SERVER OF 14
>>> BYTES FOR TAG 5
>>> [soleil:00650] [[8110,3],0]
>>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/usock_sendrecv.c:415]
>>> post msg
>>> [soleil:00650] [[8110,3],0] message received 14 bytes for tag 5
>>> [soleil:00650] [[8110,3],0] checking msg on tag 5 for tag 5
>>> [soleil:00650] [[8110,3],0] pmix:native recv callback activated with 14
>>> bytes
>>> [soleil:00650] [[8110,3],0] pmix:native fence released on 2 procs
>>> [[8110,1],0] and [[8110,3],0]
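To make the tag-versus-signature distinction above concrete: the tag merely routes a message to a receive callback, while the signature selects which pending collective the message belongs to. Here is a hypothetical daemon-side sketch; pending_coll_t and find_coll are invented names for illustration, not the ORTE API.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t opal_identifier_t;

typedef struct pending_coll {
    struct pending_coll *next;
    opal_identifier_t *sig;   /* names of all procs in the collective */
    size_t sz;                /* number of entries in sig */
    size_t nreported;         /* procs that have entered the fence so far */
} pending_coll_t;

/* Look up the collective a fence message belongs to. Both disconnect
 * fences arrive on the same tag (5 in the logs), so only a full-array
 * signature comparison keeps {task 0, task 1} separate from
 * {task 0, task 2}. */
static pending_coll_t *find_coll(pending_coll_t *head,
                                 const opal_identifier_t *sig, size_t sz)
{
    for (pending_coll_t *c = head; NULL != c; c = c->next) {
        if (c->sz == sz &&
            0 == memcmp(c->sig, sig, sz * sizeof(opal_identifier_t))) {
            return c;
        }
    }
    return NULL;
}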
Re: [OMPI devel] pmix: race condition in dynamic/intercomm_create from the ibm test suite
Thanks for the explanation

In orte_dt_compare_sig(...) memcmp did not multiply value1->sz by
sizeof(opal_identifier_t).

Being afk, I could not test, but that looks like a good suspect

Cheers,

Gilles

Ralph Castain wrote:

> Each collective is given a "signature" that is just the array of names for
> all procs involved in the collective. Thus, even though task 0 is involved
> in both of the disconnect barriers, the two collectives should be running
> in isolation from each other.
>
> The "tags" are just receive callbacks and have no meaning other than to
> associate a particular callback to a given send/recv pair. It is the
> signature that counts, as the daemons use it to keep the various
> collectives separated.
>
> I'll have to take a look at why task 2 is leaving early. The key will be
> to look at that signature to ensure we aren't getting it confused.
>
> On Aug 25, 2014, at 1:59 AM, Gilles Gouaillardet wrote:
>
>> Folks,
>>
>> when I run
>> mpirun -np 1 ./intercomm_create
>> from the ibm test suite, it either:
>> - succeeds
>> - hangs
>> - mpirun crashes (SIGSEGV) soon after writing the following message:
>> ORTE_ERROR_LOG: Not found in file
>> ../../../src/ompi-trunk/orte/orted/pmix/pmix_server.c at line 566
>>
>> here is what happens:
>>
>> first, the test program itself:
>> task 0 spawns task 1: the inter communicator is ab_inter on task 0 and
>> parent on task 1
>> then task 0 spawns task 2: the inter communicator is ac_inter on task 0
>> and parent on task 2
>> then several operations (merge, barrier, ...)
>> and then, without any synchronization:
>> - task 0 calls MPI_Comm_disconnect(ab_inter) and then
>> MPI_Comm_disconnect(ac_inter)
>> - tasks 1 and 2 call MPI_Comm_disconnect(parent)
>>
>> I applied the attached pmix_debug.patch and ran
>> mpirun -np 1 --mca pmix_base_verbose 90 ./intercomm_create
>>
>> basically, tasks 0 and 1 execute a native fence and, in parallel, tasks 0
>> and 2 execute a native fence.
>> they both use the *same* tags on different though overlapping sets of tasks.
>> bottom line: task 2 leaves the fence *before* task 0 entered the fence
>> (it seems task 1 told task 2 it is ok to leave the fence)
>>
>> a simple workaround is to call MPI_Barrier before calling
>> MPI_Comm_disconnect
>>
>> at this stage, I doubt it is even possible to get this working at the
>> pmix level, so the fix might be to have MPI_Comm_disconnect invoke
>> MPI_Barrier.
>> the attached comm_disconnect.patch always calls the barrier before
>> (indirectly) invoking pmix
>>
>> could you please comment on this issue?
>>
>> Cheers,
>>
>> Gilles
>>
>> here are the relevant logs:
>>
>> [soleil:00650] [[8110,3],0] pmix:native executing fence on 2 procs
>> [[8110,1],0] and [[8110,3],0]
>> [soleil:00650] [[8110,3],0]
>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493]
>> post send to server
>> [soleil:00650] [[8110,3],0] posting recv on tag 5
>> [soleil:00650] [[8110,3],0] usock:send_nb: already connected to server -
>> queueing for send
>> [soleil:00650] [[8110,3],0] usock:send_handler called to send to server
>> [soleil:00650] [[8110,3],0] usock:send_handler SENDING TO SERVER
>> [soleil:00647] [[8110,2],0] pmix:native executing fence on 2 procs
>> [[8110,1],0] and [[8110,2],0]
>> [soleil:00647] [[8110,2],0]
>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493]
>> post send to server
>> [soleil:00647] [[8110,2],0] posting recv on tag 5
>> [soleil:00647] [[8110,2],0] usock:send_nb: already connected to server -
>> queueing for send
>> [soleil:00647] [[8110,2],0] usock:send_handler called to send to server
>> [soleil:00647] [[8110,2],0] usock:send_handler SENDING TO SERVER
>> [soleil:00650] [[8110,3],0] usock:recv:handler called
>> [soleil:00650] [[8110,3],0] usock:recv:handler CONNECTED
>> [soleil:00650] [[8110,3],0] usock:recv:handler allocate new recv msg
>> [soleil:00650] usock:recv:handler read hdr
>> [soleil:00650] [[8110,3],0] usock:recv:handler allocate data region of
>> size 14
>> [soleil:00650] [[8110,3],0] RECVD COMPLETE MESSAGE FROM SERVER OF 14
>> BYTES FOR TAG 5
>> [soleil:00650] [[8110,3],0]
>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/usock_sendrecv.c:415]
>> post msg
>> [soleil:00650] [[8110,3],0] message received 14 bytes for tag 5
>> [soleil:00650] [[8110,3],0] checking msg on tag 5 for tag 5
>> [soleil:00650] [[8110,3],0] pmix:native recv callback activated with 14
>> bytes
>> [soleil:00650] [[8110,3],0] pmix:native fence released on 2 procs
>> [[8110,1],0] and [[8110,3],0]
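A minimal sketch of the workaround Gilles describes, assuming the communicator names from the test description (error handling omitted):

#include <mpi.h>

/* Synchronize explicitly before disconnecting, so no task can leave
 * its disconnect-time fence before every peer has entered it. */
static void disconnect_with_barrier(MPI_Comm *comm)
{
    MPI_Barrier(*comm);          /* barrier on the (inter)communicator */
    MPI_Comm_disconnect(comm);   /* then the pmix-level fence */
}

/* task 0:        disconnect_with_barrier(&ab_inter);
 *                disconnect_with_barrier(&ac_inter);
 * tasks 1 and 2: disconnect_with_barrier(&parent);
 */

The comm_disconnect.patch mentioned above takes the same approach inside MPI_Comm_disconnect itself, so applications do not need the explicit barrier.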