I finally got it :-) /* I previously got it "almost" right ... */
Here is what happens on job 2 (with trunk):

MPI_Intercomm_create calls ompi_comm_get_rprocs, which calls ompi_proc_unpack
=> ompi_proc_unpack stores the job 3 info into opal_dstore_peer.
Then ompi_comm_get_rprocs calls ompi_proc_set_locality(job 3)
=> ompi_proc_set_locality fetches the job 3 info from opal_dstore_internal
(not found), then from opal_dstore_nonpeer (not found again), and fails.
This is simply the consequence of ompi_proc_unpack storing the job 3 info in
opal_dstore_peer and not in opal_dstore_nonpeer.

I do not understand which of opal_dstore_peer and opal_dstore_nonpeer should
be used, and when, so I wrote a defensive patch (fetch from
opal_dstore_nonpeer, and then from opal_dstore_peer if the info was not
previously found). Could someone please review this and comment/fix it if
needed (for example: store in opal_dstore_nonpeer instead of
opal_dstore_peer, *or* fetch from opal_dstore_peer instead of
opal_dstore_nonpeer, and/or something else)?

With the patch, locality is correctly set, coll ml receives correct
information, and the test no longer hangs when mpirun is invoked without
--mca coll ^ml (on a single node, single socket VM).

Bottom line: job 2 *did* receive the information about job 3, but failed to
store/fetch it in the right dstore! v1.8 is unaffected since there is only
one dstore.

Cheers,

Gilles

On Wed, May 28, 2014 at 4:51 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Hmmm... I did some digging, and the best I can tell is that the root cause
> is that the second job ("b" in the test program) is never actually calling
> connect_accept! This looks like a change may have occurred in
> Intercomm_create that is causing it to not recognize the need to do so.
>
> Can anyone confirm that diagnosis?
>
> FWIW: job 1 clearly receives and has all the required info in the correct
> places - it is ready to provide it to job 2, if/when job 2 actually calls
> connect_accept.
>
> On May 27, 2014, at 10:13 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> > Hi Gilles
> >
> > I concur on the typo and fixed it - thanks for catching it. I'll have to
> > look into the problem you reported, as it has been fixed in the past and
> > was working the last time I checked. The info required for this 3-way
> > connect/accept is supposed to be in the modex provided by the common
> > communicator.
> >
> > On May 27, 2014, at 3:51 AM, Gilles Gouaillardet
> > <gilles.gouaillar...@gmail.com> wrote:
> >
> >> Folks,
> >>
> >> While debugging dynamic/intercomm_create from the ibm test suite, I
> >> found something odd.
> >>
> >> I ran *without* any batch manager on a VM (one socket and four cpus):
> >> mpirun -np 1 ./dynamic/intercomm_create
> >>
> >> It hangs by default.
> >> It works with --mca coll ^ml.
> >>
> >> Basically:
> >> - task 0 spawns task 1
> >> - task 0 spawns task 2
> >> - a communicator is created for the 3 tasks via MPI_Intercomm_create()
> >>
> >> MPI_Intercomm_create() calls ompi_comm_get_rprocs(), which calls
> >> ompi_proc_set_locality().
> >>
> >> Then, on task 1, ompi_proc_set_locality() calls
> >> opal_dstore.fetch(opal_dstore_internal, "task 2"->proc_name, ...),
> >> which fails, and this is OK.
> >> Then it calls
> >> opal_dstore.fetch(opal_dstore_nonpeer, "task 2"->proc_name, ...),
> >> which fails, and this is *not* OK.
> >>
> >> /* on task 2, the first fetch for "task 1" fails but the second
> >> succeeds */
> >>
> >> My analysis is that when task 2 was created, it updated its
> >> opal_dstore_nonpeer with info from "task 1", which was previously
> >> spawned by task 0.
> >> When task 1 was spawned, task 2 did not exist yet, and hence
> >> opal_dstore_nonpeer contained no reference to task 2.
> >> But when task 2 was spawned, the opal_dstore_nonpeer of task 1 was not
> >> updated, hence the failure.
> >>
> >> (on task 1, the proc_flags of task 2 have incorrect locality; this
> >> likely confuses coll ml and hangs the test)
> >>
> >> Should task 1 have received new information when task 2 was spawned?
> >> Should task 2 have sent information to task 1 when it was spawned?
> >> Should task 1 have (tried to) get fresh information before invoking
> >> MPI_Intercomm_create()?
> >>
> >> Incidentally, I found that ompi_proc_set_locality calls
> >> opal_dstore.store with the identifier &proc (the argument is
> >> &proc->proc_name everywhere else), so this is likely a bug/typo.
> >> The attached patch fixes this.
> >>
> >> Thanks in advance for your feedback,
> >>
> >> Gilles
> >> <proc.patch>
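For reference, the scenario quoted above boils down to the pattern below.
This is a minimal sketch only, assuming a single binary that plays all three
roles; the role dispatching via argv and the tag value are illustrative
assumptions, not the actual ibm test source:

/* minimal sketch of the 3-task scenario; illustrative, not the ibm test */
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, inter_b, inter_c, intra_ab, abc;
    const int tag = 201;    /* arbitrary, must match on all three tasks */

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* task 0: spawn task 1, merge with it, then spawn task 2 */
        char *argv_b[] = { "b", NULL };
        char *argv_c[] = { "c", NULL };
        MPI_Comm_spawn(argv[0], argv_b, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &inter_b, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge(inter_b, 0, &intra_ab);  /* {task 0, task 1} */
        MPI_Comm_spawn(argv[0], argv_c, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &inter_c, MPI_ERRCODES_IGNORE);
        /* bridge {task 0, task 1} to task 2; this is where
         * ompi_comm_get_rprocs()/ompi_proc_set_locality() need info
         * about a job that was never stored locally */
        MPI_Intercomm_create(intra_ab, 0, inter_c, 0, tag, &abc);
    } else if (0 == strcmp(argv[1], "b")) {
        /* task 1: merge with the parent, then join the bridge;
         * peer_comm is only significant at the local leader (task 0) */
        MPI_Intercomm_merge(parent, 1, &intra_ab);
        MPI_Intercomm_create(intra_ab, 0, MPI_COMM_NULL, 0, tag, &abc);
    } else {
        /* task 2: its local group is just itself, and the remote leader
         * is task 0 (rank 0 of the parent intercomm's remote group) */
        MPI_Intercomm_create(MPI_COMM_SELF, 0, parent, 0, tag, &abc);
    }

    MPI_Comm_free(&abc);
    MPI_Finalize();
    return 0;
}

The defensive patch against trunk follows: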
Index: ompi/proc/proc.c
===================================================================
--- ompi/proc/proc.c    (revision 31899)
+++ ompi/proc/proc.c    (working copy)
@@ -13,6 +13,8 @@
  * Copyright (c) 2012      Los Alamos National Security, LLC.  All rights
  *                         reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -155,8 +157,12 @@
     if (OMPI_SUCCESS != (ret = opal_dstore.fetch(opal_dstore_nonpeer,
                                                  (opal_identifier_t*)&proc->proc_name,
                                                  OMPI_RTE_NODE_ID, &myvals))) {
-        OPAL_LIST_DESTRUCT(&myvals);
-        return ret;
+        if (OMPI_SUCCESS != (ret = opal_dstore.fetch(opal_dstore_peer,
+                                                     (opal_identifier_t*)&proc->proc_name,
+                                                     OMPI_RTE_NODE_ID, &myvals))) {
+            OPAL_LIST_DESTRUCT(&myvals);
+            return ret;
+        }
     }
     kv = (opal_value_t*)opal_list_get_first(&myvals);
     vpid = kv->data.uint32;
@@ -198,9 +204,13 @@
                                                  (opal_identifier_t*)&proc->proc_name,
                                                  OPAL_DSTORE_CPUSET, &myvals))) {
         /* check the nonpeer data in case of comm_spawn */
-        ret = opal_dstore.fetch(opal_dstore_nonpeer,
-                                (opal_identifier_t*)&proc->proc_name,
-                                OPAL_DSTORE_CPUSET, &myvals);
+        if (OMPI_SUCCESS != (ret = opal_dstore.fetch(opal_dstore_nonpeer,
+                                                     (opal_identifier_t*)&proc->proc_name,
+                                                     OPAL_DSTORE_CPUSET, &myvals))) {
+            ret = opal_dstore.fetch(opal_dstore_peer,
+                                    (opal_identifier_t*)&proc->proc_name,
+                                    OPAL_DSTORE_CPUSET, &myvals);
+        }
     }
     if (OMPI_SUCCESS != ret) {
         /* we don't know their cpuset, so nothing more we can say */
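If the defensive nonpeer-then-peer fetch turns out to be the intended
behavior, a possible follow-up cleanup would be to factor it into a single
helper in ompi/proc/proc.c, so both call sites in ompi_proc_set_locality()
stay in sync. A minimal sketch, assuming the trunk dstore handles are the
plain integer handles used above; the helper name is hypothetical, not
existing Open MPI API:

/* hypothetical helper, not existing Open MPI API: fetch a key for a
 * proc from the nonpeer dstore first, falling back to the peer dstore,
 * mirroring the defensive behavior of the patch above */
static int dstore_fetch_with_fallback(ompi_proc_t *proc, const char *key,
                                      opal_list_t *myvals)
{
    int handles[2] = { opal_dstore_nonpeer, opal_dstore_peer };
    int i, ret = OMPI_ERROR;

    for (i = 0; i < 2; i++) {
        /* a miss in one store falls through to the next */
        ret = opal_dstore.fetch(handles[i],
                                (opal_identifier_t*)&proc->proc_name,
                                key, myvals);
        if (OMPI_SUCCESS == ret) {
            break;
        }
    }
    return ret;
}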