Hi folks,

I think this is a bug in the PSM MTL add_procs. The call to psm_ep_connect needs to take previously connected endpoints into account, much like what is done in the libfabric PSM provider code.
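To make the idea concrete, here is a rough sketch of what I mean, not a patch. Everything in it (peer_t, stub_ep_connect, add_procs_sketch, MAX_PROCS) is a made-up stand-in so it compiles on its own; none of it is the actual PSM or Open MPI code. If I recall correctly, psm_ep_connect accepts a per-entry mask array, and that is the mechanism the sketch imitates: mark the peers that an earlier add_procs call already connected and skip them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 64 /* hypothetical upper bound, only for this sketch */

/* Hypothetical stand-in for a per-peer record that outlives add_procs calls. */
typedef struct {
    uint64_t epid;      /* remote endpoint id */
    int      connected; /* nonzero once an earlier call connected this peer */
} peer_t;

/* Stub in place of the real connect call: it "connects" only the entries
 * whose mask is nonzero and returns how many entries it touched. */
static int stub_ep_connect(peer_t *procs[], const int *mask, size_t n)
{
    int newly_connected = 0;
    for (size_t i = 0; i < n; i++) {
        if (!mask[i]) {
            continue; /* already connected: leave it alone */
        }
        procs[i]->connected = 1;
        newly_connected++;
    }
    return newly_connected;
}

/* Sketch of an add_procs that tolerates being handed the same peer twice.
 * Returns the number of peers connected by this call. */
int add_procs_sketch(peer_t *procs[], size_t n)
{
    int mask[MAX_PROCS];

    assert(n <= MAX_PROCS);
    for (size_t i = 0; i < n; i++) {
        /* Mask out peers a previous add_procs call already connected. */
        mask[i] = procs[i]->connected ? 0 : 1;
    }
    return stub_ep_connect(procs, mask, n);
}
```

With this shape, a second call that repeats a peer from the first call only connects the peers that are new, which is the case Andrew describes for MPI_Intercomm_create.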
Howard

2014-11-12 3:12 GMT-07:00 Rainer Keller <rainer.kel...@hft-stuttgart.de>:

> Dear Andrew,
> no, this is not done with dynamically connecting jobs.
>
> The failing tests use a communicator, which is set up by merging back an
> intercommunicator (MPI_Intercomm_merge), which was first split from
> MPI_COMM_WORLD (MPI_Intercomm_create).
>
> Please see tst_comm.c:459
>
> Best regards,
> Rainer
>
> On 11.11.2014, at 23:44, "Friedley, Andrew" <andrew.fried...@intel.com> wrote:
>
> > Ralph,
> >
> > You're right that PSM wouldn't support dynamically connecting jobs. I
> > don't think intercomm_create implies that though. For example you could
> > split COMM_WORLD's group into two groups, then create an intercommunicator
> > across those two groups. I'm guessing that's what this test is doing, I'd
> > have to go read the code to be sure though.
> >
> > I verified this test works over PSM and OMPI 1.6.5; it fails on 1.8.1
> > and 1.8.3.
> >
> > Andrew
> >
> >> -----Original Message-----
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> >> Sent: Tuesday, November 11, 2014 2:23 PM
> >> To: Open MPI Developers
> >> Subject: Re: [OMPI devel] 1.8.3 and PSM errors
> >>
> >> I thought PSM didn’t support dynamic operations such as Intercomm_create
> >> - yes? The PSM security key wouldn’t match between the two jobs, and so
> >> there is no way for them to communicate.
> >>
> >> Which is why I thought PSM can’t be used for dynamic operations at all,
> >> including comm_spawn and connect/accept
> >>
> >>> On Nov 11, 2014, at 2:13 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> >>>
> >>> On Nov 11, 2014, at 4:56 PM, Friedley, Andrew <andrew.fried...@intel.com> wrote:
> >>>
> >>>> OK, I'm able to reproduce this now, not sure why I couldn't before. I took
> >>>> a look at the diff of the PSM MTL from 1.6.5 to 1.8.1, and nothing is
> >>>> standing out to me.
> >>>>
> >>>> Question more for the general group: Did anything related to the
> >>>> behavior/usage of MTL add_procs() change in this time window?
> >>>
> >>> The time between the 1.6.x series and the 1.8.x series is measured in terms
> >>> of a year or two, so, ya, something might have changed...
> >>>
> >>>> More particularly, it looks like add_procs is being called a second time
> >>>> during MPI_Intercomm_create and being passed a process that is already
> >>>> connected (passed into the first add_procs call). Is that right? Should the
> >>>> MTL handle multiple add_procs calls with the same proc provided?
> >>>
> >>> I'm afraid I don't know much about the MTL interface.
> >>>
> >>> George / Nathan?
> >>>
> >>> --
> >>> Jeff Squyres
> >>> jsquy...@cisco.com
> >>> For corporate legal information go to:
> >>> http://www.cisco.com/web/about/doing_business/legal/cri/
> >>>
> >>> _______________________________________________
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/11/16294.php
> >>
> >> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/11/16295.php
> >
> > Link to this post: http://www.open-mpi.org/community/lists/devel/2014/11/16296.php
>
> ---------------------------------------------------------------------
> Prof. Dr.-Ing. Rainer Keller
> Hochschule für Technik Stuttgart
> Fakultät für Vermessung, Informatik und Mathematik
> Schellingstr. 24, Raum 2/449
> 70174 Stuttgart
> T.: +49 (0)711 8926-2812
> F.: +49 (0)711 8926-2553
>
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/11/16299.php