I see the problem. Before my changes ompi_comm_dup signalled that the communicator was not an inter-communicator by setting remote_size to 0. The remote size is now from the remote group if one was supplied (which is the case with intra-communicators) so ompi_comm_dup needs to make sure NULL is passed for the remote_group when duplicating intra-communicators.
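The logic described above can be sketched in a few lines of C. This is a simplified model under stated assumptions, not the actual Open MPI source: the `sketch_*` types and functions are hypothetical stand-ins for `ompi_comm_set` and `ompi_comm_dup`, and the sketch assumes that an intra-communicator's remote group simply aliases its local group. It only shows why forwarding a non-NULL remote group makes a duplicated intra-communicator look like an inter-communicator, and how passing NULL avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real Open MPI structures. */
typedef struct { int size; } sketch_group_t;

typedef struct {
    const sketch_group_t *local_group;
    const sketch_group_t *remote_group; /* assumed to alias local_group for intra-communicators */
    int  remote_size;
    bool is_inter;
} sketch_comm_t;

/* Models the behaviour described in the message: the remote size is taken
 * from the remote group whenever one is supplied, and a supplied remote
 * group marks the new communicator as an inter-communicator. */
static void sketch_comm_set(sketch_comm_t *newcomm,
                            const sketch_group_t *local_group,
                            const sketch_group_t *remote_group)
{
    assert(local_group != NULL);
    newcomm->local_group = local_group;
    if (remote_group != NULL) {
        newcomm->remote_group = remote_group;
        newcomm->remote_size  = remote_group->size;
        newcomm->is_inter     = true;
    } else {
        newcomm->remote_group = local_group;
        newcomm->remote_size  = 0;
        newcomm->is_inter     = false;
    }
}

/* Assumed shape of the problem: the remote group is forwarded
 * unconditionally, so an intra-communicator's duplicate is misclassified. */
static void sketch_comm_dup_unguarded(const sketch_comm_t *comm,
                                      sketch_comm_t *newcomm)
{
    sketch_comm_set(newcomm, comm->local_group, comm->remote_group);
}

/* Shape of the fix: pass NULL for the remote group unless the source
 * communicator really is an inter-communicator. */
static void sketch_comm_dup(const sketch_comm_t *comm, sketch_comm_t *newcomm)
{
    const sketch_group_t *remote = comm->is_inter ? comm->remote_group : NULL;
    sketch_comm_set(newcomm, comm->local_group, remote);
}
```

In this model, duplicating an intra-communicator with `sketch_comm_dup_unguarded` yields a communicator flagged as inter, which would correspond to the inter coll module being selected in the report quoted below. `sketch_comm_dup` keeps the duplicate intra.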
I opened a PR. Once Jenkins finishes I will merge it onto master.

-Nathan

On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> yes, I did a fresh pull this morning, for me it deadlocks reliably for 2 and
> more processes.
>
> Thanks
> Edgar
>
> On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
> >
> >The reproducer is working for me with master on OS X 10.10. Some changes
> >to ompi_comm_set went in yesterday. Are you on the latest hash?
> >
> >-Nathan
> >
> >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> >>something is borked right now on master in the management of inter vs. intra
> >>communicators. It looks like intra communicators are wrongly selecting the
> >>inter coll module thinking that it is an inter communicator, and we have
> >>hangs because of that. I attach a small replicator, where a bcast of a
> >>duplicate of MPI_COMM_WORLD hangs, because the inter collective module is
> >>being selected.
> >>
> >>Thanks
> >>Edgar
> >
> >>#include <stdio.h>
> >>#include "mpi.h"
> >>
> >>int main( int argc, char *argv[] )
> >>{
> >>    MPI_Comm comm1;
> >>    int root=0;
> >>    int rank2, size2, global_buf=1;
> >>    int rank, size;
> >>
> >>    MPI_Init ( &argc, &argv );
> >>
> >>    MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
> >>    MPI_Comm_size ( MPI_COMM_WORLD, &size );
> >>
> >>/* Setting up a new communicator */
> >>    MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
> >>
> >>    MPI_Comm_size ( comm1, &size2 );
> >>    MPI_Comm_rank ( comm1, &rank2 );
> >>
> >>
> >>    MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
> >>    if ( rank == root ) {
> >>        printf("Bcast on MPI_COMM_WORLD finished\n");
> >>    }
> >>    MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
> >>    if ( rank == root ) {
> >>        printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
> >>    }
> >>
> >>    MPI_Comm_free ( &comm1 );
> >>
> >>    MPI_Finalize ();
> >>    return ( 0 );
> >>}
> >
> >>_______________________________________________
> >>devel mailing list
> >>de...@open-mpi.org
> >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>Link to this post:
> >>http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
> >
> >
> >
> >_______________________________________________
> >devel mailing list
> >de...@open-mpi.org
> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >Link to this post:
> >http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18043.php