I see the problem. Before my changes, ompi_comm_dup signalled that the communicator was not an inter-communicator by setting remote_size to 0. The remote size is now taken from the remote group whenever one is supplied (which is the case for intra-communicators), so ompi_comm_dup needs to make sure NULL is passed for the remote_group when duplicating intra-communicators.
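A minimal C sketch of this failure mode, assuming heavily simplified types. Everything prefixed `sketch_` is hypothetical and is not Open MPI's real ompi_comm_set()/ompi_comm_dup() code or signatures. The sketch assumes two things. First, the inter flag is derived from a non-zero remote size. Second, an intra-communicator keeps its local group in its remote-group slot, so a dup that blindly forwards that slot produces a copy that looks like an inter-communicator. The "inter" and "intra" strings are placeholder labels for the collective component that would be picked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a process group. */
typedef struct {
    int size;
} sketch_group_t;

/* Hypothetical stand-in for a communicator. */
typedef struct {
    const sketch_group_t *local_group;
    const sketch_group_t *remote_group; /* assumed == local_group for intra-comms */
    int remote_size;                    /* 0 means "intra-communicator" */
    bool is_inter;
} sketch_comm_t;

/* Hypothetical analogue of the new comm_set behaviour: the remote size
 * comes from the remote group whenever one is passed in. */
static void sketch_comm_set(sketch_comm_t *comm,
                            const sketch_group_t *local_group,
                            const sketch_group_t *remote_group)
{
    comm->local_group = local_group;
    comm->remote_group = (remote_group != NULL) ? remote_group : local_group;
    comm->remote_size = (remote_group != NULL) ? remote_group->size : 0;
    comm->is_inter = (comm->remote_size > 0);
}

/* Buggy dup: always forwards the stored remote group, so an
 * intra-communicator's copy gets a non-zero remote size. */
static void sketch_comm_dup_buggy(const sketch_comm_t *old, sketch_comm_t *newcomm)
{
    sketch_comm_set(newcomm, old->local_group, old->remote_group);
}

/* Fixed dup: pass NULL as the remote group for intra-communicators. */
static void sketch_comm_dup_fixed(const sketch_comm_t *old, sketch_comm_t *newcomm)
{
    sketch_comm_set(newcomm, old->local_group,
                    old->is_inter ? old->remote_group : NULL);
}

/* Placeholder for collective component selection keyed on the inter flag. */
static const char *sketch_select_coll(const sketch_comm_t *comm)
{
    return comm->is_inter ? "inter" : "intra";
}
```

With a 4-process "world" created via `sketch_comm_set(&world_comm, &world, NULL)`, the buggy dup yields a copy whose `sketch_select_coll()` returns "inter" (the hang in the reproducer), while the fixed dup keeps it "intra". A genuine inter-communicator still keeps its remote group in either version.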
I opened a PR. Once Jenkins finishes, I will merge it into master.
-Nathan
On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> Yes, I did a fresh pull this morning; for me it deadlocks reliably with 2 or
> more processes.
>
> Thanks
> Edgar
>
> On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
> >
> >The reproducer works for me with master on OS X 10.10. Some changes
> >to ompi_comm_set went in yesterday. Are you on the latest hash?
> >
> >-Nathan
> >
> >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> >>Something is borked on master right now in the management of inter vs. intra
> >>communicators. It looks like intra-communicators are wrongly selecting the
> >>inter coll module, treating themselves as inter-communicators, and we get
> >>hangs because of that. I attach a small reproducer in which a bcast on a
> >>duplicate of MPI_COMM_WORLD hangs because the inter collective module is
> >>being selected.
> >>
> >>Thanks
> >>Edgar
> >
> >>#include <stdio.h>
> >>#include "mpi.h"
> >>
> >>int main( int argc, char *argv[] )
> >>{
> >>    MPI_Comm comm1;
> >>    int root = 0;
> >>    int rank2, size2, global_buf = 1;
> >>    int rank, size;
> >>
> >>    MPI_Init ( &argc, &argv );
> >>
> >>    MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
> >>    MPI_Comm_size ( MPI_COMM_WORLD, &size );
> >>
> >>    /* Setting up a new communicator */
> >>    MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
> >>
> >>    MPI_Comm_size ( comm1, &size2 );
> >>    MPI_Comm_rank ( comm1, &rank2 );
> >>
> >>    MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
> >>    if ( rank == root ) {
> >>        printf("Bcast on MPI_COMM_WORLD finished\n");
> >>    }
> >>    MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
> >>    if ( rank == root ) {
> >>        printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
> >>    }
> >>
> >>    MPI_Comm_free ( &comm1 );
> >>
> >>    MPI_Finalize ();
> >>    return ( 0 );
> >>}
> >
> >>_______________________________________________
> >>devel mailing list
> >>[email protected]
> >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>Link to this post:
> >>http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
> >
> >
> >
> >_______________________________________________
> >devel mailing list
> >[email protected]
> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >Link to this post:
> >http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
> >
> _______________________________________________
> devel mailing list
> [email protected]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18043.php
