I see the problem. Before my changes, ompi_comm_dup signalled that the
communicator was not an inter-communicator by setting remote_size to
0. The remote size is now taken from the remote group whenever one is
supplied (which is the case for intra-communicators), so ompi_comm_dup
needs to make sure NULL is passed as the remote_group when duplicating
intra-communicators.

I opened a PR. Once Jenkins finishes I will merge it into master.

-Nathan

On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> Yes, I did a fresh pull this morning. For me it deadlocks reliably with 2 or
> more processes.
> 
> Thanks
> Edgar
> 
> On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
> >
> >The reproducer is working for me with master on OS X 10.10. Some changes
> >to ompi_comm_set went in yesterday. Are you on the latest hash?
> >
> >-Nathan
> >
> >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> >>Something is borked right now on master in the management of inter- vs.
> >>intra-communicators. It looks like intra-communicators are wrongly selecting
> >>the inter coll module, as if they were inter-communicators, and we get hangs
> >>because of that. I attached a small reproducer in which a bcast on a
> >>duplicate of MPI_COMM_WORLD hangs because the inter collective module is
> >>being selected.
> >>
> >>Thanks
> >>Edgar
> >
> >>#include <stdio.h>
> >>#include "mpi.h"
> >>
> >>int main( int argc, char *argv[] )
> >>{
> >>   MPI_Comm comm1;
> >>   int root=0;
> >>   int rank2, size2, global_buf=1;
> >>   int rank, size;
> >>
> >>   MPI_Init ( &argc, &argv );
> >>
> >>   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
> >>   MPI_Comm_size ( MPI_COMM_WORLD, &size );
> >>
> >>/* Setting up a new communicator */
> >>   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
> >>
> >>   MPI_Comm_size ( comm1, &size2 );
> >>   MPI_Comm_rank ( comm1, &rank2 );
> >>
> >>
> >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
> >>   if ( rank == root ) {
> >>       printf("Bcast on MPI_COMM_WORLD finished\n");
> >>   }
> >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
> >>   if ( rank == root ) {
> >>       printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
> >>   }
> >>
> >>   MPI_Comm_free ( &comm1 );
> >>
> >>   MPI_Finalize ();
> >>   return ( 0 );
> >>}
> >
> >>_______________________________________________
> >>devel mailing list
> >>de...@open-mpi.org
> >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>Link to this post: 
> >>http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
> >
> >
> >
> >Link to this post: 
> >http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
> >
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18043.php

