[OMPI devel] Deadlock when creating too many communicators

Wolfgang Bangerth Sat, 5 Sep 2009 16:52:58 -0400

Howdy,
here's a creative way to deadlock a program: create and destroy a bit more 
than 65500 communicators in sequence and send a message on each of them:
----------------------------------------
#include <mpi.h>
#include <iostream>

#define CHECK(a)                                \
  {                                             \
    int err = (a);                              \
    if (err != MPI_SUCCESS) std::cout << "Error in line " << __LINE__ << std::endl; \
  }

int main (int argc, char *argv[])
{
  int a=0, b;

  MPI_Init (&argc, &argv);

  for (int i=0; i<1000000; ++i)
    {
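      // Report progress every 100 iterations.
      if (i % 100 == 0) std::cout << "Duplication event " << i << std::endl;

      // Create a fresh communicator, use it once, and free it again,
      // so at most one duplicate exists at any time.
      MPI_Comm dup;
      CHECK(MPI_Comm_dup (MPI_COMM_WORLD, &dup));
      CHECK(MPI_Allreduce(&a, &b, 1, MPI_INT, MPI_MIN, dup));
      CHECK(MPI_Comm_free (&dup));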
    }

  MPI_Finalize();
}
-------------------------------------------
If you run this, for example, on two processors with OpenMPI 1.2.6 or 
1.3.2, you'll see that the program runs until it has printed 
"Duplication event 65500" and then just hangs. On my system, it spins at 
full CPU load somewhere inside the operating system's poll() call.

Since I take care to destroy each communicator again, I would have 
expected this to work. I create many communicators basically as a 
debugging tool: every object gets its own communicator to work on, which 
ensures that different objects don't accidentally communicate with each 
other just because they all use MPI_COMM_WORLD. It would be nice if this 
mode of using MPI could be made to work.

Best & thanks in advance!
 Wolfgang

-- 
-------------------------------------------------------------------------
Wolfgang Bangerth                email:            [email protected]
                                 www: http://www.math.tamu.edu/~bangerth/
