What version of OpenMPI did you use? Patch #21970 should have fixed this issue on the trunk...

Thanks
Edgar

Sylvain Jeaugey wrote:
Hi list,

We are currently experiencing deadlocks when using communicators other than MPI_COMM_WORLD. So we made a very simple reproducer (MPI_Comm_create followed by MPI_Barrier on the new communicator; see the end of this e-mail).

We can reproduce the deadlock only with openib and with at least 8 cores (no success with sm), and only after about 20 runs on average. Using a larger number of cores greatly increases the occurrence of the deadlock. When the deadlock occurs, every even process is stuck in MPI_Finalize and every odd process is in MPI_Barrier.

So we tracked the bug through the changesets and found that this patch seems to have introduced it:

user:        brbarret
date:        Tue Aug 25 15:13:31 2009 +0000
summary: Per discussion in ticket #2009, temporarily disable the block CID allocation
algorithms until they properly reuse CIDs.

Reverting to the non-multi-threaded CID allocator makes the deadlock disappear.

I tried to dig further and understand why this makes a difference, with no luck.

If anyone can figure out what's happening, that would be great ...

Thanks,
Sylvain

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, numTasks;
    int range[3];
    MPI_Comm testComm, dupComm;
    MPI_Group orig_group, new_group;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numTasks);
    MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
    range[0] = 0; /* first rank */
    range[1] = numTasks - 1; /* last rank */
    range[2] = 1; /* stride */
    MPI_Group_range_incl(orig_group, 1, &range, &new_group);
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &testComm);
    MPI_Barrier(testComm);
    MPI_Finalize();
    return 0;
}
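
Because the hang only shows up after about 20 runs on average, the reproducer has to be launched repeatedly until one run fails to return. A minimal sketch of a driver for that, assuming a POSIX system: run_with_timeout and first_hanging_run are hypothetical helpers written for illustration and are not part of Open MPI, and treating a run that exceeds the timeout as a deadlock is itself an assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of Open MPI): run argv[0] with its
 * arguments and wait at most timeout_sec seconds for it to finish.
 * Returns the exit status if the child exited normally, -1 on a
 * fork/wait error or abnormal termination, and -2 on timeout, which
 * this sketch assumes to mean "hung". */
int run_with_timeout(char *const argv[], int timeout_sec)
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Poll every 100 ms until the child exits or the deadline passes. */
    struct timespec tick = { 0, 100L * 1000L * 1000L };
    for (long waited_ms = 0; waited_ms < timeout_sec * 1000L; waited_ms += 100) {
        int status;
        pid_t done = waitpid(pid, &status, WNOHANG);
        if (done == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (done < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Deadline passed: assume a hang, then kill and reap the child. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -2;
}

/* Run the command up to max_runs times. Returns the 1-based index of
 * the first run that timed out, or 0 if no run hung. Failures other
 * than a timeout are ignored. */
int first_hanging_run(char *const argv[], int max_runs, int timeout_sec)
{
    for (int i = 1; i <= max_runs; i++) {
        if (run_with_timeout(argv, timeout_sec) == -2)
            return i;
    }
    return 0;
}
```

A caller could pass an argument vector such as { "mpirun", "-np", "8", "--mca", "btl", "openib,self", "./reproducer", NULL }, where ./reproducer is an assumed name for the compiled program above. Note that SIGKILL on the launcher may leave the MPI ranks running, so when the goal is to inspect the stuck processes it is probably better to attach a debugger than to kill the run.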


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
