Folks, on my single-socket, four-core VM (no batch manager), I am running the intercomm_create test from the ibm test suite.
mpirun -np 1 ./intercomm_create => OK
mpirun -np 2 ./intercomm_create => HANG :-(
mpirun -np 2 --mca coll ^ml ./intercomm_create => OK

Basically, the first two tasks call MPI_Comm_spawn (2 tasks) twice, followed by MPI_Intercomm_merge, and the 4 spawned tasks call MPI_Intercomm_merge followed by MPI_Intercomm_create.

I dug a bit into this and found two distinct issues:

1) binding:
tasks [0-1] (launched with mpirun) are bound on cores [0-1] => OK
tasks [2-3] (first spawn) are bound on cores [0-1] => ODD, I would have expected [2-3]
tasks [4-5] (second spawn) are not bound at all => ODD again, though it could have made sense if tasks [2-3] had been bound on cores [2-3]
I observe the same behaviour with the --oversubscribe mpirun parameter.

2) coll/ml:
coll/ml hangs with -np 2 (6 tasks in total, including 2 unbound tasks).
I suspect coll/ml is unable to handle unbound tasks.
If I am correct, should coll/ml detect this and simply disqualify itself automatically?

Cheers,

Gilles
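
P.S. To double-check where each task actually ends up, something like the following could be run on the node while the job is still hanging. It is only a minimal sketch, assuming Linux and Python 3. The function names (allowed_cores, is_bound, pids_named) are made up for illustration and are not part of Open MPI, and "bound" here simply means the affinity mask covers fewer cores than the machine reports.

```python
import os
import sys


def allowed_cores(pid=0):
    """Return the sorted list of cores the given PID may run on (0 = this process)."""
    return sorted(os.sched_getaffinity(pid))


def is_bound(pid=0):
    """Heuristic (an assumption of this sketch): a task counts as bound when
    its affinity mask is smaller than the number of cores the OS reports."""
    return len(allowed_cores(pid)) < (os.cpu_count() or 0)


def pids_named(name):
    """Scan /proc for processes whose command name matches `name`.

    The kernel truncates /proc/<pid>/comm to 15 characters, so a 16-character
    name such as intercomm_create has to be truncated before comparing.
    """
    wanted = name[:15]
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == wanted:
                    found.append(int(entry))
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return sorted(found)


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "intercomm_create"
    for pid in pids_named(name):
        try:
            state = "bound" if is_bound(pid) else "not bound"
            print(f"pid {pid}: cores {allowed_cores(pid)} ({state})")
        except ProcessLookupError:
            # The task went away while we were looking at it.
            continue
```

Saved under a hypothetical name such as show_binding.py, it would be invoked as "python3 show_binding.py intercomm_create" and prints one line per matching process. mpirun's --report-bindings option should show the launcher's own view of the bindings for comparison.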