In last night's MTT run, I got a bunch of errors in COMM_SPAWN. I know
we're expecting it to fail (possibly/probably due to IOF errors), but
these failures don't look like the ones we expected. For simplicity, I
compiled the IBM test suite manually and ran the spawn test:

[0:30] svbu-mpi:~/svn/ompi-tests/ibm/dynamic % mpirun -np 3 spawn
[svbu-mpi001.cisco.com:02845] [1,1] ORTE_ERROR_LOG: Communication failure in file grpcomm_basic_module.c at line 666
[svbu-mpi001.cisco.com:02845] [1,1] ORTE_ERROR_LOG: Communication failure in file communicator/comm_dyn.c at line 274
[**ERROR**]: MPI_COMM_WORLD rank 1, file spawn.c:114:
ERROR: MPI_Comm_spawn returned errcode[0] = -112
[svbu-mpi001.cisco.com:02845] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1
[svbu-mpi001.cisco.com:02846] [1,2] ORTE_ERROR_LOG: Communication failure in file grpcomm_basic_module.c at line 666
[svbu-mpi001.cisco.com:02846] [1,2] ORTE_ERROR_LOG: Communication failure in file communicator/comm_dyn.c at line 274
[**ERROR**]: MPI_COMM_WORLD rank 2, file spawn.c:114:
ERROR: MPI_Comm_spawn returned errcode[0] = -112
[svbu-mpi001.cisco.com:02846] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 1

These "Communication failure" errors look odd to me. Ralph -- can you
investigate?
Thanks!
--
Jeff Squyres
Cisco Systems