I'll take a look. Offhand, I don't know of anything limiting you to <= 64 ppn.

On Mar 4, 2009, at 1:49 PM, Eugene Loh wrote:

I have a problem starting large SMP jobs (e.g., 64 processes on a single SMP) that might be related to a recent trunk change. (Guessing.) Does the following ring any bells?

...
...
...
[burl-t5440-0:06798] [[57827,1],42] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 299
[burl-t5440-0:06798] [[57827,1],42] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 416
[burl-t5440-0:06798] [[57827,1],42] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 378
[burl-t5440-0:06800] [[57827,1],44] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 299
[burl-t5440-0:06800] [[57827,1],44] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 416
[burl-t5440-0:06800] [[57827,1],44] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 378
[burl-t5440-0:06797] [[57827,1],41] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 299
[burl-t5440-0:06797] [[57827,1],41] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 416
[burl-t5440-0:06797] [[57827,1],41] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 378
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

orte_grpcomm_modex failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[burl-t5440-0:6756] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[burl-t5440-0:6757] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
...
...
...
<trunk-problem.tar.gz>

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
