The output below resulted from an attempt to start a job with checkpoint support when the BLCR kernel modules were not loaded on the node ("pilot error"). The mistake is mine, but I am reporting this because there appears to be more going on in the output than there probably should be. The following 2 lines in particular struck me as almost humorous, but clearly incorrect:

*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.

Below is the command and full output.  This is OMPI-1.5.3 on Linux/x86.

-Paul

$ mpirun --prefix $HOME/obj-pcp-j/cr_mpirun-j-5+6/INSTALL -host pcp-j-6 --mca btl ^openib -am ft-enable-cr -np 1 ./ring
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_cr_init() failed failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
[pcp-j-6:29247] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file ../../../openmpi-1.5.3/orte/runtime/orte_init.c at line 79
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[pcp-j-6:29247] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
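
For anyone who hits the same opal_cr_init failure, a pre-flight check along the following lines might catch the missing kernel modules before mpirun is invoked. This is only a minimal sketch: the function name blcr_loaded is hypothetical, and it assumes the BLCR modules show up as blcr and blcr_imports in /proc/modules, which may differ on a given install.

```shell
#!/bin/sh
# Hypothetical helper: return 0 only if every expected BLCR module is listed.
# ASSUMPTION: the modules are named blcr_imports and blcr; adjust as needed.
blcr_loaded() {
    # Optional argument lets the check run against a saved copy of the list.
    modules_file="${1:-/proc/modules}"
    for mod in blcr_imports blcr; do
        # Each line of /proc/modules starts with the module name and a space.
        if ! grep -q "^${mod} " "$modules_file" 2>/dev/null; then
            echo "BLCR module '${mod}' does not appear to be loaded" >&2
            return 1
        fi
    done
    return 0
}
```

It could be used to guard the launch, for example `blcr_loaded && mpirun -am ft-enable-cr -np 1 ./ring`, so that the job is not started at all when the modules are absent.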


--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
