Hi,
I have set up an Xgrid consisting of one laptop and 7 Mac mini nodes (all
dual-core machines). I have also installed Open MPI (version 1.2.1) on all
nodes. The laptop node (hostname: sib) plays three roles: agent, controller,
and client; all the other nodes are agents only.
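
If it helps, I believe the components installed on each node can be listed
with ompi_info (shown for reference only; I have not pasted the output here):

sib:sharcnet$ ompi_info | grep btl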

When I ran "mpirun -n 8 /bin/hostname" in a terminal on the laptop node, it
printed all 8 nodes' hostnames correctly, so Xgrid itself seems to work fine.

Then I wanted to run a simple MPI program. The source file "Hello.c" was
compiled with mpicc, and the resulting executable "Hello" was copied to every
node under the same path (I have also verified that it runs properly on each
node locally). When I asked for 1 or 2 processors to run the job, Xgrid
worked fine, but when I asked for 3 or more processors, all jobs failed.
Below are the commands I used and the results/messages I got.
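
For completeness, the compile and copy steps were essentially the following
(scp is shown only as an example of how the binary was distributed; the same
was done for node3 through node8):

sib:sharcnet$ mpicc Hello.c -o Hello
sib:sharcnet$ scp Hello node2:~/openMPI_stuff/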

Can anybody help me out?

*************************************
running "hostname" and the results, they looks good.
*************************************
sib:sharcnet$ mpirun -n 8 /bin/hostname
node2
node8
node4
node5
node3
node7
sib
node6

*************************************
the source code of the simple MPI program, Hello.c
*************************************
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

    MPI_Finalize();
    return 0;
}

*************************************
asking for 1 and 2 processors to run "Hello";
the results are all good
*************************************
sib:sharcnet$ mpirun -n 1 ~/openMPI_stuff/Hello
Process 0 on sib out of 1

sib:sharcnet$ mpirun -n 2 ~/openMPI_stuff/Hello
Process 1 on node2 out of 2
Process 0 on sib out of 2

*************************************
Here is the problem: when I ask for
3 processors to run the job, the
following are all the messages I get
*************************************

sib:sharcnet$ mpirun -n 3 ~/openMPI_stuff/Hello

Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.

Process 0.1.2 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)

*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)

*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 0 with PID 817 on node xgrid-node-0 exited on
signal 15 (Terminated).

sib:sharcnet$
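
In case it is relevant: I understand the "self" hint in the message refers to
the btl MCA parameter, so explicitly listing the transports would look
something like the line below, though I am not sure whether that is the right
fix here:

sib:sharcnet$ mpirun --mca btl tcp,sm,self -n 3 ~/openMPI_stuff/Hello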
