There is a known problem with all versions of Open MPI on Leopard. We
haven't had time to chase it down yet - probably still a few weeks away.

Ralph



On 2/11/08 1:39 PM, "Greg Watson" <g.wat...@computer.org> wrote:

> Hi,
> 
> Since I upgraded to MacOS X 10.5.1, I've been having problems running
> MPI programs (using both 1.2.4 and 1.2.5). The symptoms are
> intermittent (i.e. sometimes the application runs fine), and appear as
> follows:
> 
> 1. One or more of the application processes die (I've seen both one and
> two processes die).
> 
> 2. It appears that the orted's associated with these application
> processes then spin continually.
> 
> Here is what I see when I run "mpirun -np 4 ./mpitest":
> 
> 12467   ??  Rs     1:26.52 orted --bootproxy 1 --name 0.0.1 --
> num_procs 5 --vpid_start 0 --nodename node0 --universe
> greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://
> 10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://
> 10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://
> 10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://
> 10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12468   ??  Rs     1:26.63 orted --bootproxy 1 --name 0.0.2 --
> num_procs 5 --vpid_start 0 --nodename node1 --universe
> greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://
> 10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://
> 10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://
> 10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://
> 10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12469   ??  Ss     0:00.04 orted --bootproxy 1 --name 0.0.3 --
> num_procs 5 --vpid_start 0 --nodename node2 --universe
> greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://
> 10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://
> 10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://
> 10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://
> 10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12470   ??  Ss     0:00.04 orted --bootproxy 1 --name 0.0.4 --
> num_procs 5 --vpid_start 0 --nodename node3 --universe
> greg@Jarrah.local:default-universe-12462 --nsreplica "0.0.0;tcp://
> 10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://
> 10.37.129.2:56749;tcp://10.211.55.2:56749" --gprreplica "0.0.0;tcp://
> 10.0.1.200:56749;tcp://9.67.176.162:56749;tcp://
> 10.37.129.2:56749;tcp://10.211.55.2:56749" --set-sid
> 12471   ??  S      0:00.05 ./mpitest
> 12472   ??  S      0:00.05 ./mpitest
> 
> Killing the mpirun results in:
> 
> $ mpirun -np 4 ./mpitest
> ^Cmpirun: killing job...
> 
> ^C
> --------------------------------------------------------------------------
> WARNING: mpirun is in the process of killing a job, but has detected an
> interruption (probably control-C).
> 
> It is dangerous to interrupt mpirun while it is killing a job (proper
> termination may not be guaranteed).  Hit control-C again within 1
> second if you really want to kill mpirun immediately.
> --------------------------------------------------------------------------
> ^Cmpirun: forcibly killing job...
> --------------------------------------------------------------------------
> WARNING: mpirun has exited before it received notification that all
> started processes had terminated.  You should double check and ensure
> that there are no runaway processes still executing.
> --------------------------------------------------------------------------
> 
> At this point, the two spinning orted's are left running, and the only
> way to kill them is with -9.
> 
> Is anyone else seeing this problem?
> 
> Greg
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

