On Wed, Jul 18, 2007 at 09:08:38PM +0300, Gleb Natapov wrote:
On Wed, Jul 18, 2007 at 09:08:47AM -0600, Ralph H Castain wrote:
But this will lock up:
pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host pn1180961 printenv | grep LD
The reason is that the hostname in this last command doesn't match the
hostname I get when I query my interfaces, so mpirun thinks it must be a
remote host - and so we get stuck in ssh until it times out. Which could be
quick on your machine, but takes a while for me.
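To make the failure mode described above concrete, here is a minimal sketch of a name-based locality check. It is not Open MPI's code: the helper names resolve_addresses and is_local_host, and the rule of falling back to address comparison, are assumptions made purely for illustration.

```python
import socket


def resolve_addresses(name):
    """Return the set of IP addresses that `name` resolves to (empty on failure)."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def is_local_host(name, local_names, local_addresses, resolver=resolve_addresses):
    """Hypothetical check: treat `name` as local if it matches a known local
    name exactly, or if it resolves to an address owned by a local interface."""
    if name in local_names:
        return True
    # Fall back to comparing addresses, so aliases of the same machine still match.
    return bool(resolver(name) & set(local_addresses))
```

With a check that compares names only, any alias that is missing from the local name list is treated as remote and goes through ssh; comparing resolved addresses as well is one way to avoid that.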
This is not my case. mpirun resolves the hostname and runs env, but
LD_LIBRARY_PATH is not there. If I use the full name, like this:
# /home/glebn/openmpi/bin/mpirun -np 1 -H elfit1.voltaire.com env | grep LD_LIBRARY_PATH
LD_LIBRARY_PATH=/home/glebn/openmpi/lib
everything is OK.
More info. If I give mpirun the hostname as returned by the "hostname"
command, LD_LIBRARY_PATH is not set:
# /home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` env | grep LD
OLDPWD=/home/glebn/OpenMPI/ompi-tests/intel_tests
If I provide any other name that resolves to the same IP, then
LD_LIBRARY_PATH is set:
# /home/glebn/openmpi/bin/mpirun -np 1 -H localhost env | grep LD
OLDPWD=/home/glebn/OpenMPI/ompi-tests/intel_tests
LD_LIBRARY_PATH=/home/glebn/openmpi/lib
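The comparison above can be repeated for several host names at once. The following is a rough sketch, assuming mpirun accepts the same "-np 1 -H <host> env" arguments used in the commands above; the function names build_command, ld_library_path and check_hosts are hypothetical helpers, not part of Open MPI.

```python
import subprocess


def build_command(mpirun, host):
    """Build the same command line as above: mpirun -np 1 -H <host> env."""
    return [mpirun, "-np", "1", "-H", host, "env"]


def ld_library_path(env_output):
    """Return the LD_LIBRARY_PATH value from `env` output, or None if it is absent."""
    for line in env_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key == "LD_LIBRARY_PATH":
            return value
    return None


def check_hosts(mpirun, hosts, run=subprocess.run):
    """Run `env` under mpirun once per host name and map each name to the
    LD_LIBRARY_PATH seen by the launched process (None when it is not set)."""
    results = {}
    for host in hosts:
        proc = run(build_command(mpirun, host),
                   capture_output=True, text=True, timeout=60)
        results[host] = ld_library_path(proc.stdout)
    return results
```

Calling check_hosts with the mpirun path and a list such as the short hostname, the full name and localhost would show at a glance which names lose the variable.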
Here is the debug output of the "bad" run:
/home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` -mca pls_rsh_debug 1 echo
[elfit1:14730] pls:rsh: launching job 1
[elfit1:14730] pls:rsh: no new daemons to launch
Here is the good one:
/home/glebn/openmpi/bin/mpirun -np 1 -H localhost -mca pls_rsh_debug 1 echo
[elfit1:14752] pls:rsh: launching job 1
[elfit1:14752] pls:rsh: local csh: 0, local sh: 1
[elfit1:14752] pls:rsh: assuming same remote shell as local shell
[elfit1:14752] pls:rsh: remote csh: 0, remote sh: 1
[elfit1:14752] pls:rsh: final template argv:
[elfit1:14752] pls:rsh: /usr/bin/ssh <template> orted --name <template> --num_procs 1 --vpid_start 0 --nodename <template> --universe root@elfit1:default-universe-14752 --nsreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" --gprreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" -mca mca_base_param_file_path /home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd -mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd
[elfit1:14752] pls:rsh: launching on node localhost
[elfit1:14752] pls:rsh: localhost is a LOCAL node
[elfit1:14752] pls:rsh: reset PATH: /home/glebn/openmpi/bin:/home/USERS/lenny/MPI/mpi/bin:/opt/vltmpi/OPENIB/mpi/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
[elfit1:14752] pls:rsh: reset LD_LIBRARY_PATH: /home/glebn/openmpi/lib
[elfit1:14752] pls:rsh: changing to directory /root
[elfit1:14752] pls:rsh: executing: (/home/glebn/openmpi/bin/orted) [orted --name 0.0.1 --num_procs 1 --vpid_start 0 --nodename localhost --universe root@elfit1:default-universe-14752 --nsreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" --gprreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" -mca mca_base_param_file_path /home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd -mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd --set-sid]
--
Gleb.