On Wed, Jul 18, 2007 at 09:08:38PM +0300, Gleb Natapov wrote:
> On Wed, Jul 18, 2007 at 09:08:47AM -0600, Ralph H Castain wrote:
> > But this will lock up:
> > 
> > pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host pn1180961 printenv | grep LD
> > 
> > The reason is that the hostname in this last command doesn't match the
> > hostname I get when I query my interfaces, so mpirun thinks it must be a
> > remote host, and so we are stuck in ssh until that times out. That could be
> > quick on your machine, but it takes a while for me.
> > 
> This is not my case. mpirun resolves the hostname and runs env, but
> LD_LIBRARY_PATH is not there. If I use the full name, like this:
> # /home/glebn/openmpi/bin/mpirun -np 1 -H elfit1.voltaire.com env | grep LD_LIBRARY_PATH
> LD_LIBRARY_PATH=/home/glebn/openmpi/lib
> 
> everything is OK.
> 
More info: if I give mpirun the hostname exactly as returned by the
"hostname" command, LD_LIBRARY_PATH is not set:
# /home/glebn/openmpi/bin/mpirun -np 1 -H `hostname`  env | grep LD
OLDPWD=/home/glebn/OpenMPI/ompi-tests/intel_tests

If I provide any other name that resolves to the same IP, then
LD_LIBRARY_PATH is set:
# /home/glebn/openmpi/bin/mpirun -np 1 -H localhost  env | grep LD
OLDPWD=/home/glebn/OpenMPI/ompi-tests/intel_tests
LD_LIBRARY_PATH=/home/glebn/openmpi/lib
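
To double-check which names resolve to a common address, something like the
following could be used. This is only a minimal sketch with the Python standard
library; resolve_ipv4 and same_host are hypothetical helper names, and this is
not how mpirun itself decides whether a node is local.

```python
import socket


def resolve_ipv4(name):
    """Return the set of IPv4 addresses the system resolver gives for name."""
    try:
        _, _, addrs = socket.gethostbyname_ex(name)
    except OSError:
        # Covers socket.gaierror and socket.herror: the name does not resolve.
        return set()
    return set(addrs)


def same_host(name_a, name_b, resolve=resolve_ipv4):
    """True if the two names share at least one resolved address.

    The resolver is injectable so the comparison can be checked
    without touching DNS or /etc/hosts.
    """
    return bool(resolve(name_a) & resolve(name_b))


if __name__ == "__main__":
    short_name = socket.gethostname()
    for other in ("localhost", socket.getfqdn()):
        print(short_name, other, same_host(short_name, other))
```

If two names share an address here but only one of them gets LD_LIBRARY_PATH
set, the difference is in how the name string is handled, not in name
resolution.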

Here is the debug output of the "bad" run:
/home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` -mca pls_rsh_debug 1 echo
[elfit1:14730] pls:rsh: launching job 1
[elfit1:14730] pls:rsh: no new daemons to launch

Here is the good one:
/home/glebn/openmpi/bin/mpirun -np 1 -H localhost -mca pls_rsh_debug 1 echo
[elfit1:14752] pls:rsh: launching job 1
[elfit1:14752] pls:rsh: local csh: 0, local sh: 1
[elfit1:14752] pls:rsh: assuming same remote shell as local shell
[elfit1:14752] pls:rsh: remote csh: 0, remote sh: 1
[elfit1:14752] pls:rsh: final template argv:
[elfit1:14752] pls:rsh:     /usr/bin/ssh <template> orted --name <template> 
--num_procs 1 --vpid_start 0 --nodename <template> --universe 
root@elfit1:default-universe-14752 --nsreplica 
"0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" --gprreplica 
"0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" -mca 
mca_base_param_file_path 
/home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd 
-mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd
[elfit1:14752] pls:rsh: launching on node localhost
[elfit1:14752] pls:rsh: localhost is a LOCAL node
[elfit1:14752] pls:rsh: reset PATH: 
/home/glebn/openmpi/bin:/home/USERS/lenny/MPI/mpi/bin:/opt/vltmpi/OPENIB/mpi/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
[elfit1:14752] pls:rsh: reset LD_LIBRARY_PATH: /home/glebn/openmpi/lib
[elfit1:14752] pls:rsh: changing to directory /root
[elfit1:14752] pls:rsh: executing: (/home/glebn/openmpi/bin/orted) [orted 
--name 0.0.1 --num_procs 1 --vpid_start 0 --nodename localhost --universe 
root@elfit1:default-universe-14752 --nsreplica 
"0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" --gprreplica 
"0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" -mca 
mca_base_param_file_path 
/home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd 
-mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd --set-sid]
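
The visible difference between the two runs is that the bad one stops at "no
new daemons to launch" and never prints the "reset LD_LIBRARY_PATH" line. A
rough sketch for checking saved pls_rsh_debug output for those markers follows;
summarize_rsh_debug is a hypothetical helper, and the marker strings are simply
copied from the logs in this mail, so they may differ in other versions.

```python
import re


def summarize_rsh_debug(log_text):
    """Report which launch steps appear in saved pls_rsh_debug output."""
    # Capture the value printed after "reset LD_LIBRARY_PATH:", if any.
    match = re.search(r"pls:rsh: reset LD_LIBRARY_PATH:\s*(\S+)", log_text)
    return {
        # Printed when mpirun decides no orted needs to be started.
        "no_new_daemons": "pls:rsh: no new daemons to launch" in log_text,
        # Printed once per node on which a daemon is started.
        "daemon_launched": "pls:rsh: launching on node" in log_text,
        # None when the environment was never reset.
        "ld_library_path": match.group(1) if match else None,
    }


if __name__ == "__main__":
    import sys

    # Usage: mpirun ... -mca pls_rsh_debug 1 echo 2>&1 | python this_script.py
    print(summarize_rsh_debug(sys.stdin.read()))
```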

--
                        Gleb.
