Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-04 Thread Dino Rossegger
I'll try reinstalling Open MPI on an NFS device; maybe it works then.

Thanks for your help
dino

Tim Prins wrote:
> Unfortunately, I am out of ideas on this one. It is very strange. Maybe 
> someone else has an idea.
> 
> I would recommend trying to install Open MPI again. First be sure to get 
> rid of all of the old installs of Open MPI from all your nodes, then 
> reinstall and try again.
> 
> Tim

Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-03 Thread Tim Prins
Unfortunately, I am out of ideas on this one. It is very strange. Maybe 
someone else has an idea.


I would recommend trying to install Open MPI again. First be sure to get 
rid of all of the old installs of Open MPI from all your nodes, then 
reinstall and try again.


Tim
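Before reinstalling, it may help to list every copy of the launcher binaries that each node can actually reach, since a stray copy (such as the /usr/bin/orted found on sun earlier in this thread) can shadow the intended one. A minimal sketch, assuming a POSIX-style shell; list_ompi_copies is a hypothetical helper, not part of Open MPI, and it only looks at the directories listed in PATH.

```bash
# Hypothetical helper (not part of Open MPI): print every orted and mpirun
# executable found in the directories of PATH, in PATH search order.
# More than one hit per name suggests several installations are mixed.
list_ompi_copies() {
    local name dir IFS=:
    for name in orted mpirun; do
        for dir in $PATH; do
            # An empty PATH entry means the current directory.
            if [ -f "${dir:-.}/$name" ] && [ -x "${dir:-.}/$name" ]; then
                printf '%s\n' "${dir:-.}/$name"
            fi
        done
    done
}

# Possible usage (not run here). The remote form runs with the same kind of
# non-interactive environment that ssh gives the orted launch:
#   list_ompi_copies
#   ssh saturn "$(declare -f list_ompi_copies); list_ompi_copies"
```

Running it both locally and through ssh matters because the PATH of an interactive login can differ from the one a non-interactive ssh command sees.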


Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-02 Thread Dino Rossegger
Here are the command and its output:
root@sun:~# mpirun --hostfile hostfile saturn
[sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 275
[sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1164
[sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[sun:28777] ERROR: A daemon on node saturn failed to start as expected.
[sun:28777] ERROR: There may be more information available from
[sun:28777] ERROR: the remote shell (see above).
[sun:28777] ERROR: The daemon exited unexpectedly with status 255.
[sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 188
[sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1196
--
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.

--

I'm using version 1.2.3, which I got from openmpi.org. I'm using the same
version of Open MPI on all nodes.

Thanks
dino
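To double-check that the same release really is what each node picks up, one option is to compare the version reported by ompi_info on every node. A sketch, assuming ompi_info prints a line of the form "Open MPI: 1.2.3" (the 1.2 series does, as far as I know); ompi_version is a hypothetical helper name, and the /usr/local path in the usage comment is only the prefix mentioned in this thread.

```bash
# Hypothetical helper: read ompi_info output on stdin and print the version
# taken from the line labelled exactly "Open MPI:".
ompi_version() {
    # Labels such as "Open MPI SVN revision:" do not match, because the
    # pattern requires the colon directly after "Open MPI".
    sed -n 's/^[[:space:]]*Open MPI:[[:space:]]*//p' | head -n 1
}

# Possible usage (not run here). The full path, assuming a /usr/local
# install, avoids relying on the remote non-interactive PATH:
#   for h in sun saturn; do
#       printf '%s: %s\n' "$h" "$(ssh "$h" /usr/local/bin/ompi_info | ompi_version)"
#   done
```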

Tim Prins wrote:
> This is very odd. The daemon is being launched properly, but then things 
> get strange. It looks like mpirun is sending a message to kill 
> application processes on saturn.
> 
> What version of Open MPI are you using?
> 
> Are you sure that the same version of Open MPI us being used everywhere?
> 
> Can you try:
> mpirun --hostfile hostfile hostname
> 
> Thanks,
> 
> Tim

Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-02 Thread Tim Prins
This is very odd. The daemon is being launched properly, but then things 
get strange. It looks like mpirun is sending a message to kill 
application processes on saturn.


What version of Open MPI are you using?

Are you sure that the same version of Open MPI is being used everywhere?

Can you try:
mpirun --hostfile hostfile hostname

Thanks,

Tim
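When trying mpirun --hostfile hostfile hostname, it can help to compare the names that come back with the names the hostfile lists. The hostfile itself is not shown in this thread; the sketch below assumes the common Open MPI layout of one host per line, optionally followed by options such as slots=N, with # starting a comment. hostfile_hosts is a hypothetical helper.

```bash
# Hypothetical helper: list the distinct host names in an Open MPI hostfile.
# Assumes one host per line, optional "slots=N"-style options after the name,
# and '#' comments; blank lines are ignored.
hostfile_hosts() {
    sed 's/#.*//' "$1" | awk 'NF { print $1 }' | sort -u
}

# Possible usage (not run here): any line the diff reports is a host that is
# listed but did not answer, or answered under a different name.
#   mpirun --hostfile hostfile hostname | sort -u > answered.txt
#   diff <(hostfile_hosts hostfile) answered.txt
```

Note that hostname may print a fully qualified name, so a mismatch in the diff is not necessarily a failure.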


Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-02 Thread Dino Rossegger
Hi again,

Tim Prins wrote:
> Hi,
> 
> On Monday 01 October 2007 03:56:16 pm Dino Rossegger wrote:
>> Hi again,
>>
>> Yes the error output is the same:
>> root@sun:~# mpirun --hostfile hostfile main
>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 275
>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>> line 1164
>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>> [sun:23748] ERROR: A daemon on node saturn failed to start as expected.
>> [sun:23748] ERROR: There may be more information available from
>> [sun:23748] ERROR: the remote shell (see above).
>> [sun:23748] ERROR: The daemon exited unexpectedly with status 255.
>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 188
>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>> line 1196
>> --
>> mpirun was unable to cleanly terminate the daemons for this job.
>> Returned value Timeout instead of ORTE_SUCCESS.
>>
>> --
> Can you try:
> mpirun --debug-daemons --hostfile hostfile main
> 
I did, but it doesn't give me any special output (as far as I can see).
Here's the output:
root@sun:~# mpirun --debug-daemons --hostfile hostfile ./main
Daemon [0,0,1] checking in as pid 27168 on host sun
[sun:27168] [0,0,1] orted_recv_pls: received message from [0,0,0]
[sun:27168] [0,0,1] orted_recv_pls: received kill_local_procs
[sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
[sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[sun:27167] ERROR: A daemon on node saturn failed to start as expected.
[sun:27167] ERROR: There may be more information available from
[sun:27167] ERROR: the remote shell (see above).
[sun:27167] ERROR: The daemon exited unexpectedly with status 255.
[sun:27168] [0,0,1] orted_recv_pls: received message from [0,0,0]
[sun:27168] [0,0,1] orted_recv_pls: received exit
[sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
--
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.
--
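In this output only one daemon, [0,0,1] on sun, logs "checking in"; there is no matching line for saturn, which is consistent with the "failed to start" error. A small sketch for pulling that out of longer --debug-daemons logs, assuming the "Daemon [...] checking in as pid N on host H" line format seen above; checked_in_hosts and missing_daemons are hypothetical names.

```bash
# Hypothetical helper: read --debug-daemons output on stdin and print, sorted,
# the hosts whose daemon logged a "Daemon [...] checking in ... on host H" line.
checked_in_hosts() {
    sed -n 's/^Daemon \[[0-9,]*] checking in as pid [0-9]* on host \([^ ]*\).*$/\1/p' | sort -u
}

# Hypothetical helper: given a file of expected host names (one per line) as $1
# and --debug-daemons output on stdin, print each expected host that never
# checked in.
missing_daemons() {
    local seen host
    seen=$(checked_in_hosts)
    sort -u "$1" | while IFS= read -r host; do
        # -x: match the whole line, -F: treat the host name as a fixed string
        printf '%s\n' "$seen" | grep -qxF -- "$host" || printf '%s\n' "$host"
    done
}

# Possible usage (not run here; 2>&1 in case the messages go to stderr):
#   mpirun --debug-daemons --hostfile hostfile ./main 2>&1 | tee daemons.log
#   printf '%s\n' sun saturn > expected.txt
#   missing_daemons expected.txt < daemons.log
```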


> This may give more output about the error. Also, try
> mpirun -mca pls rsh -mca pls_rsh_debug 1 --hostfile hostfile main

Here's the output, but I can't decipher it ^^
root@sun:~# mpirun -mca pls rsh -mca pls_rsh_debug 1 --hostfile hostfile main
[sun:27175] pls:rsh: local csh: 0, local sh: 1
[sun:27175] pls:rsh: assuming same remote shell as local shell
[sun:27175] pls:rsh: remote csh: 0, remote sh: 1
[sun:27175] pls:rsh: final template argv:
[sun:27175] pls:rsh: /usr/bin/ssh  orted --bootproxy 1 --name  --num_procs 3 --vpid_start 0 --nodename  --universe root@sun:default-universe-27175 --nsreplica "0.0.0;tcp://192.168.1.254:4733;tcp://172.16.0.202:4733" --gprreplica "0.0.0;tcp://192.168.1.254:4733;tcp://172.16.0.202:4733"
[sun:27175] pls:rsh: launching on node sun
[sun:27175] pls:rsh: sun is a LOCAL node
[sun:27175] pls:rsh: changing to directory /root
[sun:27175] pls:rsh: executing: (/usr/local/bin/orted) orted --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename sun --universe root@sun:default-universe-27175 --nsreplica "0.0.0;tcp://192.168.1.254:4733;tcp://172.16.0.202:4733" --gprreplica "0.0.0;tcp://192.168.1.254:4733;tcp://172.16.0.202:4733" --set-sid [SSH_AGENT_PID=24793 TERM=xterm SHELL=/bin/bash SSH_CLIENT=10.2.56.124 21001 22 SSH_TTY=/dev/pts/0 USER=root LD_LIBRARY_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib SSH_AUTH_SOCK=/tmp/ssh-sxbbH24792/agent.24792 MAIL=/var/mail/root PATH=/usr/local/bin:/usr/bin:/bin:/usr/games:/opt/c3-4/:/usr/local/lib PWD=/root LANG=en_US.UTF-8 SHLVL=1 HOME=/root LOGNAME=root SSH_CONNECTION=10.2.56.124 21001 172.16.0.202 22 _=/usr/local/bin/mpirun OMPI_MCA_rds_hostfile_path=hostfile orte-job-globals OMPI_MCA_pls_rsh_debug=1 OMPI_MCA_seed=0]
[sun:27175] pls:rsh: launching on node saturn
[sun:27175] pls:rsh: saturn is a REMOTE node
[sun:27175] pls:rsh: executing: (//usr/bin/ssh) /usr/bin/ssh saturn orted --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename saturn --universe root@sun:default-universe-27175 --nsreplica "0.0.0;tcp://192.168.1.254:4733;tcp://172.16.0.202:4733" --gprreplica "0.0.0;tcp://192.168.1.254:4733;tcp:/

Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-01 Thread Tim Prins
Hi,

On Monday 01 October 2007 03:56:16 pm Dino Rossegger wrote:
> Hi again,
>
> Yes the error output is the same:
> root@sun:~# mpirun --hostfile hostfile main
> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> base/pls_base_orted_cmds.c at line 275
> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
> line 1164
> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
> [sun:23748] ERROR: A daemon on node saturn failed to start as expected.
> [sun:23748] ERROR: There may be more information available from
> [sun:23748] ERROR: the remote shell (see above).
> [sun:23748] ERROR: The daemon exited unexpectedly with status 255.
> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> base/pls_base_orted_cmds.c at line 188
> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
> line 1196
> --
> mpirun was unable to cleanly terminate the daemons for this job.
> Returned value Timeout instead of ORTE_SUCCESS.
>
> --
Can you try:
mpirun --debug-daemons --hostfile hostfile main

This may give more output about the error. Also, try
mpirun -mca pls rsh -mca pls_rsh_debug 1 --hostfile hostfile main

This will print out the exact command that is used to launch the orted.
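In the pls_rsh_debug output shown in Dino's reply, the launch on saturn is a plain "/usr/bin/ssh saturn orted ..." with no terminal, so repeating a similar non-interactive ssh command by hand can help separate ssh problems from problems with the environment orted starts in. Per the ssh(1) manual page, ssh exits with the exit status of the remote command, or with 255 if an error occurred, so the "status 255" in the error message may come from ssh itself rather than from orted; that is a reading of the log, not a confirmed diagnosis. A minimal sketch; probe_orted and the SSH_CMD override are hypothetical.

```bash
# Hypothetical helper: run a non-interactive command over ssh, roughly the way
# the rsh launcher starts orted, and classify the exit status.
# SSH_CMD may name a replacement command (e.g. a stub for testing).
probe_orted() {
    local host=$1
    local out rc
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    out=$(${SSH_CMD:-ssh} -o BatchMode=yes "$host" 'command -v orted' 2>&1)
    rc=$?
    if [ "$rc" -eq 255 ]; then
        echo "$host: ssh error (status 255): $out"
    elif [ "$rc" -ne 0 ]; then
        echo "$host: orted not found on the non-interactive PATH (status $rc)"
    else
        echo "$host: orted at $out"
    fi
    return "$rc"
}

# Possible usage (not run here):
#   probe_orted saturn
```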

Also, I would highly recommend not running Open MPI as root. It is just a bad 
idea.
>
> I wrote the following to my .ssh/environment (on all machines)
> LD_LIBRARY_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib;
>
> PATH=$PATH:/usr/local/lib;
>
> export LD_LIBRARY_PATH;
> export PATH;
>
> and added the statement you told me to the ssd_config (on all machines):
> PermitUserEnvironment yes
>
> And it seems to me that the pathes are correct now.
>
> My shell is bash (/bin/bash)
>
> When running locate orted (to find out where exactly my openmpi
> installation is (compilation defaults) i saw that, on sun there was a
> /usr/bin/orted while there wasn't one on saturn.
> I deleted /usr/bin/orted on sun and tried again with the option --prefix
>  /usr/local/ (which seems to be  my installation directory) but it
> didn't work (same error).
Is it possible that you are mixing two different installations of Open MPI?
You may want to consider installing Open MPI on an NFS drive to make these
things a bit easier.
>
> Is there a script or anything like that with which I can uninstall
> openmpi, because i'll might try a new compilation to /opt/openmpi since
> it doesn't look like I would be able to solve the problem.
If you still have the tree around that you used to 'make' Open MPI, you can 
just go into that tree and type 'make uninstall'.

Hope this helps,

Tim


Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-01 Thread Dino Rossegger
Hi again,

Yes, the error output is the same:
root@sun:~# mpirun --hostfile hostfile main
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 275
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1164
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[sun:23748] ERROR: A daemon on node saturn failed to start as expected.
[sun:23748] ERROR: There may be more information available from
[sun:23748] ERROR: the remote shell (see above).
[sun:23748] ERROR: The daemon exited unexpectedly with status 255.
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 188
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1196
--
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.

--

I wrote the following to my ~/.ssh/environment (on all machines):
LD_LIBRARY_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib;

PATH=$PATH:/usr/local/lib;

export LD_LIBRARY_PATH;
export PATH;

and added the statement you told me about to sshd_config (on all machines):
PermitUserEnvironment yes

It seems to me that the paths are correct now.
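One possible catch with the file above: according to the sshd(8) manual page, ~/.ssh/environment may only contain empty lines, comment lines starting with #, and assignment lines of the form name=value. It is not a shell script, and as far as I know sshd does not expand variables in it, so PATH=$PATH:/usr/local/lib; would set PATH to that literal text (trailing semicolon included), and the export lines are not assignments at all. A minimal linting sketch; lint_ssh_environment is a hypothetical helper and only checks the issues just listed.

```bash
# Hypothetical linter for ~/.ssh/environment, which sshd reads as plain
# NAME=value data. Reports lines that look like shell syntax instead.
lint_ssh_environment() {
    local file=$1 line n=0 rc=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        case $line in
            ''|'#'*) continue ;;   # blank lines and comments are allowed
        esac
        case $line in
            *=*) ;;
            *) echo "$file:$n: not a NAME=value line: $line"; rc=1; continue ;;
        esac
        case ${line#*=} in
            *'$'*) echo "$file:$n: '\$' is not expanded: $line"; rc=1 ;;
        esac
        case $line in
            *';') echo "$file:$n: trailing ';' becomes part of the value: $line"; rc=1 ;;
        esac
    done < "$file"
    return "$rc"
}

# Possible usage: lint_ssh_environment ~/.ssh/environment
```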

My shell is bash (/bin/bash)

When I ran locate orted (to find out where exactly my Open MPI
installation is; I used the compilation defaults), I saw that sun had a
/usr/bin/orted while saturn did not.
I deleted /usr/bin/orted on sun and tried again with the option
--prefix /usr/local/ (which seems to be my installation directory), but it
didn't work (same error).

Is there a script or something similar that I can use to uninstall
Open MPI? I might try a new compilation to /opt/openmpi, since it
doesn't look like I'll be able to solve the problem.




jody wrote:
> Now that the PATHs seem to be set correctly for
> ssh i don't know what the problem could be.
> 
> Is the error message still the same on as in the first mail?
> Did you do the envorpnment/sshd_config on both machines?
> What shell are you using?
> 
> On other test you could make is to start your application
> with the --prefix option:
> 
> $mpirun -np 2 --prefix /opt/openmpi -H sun,saturn ./main
> 
> (assuming your Open MPI installation lies in /opt/openmpi
> on both machines)
> 
> 
> Jody
> 
> On 10/1/07, Dino Rossegger  wrote:
>> Hi Jodi,
>> did the steps as you said, but it didn't work for me.
>> I set LD_LIBRARY_PATH in /etc/environment and ~/.shh/environment and
>> made the changes to sshd_config.
>>
>> But this all didn't solve my problem, although the pahts seemed to be
>> set correctly (judging what ssh saturn `printenv >> test` says). I also
>> restarted the ssh server, the error is the same.
>>
>> Hope you can help me out here and thanks for your help so far
>> dino
>>
>> jody wrote:
>>> Dino -
>>> I had a similar problem.
>>> I was only able to solve it by setting PATH and LS_LIBRARY_PATH
>>> in the file ~/ssh/environment on the client and setting
>>>   PermitUserEnvironment yes
>>> in /etc/ssh/sshd_config on the server (for this you need root
>>> prioviledge though)
>>>
>>> To be on the safe side, i did both on all my nodes
>>>
>>> Jody
>>>
>>> On 9/27/07, Dino Rossegger  wrote:
>>>> Hi Jody,
>>>>
>>>> Thanks for your help, it really is the case that either in PATH nor in
>>>> LD_LIBRARY_PATH the path to the libs is set correctly. I'll try out,
>>>> hope it works.
>>>>
>>>> jody wrote:
>>>>> Hi Dino
>>>>>
>>>>> Try
>>>>>  ssh saturn printenv | grep PATH
>>>>> from your host sun to see what your environment variables are when
>>>>> ssh is run without a shell.
>>>>>
>>>>>
>>>>> On 9/27/07, Dino Rossegger  wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a problem running a simple programm mpihello.cpp.
>>>>>>
>>>>>> Here is a excerp of the error and the command
>>>>>> root@sun:~# mpirun -H sun,saturn main
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>>>> base/pls_base_orted_cmds.c at line 275
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>>>> line 1164
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line
>>>>>> 90
>>>>>> [sun:25213] ERROR: A daemon on node saturn failed to start as expected.
>>>>>> [sun:25213] ERROR: There may be more information available from
>>>>>> [sun:25213] ERROR: the remote shell (see above).
>>>>>> [sun:25213] ERROR: The daemon exited unexpectedly with status 255.
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>>>> base/pls_base_orted_cmds.c at line 188
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>>>> line 1196
>>>>>> --

Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-01 Thread jody
Now that the PATHs seem to be set correctly for
ssh, I don't know what the problem could be.

Is the error message still the same as in the first mail?
Did you make the environment/sshd_config changes on both machines?
What shell are you using?

One other test you could make is to start your application
with the --prefix option:

$mpirun -np 2 --prefix /opt/openmpi -H sun,saturn ./main

(assuming your Open MPI installation lies in /opt/openmpi
on both machines)
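Since --prefix only helps if the same installation really exists at that path on every node, it may be worth checking that first. A minimal shell sketch, assuming the /opt/openmpi prefix from the example above; check_prefix is a hypothetical helper, not part of Open MPI, and it only looks for the orted daemon, mpirun and a lib directory:

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): succeed if PREFIX looks like an
# Open MPI install, i.e. it has executable bin/orted and bin/mpirun and a lib/ dir.
# Prints the first missing item and returns 1 otherwise; prints "ok: PREFIX" on success.
check_prefix() {
  prefix=$1
  for f in "$prefix/bin/orted" "$prefix/bin/mpirun"; do
    if [ ! -x "$f" ]; then
      echo "missing: $f"
      return 1
    fi
  done
  if [ ! -d "$prefix/lib" ]; then
    echo "missing: $prefix/lib"
    return 1
  fi
  echo "ok: $prefix"
}
```

For example, run check_prefix /opt/openmpi on sun; for the remote side, something like ssh saturn 'test -x /opt/openmpi/bin/orted && echo ok' performs the orted part of the check on saturn itself.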


Jody

On 10/1/07, Dino Rossegger  wrote:
> Hi Jody,
> I did the steps as you said, but it didn't work for me.
> I set LD_LIBRARY_PATH in /etc/environment and ~/.ssh/environment and
> made the changes to sshd_config.
>
> But none of this solved my problem, although the paths seemed to be
> set correctly (judging by what ssh saturn `printenv >> test` says). I also
> restarted the ssh server; the error is the same.
>
> Hope you can help me out here and thanks for your help so far
> dino
>
> jody schrieb:
> > Dino -
> > I had a similar problem.
> > I was only able to solve it by setting PATH and LD_LIBRARY_PATH
> > in the file ~/.ssh/environment on the client and setting
> >   PermitUserEnvironment yes
> > in /etc/ssh/sshd_config on the server (for this you need root
> > privileges, though)
> >
> > To be on the safe side, I did both on all my nodes
> >
> > Jody
> >
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
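A sketch of the setup jody describes (PATH and LD_LIBRARY_PATH in ~/.ssh/environment plus PermitUserEnvironment in sshd_config), assuming Open MPI lives under /opt/openmpi (the prefix used in his --prefix example) and that jobs are started as root, as in the prompts in this thread. Per the sshd(8) manual, ~/.ssh/environment is read by sshd on the machine being logged into, and only when PermitUserEnvironment is enabled there; it holds plain NAME=value lines with no variable expansion, so full values have to be spelled out.

```
# /root/.ssh/environment on each node that mpirun reaches over ssh (e.g. saturn)
# Assumed install prefix /opt/openmpi; adjust to the real one.
PATH=/opt/openmpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LD_LIBRARY_PATH=/opt/openmpi/lib

# /etc/ssh/sshd_config on the same node (sshd must be restarted afterwards)
PermitUserEnvironment yes
```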


Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-09-27 Thread Tim Prins
Note that you may be able to get some more error output by 
adding --debug-daemons to the mpirun command line.

Tim

On Thursday 27 September 2007 05:12:53 pm Dino Rossegger wrote:
> Hi Jody,
>
> Thanks for your help; it really is the case that the path to the libs
> is set correctly in neither PATH nor LD_LIBRARY_PATH. I'll try it out,
> hope it works.


Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-09-27 Thread Dino Rossegger
Hi Jody,

Thanks for your help; it really is the case that the path to the libs
is set correctly in neither PATH nor LD_LIBRARY_PATH. I'll try it out;
hope it works.

jody schrieb:
> Hi Dino
> 
> Try
>  ssh saturn printenv | grep PATH
> from your host sun to see what your environment variables are when
> ssh is run without a shell.



Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-09-27 Thread jody
Hi Dino

Try
 ssh saturn printenv | grep PATH
from your host sun to see what your environment variables are when
ssh is run without a shell.
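A sketch of the same check made a little more targeted: it asks saturn for LD_LIBRARY_PATH in a non-interactive ssh session and tests whether a given library directory is one of its entries. The remote command is single-quoted so that it runs on saturn; written in backticks instead, it would be run by the local shell on sun first, and ssh would only receive its output. The directory /opt/openmpi/lib and both function names are assumptions for illustration.

```shell
#!/bin/sh
# path_contains LIST DIR: succeed if DIR is one of the colon-separated entries of LIST.
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical check: does HOST's non-interactive ssh environment include DIR
# in LD_LIBRARY_PATH? The single quotes make printenv run on HOST, not locally.
remote_ld_has() {
  host=$1
  dir=$2
  remote=$(ssh "$host" 'printenv LD_LIBRARY_PATH')
  if path_contains "$remote" "$dir"; then
    echo "$host: $dir is in LD_LIBRARY_PATH"
  else
    echo "$host: $dir is NOT in LD_LIBRARY_PATH ($remote)"
    return 1
  fi
}
```

Usage would be remote_ld_has saturn /opt/openmpi/lib; a non-zero exit status means the libraries would not be found by a daemon started over ssh without further help (e.g. --prefix).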




[OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-09-27 Thread Dino Rossegger
Hi,

I have a problem running a simple program, mpihello.cpp.

Here is an excerpt of the error and the command:
root@sun:~# mpirun -H sun,saturn main
[sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 275
[sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1164
[sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[sun:25213] ERROR: A daemon on node saturn failed to start as expected.
[sun:25213] ERROR: There may be more information available from
[sun:25213] ERROR: the remote shell (see above).
[sun:25213] ERROR: The daemon exited unexpectedly with status 255.
[sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 188
[sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1196
--
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.

--

The program runs on each node by itself (mpirun -np 2 main).

My path variables:
$PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib
$LD_LIBRARY_PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib

Passwordless ssh is up 'n running

I went through the FAQ and the mailing list archives but couldn't find a
solution to my problem.

Thanks
Dino R.