Re: [OMPI users] Problems Broadcasting/Scattering Data

2008-01-09 Thread Dino Rossegger

Andreas Schäfer wrote:

On 19:31 Tue 08 Jan , Dino Rossegger wrote:

Hi,
thanks for the program, but sadly I can't get it to work :(.
It's the same error as in my program. I get the following output:
0
0
0
10
0
0
11
0
0

Which as far as I know can't be correct. 


Oh, my bad. The field pointers had to be corrected for the Gather.
Now the output looks like this:



Now it works, thank you very much

0
500
1000
10
510
1010
11
511
1011

Is this what you expected?

BTW: of course you can send multidimensional arrays with scatter and
gather. This is because they're really just one-dimensional in
memory. 
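
A minimal sketch of that point (sizes, names, and values below are assumptions,
not from the thread): a statically allocated 2D array of doubles is one
contiguous block, so Scatter can hand each rank a block of whole rows.

#include <mpi.h>
#include <vector>
#include <iostream>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank, procs;
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int ROWS = 6, COLS = 2;            // toy sizes; run with a rank count
    const int myrows = ROWS / procs;         // that divides ROWS, e.g. -np 3

    double all[ROWS][COLS];                  // one contiguous block of ROWS*COLS doubles
    if (rank == 0)
        for (int i = 0; i < ROWS; ++i)
            for (int j = 0; j < COLS; ++j)
                all[i][j] = 10.0 * i + j;

    std::vector<double> part(myrows * COLS); // this rank's block of whole rows
    MPI_Scatter(all, myrows * COLS, MPI_DOUBLE,
                part.data(), myrows * COLS, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    // row r, column c of this rank's share is part[r * COLS + c]
    std::cout << "rank " << rank << " got first value " << part[0] << std::endl;

    MPI_Finalize();
    return 0;
}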



I thought so too but was a bit confused since I got the tip to fold the 
arrays etc.

Cheers
-Andreas










Re: [OMPI users] Problems Broadcasting/Scattering Data

2008-01-08 Thread Dino Rossegger

Hi,
thanks for the program, but sadly I can't get it to work :(.
It's the same error as in my program. I get the following output:
0
0
0
10
0
0
11
0
0

Which, as far as I know, can't be correct. It's the same problem that I
had in my program. The process with rank 0 has the right data in
its array; all the others have only zeros.

Andreas Schäfer wrote:

Hi Dino,

On 18:05 Tue 08 Jan , Dino Rossegger wrote:
In fact it is initialized; as I stated in my first mail, I only left out
the code where it gets initialized, since it reads the data from a file
and that works (I have tested it).


(you should have provided a self-contained excerpt)

I have reworked your program to check what's going wrong. In fact it
seems to work as expected, although you might want to double-check
your array sizes; they're named a bit confusingly.

Maybe it didn't work because there was no MPI_Finalize() at the
end. But then again, I might not understand your problem. The other
machines produce no output, so how can their output be wrong? If you expect
the other processors to have the same data in stat3 as rank 0 does,
then you should use MPI_Allgather rather than MPI_Gather, as
MPI_Gather will only concentrate the data on the root node, which is 0
in your case. Thus, the other nodes cannot produce the same output as
node 0.
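
For the MPI_Allgather alternative Andreas mentions, a minimal sketch (buffer
names and sizes are assumptions): after the call, every rank holds the full
gathered buffer, whereas MPI_Gather would fill it on the root only.

#include <mpi.h>
#include <vector>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank, procs;
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int PER_RANK = 5;                        // assumed chunk size per process
    std::vector<double> mine(PER_RANK, rank);      // each rank contributes PER_RANK values
    std::vector<double> everyone(PER_RANK * procs);

    // After this call, 'everyone' is identical on every rank; with MPI_Gather
    // only the root's copy would be filled in.
    MPI_Allgather(mine.data(), PER_RANK, MPI_DOUBLE,
                  everyone.data(), PER_RANK, MPI_DOUBLE, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}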


No, I had an MPI_Finalize; I only cut it out when I was pasting the
program. And I also understand that there can't be the same output, but
getting only zeros is a little bit confusing for me.

I've attached my reworked version (including some initialization
code for clarity). If you want me to debug a program of yours again,
send a floppy along with a pizza Hawaii (cartwheel size) to:
Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena
Ernst-Abbe-Platz 2
07743 Jena
Germany

Seriously, that would be cool :P

HTH
-Andi



I would send you a pizza Hawaii with a floppy, but I fear it would be
cold by the time it gets to you ;).










Re: [OMPI users] Problems Broadcasting/Scattering Data

2008-01-08 Thread Dino Rossegger

George Bosilca wrote:


On Jan 8, 2008, at 11:14 AM, Dino Rossegger wrote:

If so, then the problem is that Scatter actually gets an array of pointers
and sends these pointers trying to interpret them as doubles.
You either have to use several scatter commands or "fold" your
2D-Array into a one-dimensional array.

So neither MPI_Broadcast nor Scatter can handle 2-dimensional arrays?
But even if that is the case, is it normal that there are only zeros in the
array? To me that sounds more as if the data isn't transmitted at all,
not as if it isn't split correctly. But I'll try the folding;
maybe that will help.


The array that gets scattered is not initialized, so it is normal that
everyone gets a lot of zeros ... Moreover, the only operation you do on
the data (multiplication) will only generate zeros out of zeros. Try
setting some meaningful data in the stat array before the MPI_Scatter
operation.


  george.
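
A sketch of the kind of initialization George means, intended to go on the
root before the Scatter call in the code quoted below (the values themselves
are made up):

    // Give stat recognisable values on the root before scattering, so zeros
    // in the gathered result can be told apart from "nothing arrived".
    if (rank == 0) {
        for (int i = 0; i < ARRAYSIZE; ++i) {
            stat[i][0] = i;        // e.g. the row index
            stat[i][1] = 2.0 * i;  // anything non-zero and easy to spot
        }
    }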


In fact it is initialized; as I stated in my first mail, I only left out
the code where it gets initialized, since it reads the data from a file
and that works (I have tested it).







Thanks


Hope this helps
 Jody

On Jan 8, 2008 3:54 PM, Dino Rossegger  wrote:

Hi,
I have a problem distributing a 2-dimensional array over 3 processes.

I tried different methods to distribute the data (Broadcast, Send/Recv,
Scatter), but none of them worked for me. The output of the root
processor (0 in my case) is always okay; the output of the others is
simply 0.

The array stat is filled with entries from a file (I left out the
generation of the array data since it is a lot of code and it works;
I tested the whole thing in "single" mode.)

Here are the important parts of the Source Code:

const int ARRAYSIZE = 150;
int main(int argc, char* argv[])
{
   MPI_Init(&argc,&argv);
   int rank, anzprocs,recvcount,sendcnt;
   MPI_Comm_size(MPI_COMM_WORLD,&anzprocs);
   MPI_Comm_rank(MPI_COMM_WORLD,&rank);

   const int WORKING = ARRAYSIZE/anzprocs;

   double stat[ARRAYSIZE][2];
   double stathlp[WORKING][2];

   double stat2[WORKING][5];
   double stat3[anzprocs][ARRAYSIZE][5];
   if(rank==0)sendcnt=WORKING*2;
   
MPI::COMM_WORLD.Scatter(stat,sendcnt,MPI::DOUBLE,stathlp,WORKING*2,MPI::DOUBLE,0); 



   for(int i=0;i  /* ...loop body truncated in the archive... */
   MPI_Gather(&stat2, WORKING*5, MPI_DOUBLE, &stat3, recvcount, MPI_DOUBLE,
              0, MPI_COMM_WORLD);
   if (rank==0){
   cout << stat3[0][0][0] << endl;
   cout << stat3[1][0][0] << endl;
   cout << stat3[2][0][0] << endl;
   }
}

I don't know how to go further, since my experience with OMPI is not that
big either. Is there anything specific I have to know about distributing
2-dimensional arrays? I don't think the error is in the MPI_Gather,
since I did a cout of the data on all nodes and the output was the same.
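
For reference, a sketch of one way to make the Scatter/Gather counts and
buffer sizes line up. This is not the reworked program Andreas attached
(that attachment is not in the archive); the initialization and the per-row
computation are placeholders.

#include <mpi.h>
#include <vector>
#include <iostream>
using namespace std;

const int ARRAYSIZE = 150;

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank, anzprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &anzprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int WORKING = ARRAYSIZE / anzprocs;      // assumes ARRAYSIZE % anzprocs == 0

    vector<double> stat(ARRAYSIZE * 2);            // the 2D data stored flat: stat[i*2 + j]
    if (rank == 0)
        for (int i = 0; i < ARRAYSIZE; ++i) {
            stat[i * 2 + 0] = i;                   // placeholder; the real program
            stat[i * 2 + 1] = 2.0 * i;             // reads this from a file
        }

    vector<double> stathlp(WORKING * 2);           // this rank's share of the rows
    MPI_Scatter(stat.data(), WORKING * 2, MPI_DOUBLE,
                stathlp.data(), WORKING * 2, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    vector<double> stat2(WORKING * 5);             // per-rank result, 5 values per row
    for (int i = 0; i < WORKING; ++i)
        for (int j = 0; j < 5; ++j)
            stat2[i * 5 + j] = stathlp[i * 2] * j; // placeholder computation

    vector<double> stat3(anzprocs * WORKING * 5);  // receives exactly what was sent
    MPI_Gather(stat2.data(), WORKING * 5, MPI_DOUBLE,
               stat3.data(), WORKING * 5, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        for (int p = 0; p < anzprocs; ++p)         // first value received from each rank
            cout << stat3[p * WORKING * 5] << endl;

    MPI_Finalize();                                // missing in the original listing
    return 0;
}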


Thanks in advance, and sorry for my bad English
Dino





Re: [OMPI users] Problems Broadcasting/Scattering Data

2008-01-08 Thread Dino Rossegger

Thanks

jody wrote:

I'm not sure if I understand your code correctly -
I imagine you want to use the scatter command to
broadcast the contents of your 2-dimensional array stat,
is that right?
Yes, I want to broadcast stat and split it in the first dimension, so
that, for instance:

Process 1 gets stat[0][x] - stat[19][x]
Process 2 gets stat[20][x] -...



If so, then the problem is that Scatter actually gets an array of pointers
and sends these pointers trying to interpret them as doubles.
You either have to use several scatter commands or "fold" your
2D-Array into a one-dimensional array.
So neither MPI_Broadcast nor Scatter can handle 2-dimensional arrays?
But even if that is the case, is it normal that there are only zeros in the
array? To me that sounds more as if the data isn't transmitted at all,
not as if it isn't split correctly. But I'll try the folding;
maybe that will help.
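
A minimal sketch of the "folding" jody suggests, for data that is not already
one contiguous block in memory (all names and sizes here are assumptions):

#include <mpi.h>
#include <vector>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank, procs;
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int ROWS = 6, COLS = 2;                  // assumed sizes, ROWS divisible by procs
    const int myrows = ROWS / procs;

    std::vector<double> flat;                      // the "folded" one-dimensional copy
    if (rank == 0) {
        std::vector<std::vector<double>> table(ROWS, std::vector<double>(COLS));
        for (int i = 0; i < ROWS; ++i)
            for (int j = 0; j < COLS; ++j)
                table[i][j] = 10.0 * i + j;
        for (const auto& row : table)              // fold: append row after row
            flat.insert(flat.end(), row.begin(), row.end());
    }

    std::vector<double> myslice(myrows * COLS);
    MPI_Scatter(flat.data(), myrows * COLS, MPI_DOUBLE,
                myslice.data(), myrows * COLS, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    // element (i, j) of this rank's slice is myslice[i * COLS + j]

    MPI_Finalize();
    return 0;
}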


Thanks


Hope this helps
  Jody

On Jan 8, 2008 3:54 PM, Dino Rossegger  wrote:

Hi,
I have a problem distributing a 2-dimensional array over 3 processes.

I tried different methods to distribute the data (Broadcast, Send/Recv,
Scatter), but none of them worked for me. The output of the root
processor (0 in my case) is always okay; the output of the others is
simply 0.

The array stat is filled with entries from a file (I left out the
generation of the array data since it is a lot of code and it works;
I tested the whole thing in "single" mode.)

Here are the important parts of the Source Code:

const int ARRAYSIZE = 150;
int main(int argc, char* argv[])
{
MPI_Init(&argc,&argv);
int rank, anzprocs,recvcount,sendcnt;
MPI_Comm_size(MPI_COMM_WORLD,&anzprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);

const int WORKING = ARRAYSIZE/anzprocs;

double stat[ARRAYSIZE][2];
double stathlp[WORKING][2];

double stat2[WORKING][5];
double stat3[anzprocs][ARRAYSIZE][5];
if(rank==0)sendcnt=WORKING*2;

MPI::COMM_WORLD.Scatter(stat,sendcnt,MPI::DOUBLE,stathlp,WORKING*2,MPI::DOUBLE,0);

for(int i=0;i ...  [rest of the quoted code and message lost in the archive]







[OMPI users] Problems Broadcasting/Scattering Data

2008-01-08 Thread Dino Rossegger

Hi,
I have a problem distributing a 2-dimensional array over 3 processes.

I tried different methods to distribute the data (Broadcast, Send/Recv,
Scatter), but none of them worked for me. The output of the root
processor (0 in my case) is always okay; the output of the others is
simply 0.

The array stat is filled with entries from a file (I left out the
generation of the array data since it is a lot of code and it works;
I tested the whole thing in "single" mode.)

Here are the important parts of the Source Code:

const int ARRAYSIZE = 150;
int main(int argc, char* argv[])
{
MPI_Init(&argc,&argv);
int rank, anzprocs,recvcount,sendcnt;
MPI_Comm_size(MPI_COMM_WORLD,&anzprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);

const int WORKING = ARRAYSIZE/anzprocs;

double stat[ARRAYSIZE][2];
double stathlp[WORKING][2];

double stat2[WORKING][5];
double stat3[anzprocs][ARRAYSIZE][5];
if(rank==0)sendcnt=WORKING*2;

MPI::COMM_WORLD.Scatter(stat,sendcnt,MPI::DOUBLE,stathlp,WORKING*2,MPI::DOUBLE,0);

for(int i=0;i ...  [rest of the code lost in the archive]

I don't know how to go further, since my experience with OMPI is not that
big either. Is there anything specific I have to know about distributing
2-dimensional arrays? I don't think the error is in the MPI_Gather,
since I did a cout of the data on all nodes and the output was the same.


Thanks in advance, and sorry for my bad English
Dino



Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-04 Thread Dino Rossegger
I'll try to reinstall Open MPI on an NFS device; maybe it will work then.

Thanks for your help
dino

Tim Prins wrote:
> Unfortunately, I am out of ideas on this one. It is very strange. Maybe 
> someone else has an idea.
> 
> I would recommend trying to install Open MPI again. First be sure to get 
> rid of all of the old installs of Open MPI from all your nodes, then 
> reinstall and try again.
> 
> Tim
> 
> Dino Rossegger wrote:
>> Here is the syntax & output of the command:
>> root@sun:~# mpirun --hostfile hostfile saturn
>> [sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 275
>> [sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>> line 1164
>> [sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>> [sun:28777] ERROR: A daemon on node saturn failed to start as expected.
>> [sun:28777] ERROR: There may be more information available from
>> [sun:28777] ERROR: the remote shell (see above).
>> [sun:28777] ERROR: The daemon exited unexpectedly with status 255.
>> [sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 188
>> [sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>> line 1196
>> --
>> mpirun was unable to cleanly terminate the daemons for this job.
>> Returned value Timeout instead of ORTE_SUCCESS.
>>
>> --
>>
>> I'm using version 1.2.3, got it from openmpi.org. I'm using the same
>> version of openmpi on all nodes.
>>
>> Thanks
>> dino
>>
>> Tim Prins wrote:
>>> This is very odd. The daemon is being launched properly, but then things 
>>> get strange. It looks like mpirun is sending a message to kill 
>>> application processes on saturn.
>>>
>>> What version of Open MPI are you using?
>>>
>>> Are you sure that the same version of Open MPI is being used everywhere?
>>>
>>> Can you try:
>>> mpirun --hostfile hostfile hostname
>>>
>>> Thanks,
>>>
>>> Tim
>>>
>>> Dino Rossegger wrote:
>>>> Hi again,
>>>>
>>>>> Tim Prins wrote:
>>>>> Hi,
>>>>>
>>>>> On Monday 01 October 2007 03:56:16 pm Dino Rossegger wrote:
>>>>>> Hi again,
>>>>>>
>>>>>> Yes the error output is the same:
>>>>>> root@sun:~# mpirun --hostfile hostfile main
>>>>>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>>>> base/pls_base_orted_cmds.c at line 275
>>>>>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>>>> line 1164
>>>>>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 
>>>>>> 90
>>>>>> [sun:23748] ERROR: A daemon on node saturn failed to start as expected.
>>>>>> [sun:23748] ERROR: There may be more information available from
>>>>>> [sun:23748] ERROR: the remote shell (see above).
>>>>>> [sun:23748] ERROR: The daemon exited unexpectedly with status 255.
>>>>>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>>>> base/pls_base_orted_cmds.c at line 188
>>>>>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>>>> line 1196
>>>>>> --
>>>>>> mpirun was unable to cleanly terminate the daemons for this job.
>>>>>> Returned value Timeout instead of ORTE_SUCCESS.
>>>>>>
>>>>>> --
>>>>> Can you try:
>>>>> mpirun --debug-daemons --hostfile hostfile main
>>>>>
>>>> Did it, but it doesn't give me any special output (as far as I can see).
>>>> Here's the output:
>>>> root@sun:~# mpirun --debug-daemons --hostfile hostfile ./main
>>>> Daemon [0,0,1] checking in as pid 27168 on host sun
>>>> [sun:27168] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>> [sun:27168] [0,0,1] orted_recv_pls: received kill_local_procs
>>>> [sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275

Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-02 Thread Dino Rossegger
Here is the syntax & output of the command:
root@sun:~# mpirun --hostfile hostfile saturn
[sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 275
[sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1164
[sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[sun:28777] ERROR: A daemon on node saturn failed to start as expected.
[sun:28777] ERROR: There may be more information available from
[sun:28777] ERROR: the remote shell (see above).
[sun:28777] ERROR: The daemon exited unexpectedly with status 255.
[sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 188
[sun:28777] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1196
--
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.

--

I'm using version 1.2.3, got it from openmpi.org. I'm using the same
version of openmpi on all nodes.

Thanks
dino

Tim Prins wrote:
> This is very odd. The daemon is being launched properly, but then things 
> get strange. It looks like mpirun is sending a message to kill 
> application processes on saturn.
> 
> What version of Open MPI are you using?
> 
> Are you sure that the same version of Open MPI is being used everywhere?
> 
> Can you try:
> mpirun --hostfile hostfile hostname
> 
> Thanks,
> 
> Tim
> 
> Dino Rossegger wrote:
>> Hi again,
>>
>>> Tim Prins wrote:
>>> Hi,
>>>
>>> On Monday 01 October 2007 03:56:16 pm Dino Rossegger wrote:
>>>> Hi again,
>>>>
>>>> Yes the error output is the same:
>>>> root@sun:~# mpirun --hostfile hostfile main
>>>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>> base/pls_base_orted_cmds.c at line 275
>>>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>> line 1164
>>>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>>>> [sun:23748] ERROR: A daemon on node saturn failed to start as expected.
>>>> [sun:23748] ERROR: There may be more information available from
>>>> [sun:23748] ERROR: the remote shell (see above).
>>>> [sun:23748] ERROR: The daemon exited unexpectedly with status 255.
>>>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>> base/pls_base_orted_cmds.c at line 188
>>>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>> line 1196
>>>> --
>>>> mpirun was unable to cleanly terminate the daemons for this job.
>>>> Returned value Timeout instead of ORTE_SUCCESS.
>>>>
>>>> --
>>> Can you try:
>>> mpirun --debug-daemons --hostfile hostfile main
>>>
>> Did it, but it doesn't give me any special output (as far as I can see).
>> Here's the output:
>> root@sun:~# mpirun --debug-daemons --hostfile hostfile ./main
>> Daemon [0,0,1] checking in as pid 27168 on host sun
>> [sun:27168] [0,0,1] orted_recv_pls: received message from [0,0,0]
>> [sun:27168] [0,0,1] orted_recv_pls: received kill_local_procs
>> [sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>> [sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
>> [sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>> [sun:27167] ERROR: A daemon on node saturn failed to start as expected.
>> [sun:27167] ERROR: There may be more information available from
>> [sun:27167] ERROR: the remote shell (see above).
>> [sun:27167] ERROR: The daemon exited unexpectedly with status 255.
>> [sun:27168] [0,0,1] orted_recv_pls: received message from [0,0,0]
>> [sun:27168] [0,0,1] orted_recv_pls: received exit
>>
>>
>> [sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>> [sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
>> --
>> mpirun was unable to cleanly terminate the daemons for this job.
>> Returned value Timeout instead of ORTE_SUCCESS.
>>
>> --

Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-02 Thread Dino Rossegger
Hi again,

Tim Prins wrote:
> Hi,
> 
> On Monday 01 October 2007 03:56:16 pm Dino Rossegger wrote:
>> Hi again,
>>
>> Yes the error output is the same:
>> root@sun:~# mpirun --hostfile hostfile main
>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 275
>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>> line 1164
>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>> [sun:23748] ERROR: A daemon on node saturn failed to start as expected.
>> [sun:23748] ERROR: There may be more information available from
>> [sun:23748] ERROR: the remote shell (see above).
>> [sun:23748] ERROR: The daemon exited unexpectedly with status 255.
>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 188
>> [sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>> line 1196
>> --
>> mpirun was unable to cleanly terminate the daemons for this job.
>> Returned value Timeout instead of ORTE_SUCCESS.
>>
>> --
> Can you try:
> mpirun --debug-daemons --hostfile hostfile main
> 
Did it, but it doesn't give me any special output (as far as I can see).
Here's the output:
root@sun:~# mpirun --debug-daemons --hostfile hostfile ./main
Daemon [0,0,1] checking in as pid 27168 on host sun
[sun:27168] [0,0,1] orted_recv_pls: received message from [0,0,0]
[sun:27168] [0,0,1] orted_recv_pls: received kill_local_procs
[sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
[sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[sun:27167] ERROR: A daemon on node saturn failed to start as expected.
[sun:27167] ERROR: There may be more information available from
[sun:27167] ERROR: the remote shell (see above).
[sun:27167] ERROR: The daemon exited unexpectedly with status 255.
[sun:27168] [0,0,1] orted_recv_pls: received message from [0,0,0]
[sun:27168] [0,0,1] orted_recv_pls: received exit


[sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[sun:27167] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
--
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.

--


> This may give more output about the error. Also, try
> mpirun -mca pls rsh -mca pls_rsh_debug 1 --hostfile hostfile main

Here's the output, but I can't decipher it ^^
root@sun:~# mpirun -mca pls rsh -mca pls_rsh_debug 1 --hostfile hostfile main
[sun:27175] pls:rsh: local csh: 0, local sh: 1
[sun:27175] pls:rsh: assuming same remote shell as local shell
[sun:27175] pls:rsh: remote csh: 0, remote sh: 1
[sun:27175] pls:rsh: final template argv:
[sun:27175] pls:rsh: /usr/bin/ssh  orted --bootproxy 1 --name  --num_procs 3 --vpid_start 0 --nodename  --universe root@sun:default-universe-27175 --nsreplica "0.0.0;tcp://192.168.1.254:4733;tcp://172.16.0.202:4733" --gprreplica "0.0.0;tcp://192.168.1.254:4733;tcp://172.16.0.202:4733"
[sun:27175] pls:rsh: launching on node sun
[sun:27175] pls:rsh: sun is a LOCAL node
[sun:27175] pls:rsh: changing to directory /root
[sun:27175] pls:rsh: executing: (/usr/local/bin/orted) orted --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename sun --universe root@sun:default-universe-27175 --nsreplica "0.0.0;tcp://192.168.1.254:4733;tcp://172.16.0.202:4733" --gprreplica "0.0.0;tcp://192.168.1.254:4733;tcp://172.16.0.202:4733" --set-sid [SSH_AGENT_PID=24793 TERM=xterm SHELL=/bin/bash SSH_CLIENT=10.2.56.124 21001 22 SSH_TTY=/dev/pts/0 USER=root LD_LIBRARY_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib SSH_AUTH_SOCK=/tmp/ssh-sxbbH24792/agent.24792 MAIL=/var/mail/root PATH=/usr/local/bin:/usr/bin:/bin:/usr/games:/opt/c3-4/:/usr/local/lib PWD=/root LANG=en_US.UTF-8 SHLVL=1 HOME=/root LOGNAME=root SSH_CONNECTION=10.2.56.124 21001 172.16.0.202 22 _=/usr/local/bin/mpirun OMPI_MCA_rds_hostfile_path=hostfile orte-job-globals OMPI_MCA_pls_rsh_debug=1 OMPI_MCA_seed=0]
[sun:27175] pls:rsh: launching on node saturn
[sun:27175] pls:rsh: saturn is a REMOTE node
[sun:27175] pls:rsh: executing: (//usr/bin/ssh) /usr/bin/ssh saturn orted --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename saturn --universe root@sun:default-universe-2

Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-01 Thread Dino Rossegger
Hi again,

Yes the error output is the same:
root@sun:~# mpirun --hostfile hostfile main
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 275
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1164
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[sun:23748] ERROR: A daemon on node saturn failed to start as expected.
[sun:23748] ERROR: There may be more information available from
[sun:23748] ERROR: the remote shell (see above).
[sun:23748] ERROR: The daemon exited unexpectedly with status 255.
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 188
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1196
--
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.

--

I wrote the following to my .ssh/environment (on all machines)
LD_LIBRARY_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib;

PATH=$PATH:/usr/local/lib;

export LD_LIBRARY_PATH;
export PATH;

and added the statement you told me to the sshd_config (on all machines):
PermitUserEnvironment yes

And it seems to me that the paths are correct now.

My shell is bash (/bin/bash)

When running locate orted (to find out where exactly my Open MPI
installation is; compilation defaults), I saw that on sun there was a
/usr/bin/orted while there wasn't one on saturn.
I deleted /usr/bin/orted on sun and tried again with the option --prefix
/usr/local/ (which seems to be my installation directory), but it
didn't work (same error).

Is there a script or anything like that with which I can uninstall
Open MPI? I might try a new compilation to /opt/openmpi, since
it doesn't look like I will be able to solve the problem otherwise.




jody wrote:
> Now that the PATHs seem to be set correctly for
> ssh, I don't know what the problem could be.
> 
> Is the error message still the same as in the first mail?
> Did you do the environment/sshd_config change on both machines?
> What shell are you using?
> 
> One other test you could make is to start your application
> with the --prefix option:
> 
> $mpirun -np 2 --prefix /opt/openmpi -H sun,saturn ./main
> 
> (assuming your Open MPI installation lies in /opt/openmpi
> on both machines)
> 
> 
> Jody
> 
> On 10/1/07, Dino Rossegger  wrote:
>> Hi Jodi,
>> did the steps as you said, but it didn't work for me.
>> I set LD_LIBRARY_PATH in /etc/environment and ~/.ssh/environment and
>> made the changes to sshd_config.
>>
>> But this all didn't solve my problem, although the paths seemed to be
>> set correctly (judging by what ssh saturn `printenv >> test` says). I also
>> restarted the ssh server, the error is the same.
>>
>> Hope you can help me out here and thanks for your help so far
>> dino
>>
>>> jody wrote:
>>> Dino -
>>> I had a similar problem.
>>> I was only able to solve it by setting PATH and LD_LIBRARY_PATH
>>> in the file ~/.ssh/environment on the client and setting
>>>   PermitUserEnvironment yes
>>> in /etc/ssh/sshd_config on the server (for this you need root
>>> privileges, though)
>>>
>>> To be on the safe side, i did both on all my nodes
>>>
>>> Jody
>>>
>>> On 9/27/07, Dino Rossegger  wrote:
>>>> Hi Jody,
>>>>
>>>> Thanks for your help; it really is the case that neither PATH nor
>>>> LD_LIBRARY_PATH has the path to the libs set correctly. I'll try it out;
>>>> hope it works.
>>>>
>>>>> jody wrote:
>>>>> Hi Dino
>>>>>
>>>>> Try
>>>>>  ssh saturn printenv | grep PATH
>>>>> from your host sun to see what your environment variables are when
>>>>> ssh is run without a shell.
>>>>>
>>>>>
>>>>> On 9/27/07, Dino Rossegger  wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a problem running a simple program mpihello.cpp.
>>>>>>
>>>>>> Here is an excerpt of the error and the command
>>>>>> root@sun:~# mpirun -H sun,saturn main
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>>>> base/pls_base_orted_cmds.c at line 275
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>>>> line 1164
>>>>>>

Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-10-01 Thread Dino Rossegger
Hi Jodi,
did the steps as you said, but it didn't work for me.
I set LD_LIBRARY_PATH in /etc/environment and ~/.ssh/environment and
made the changes to sshd_config.

But this all didn't solve my problem, although the paths seemed to be
set correctly (judging by what ssh saturn `printenv >> test` says). I also
restarted the ssh server, the error is the same.

Hope you can help me out here and thanks for your help so far
dino

jody wrote:
> Dino -
> I had a similar problem.
> I was only able to solve it by setting PATH and LD_LIBRARY_PATH
> in the file ~/.ssh/environment on the client and setting
>   PermitUserEnvironment yes
> in /etc/ssh/sshd_config on the server (for this you need root
> privileges, though)
> 
> To be on the safe side, i did both on all my nodes
> 
> Jody
> 
> On 9/27/07, Dino Rossegger  wrote:
>> Hi Jody,
>>
>> Thanks for your help; it really is the case that neither PATH nor
>> LD_LIBRARY_PATH has the path to the libs set correctly. I'll try it out;
>> hope it works.
>>
>>> jody wrote:
>>> Hi Dino
>>>
>>> Try
>>>  ssh saturn printenv | grep PATH
>>> from your host sun to see what your environment variables are when
>>> ssh is run without a shell.
>>>
>>>
>>> On 9/27/07, Dino Rossegger  wrote:
>>>> Hi,
>>>>
>>>> I have a problem running a simple program mpihello.cpp.
>>>>
>>>> Here is an excerpt of the error and the command
>>>> root@sun:~# mpirun -H sun,saturn main
>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>> base/pls_base_orted_cmds.c at line 275
>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>> line 1164
>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>>>> [sun:25213] ERROR: A daemon on node saturn failed to start as expected.
>>>> [sun:25213] ERROR: There may be more information available from
>>>> [sun:25213] ERROR: the remote shell (see above).
>>>> [sun:25213] ERROR: The daemon exited unexpectedly with status 255.
>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>> base/pls_base_orted_cmds.c at line 188
>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>> line 1196
>>>> --
>>>> mpirun was unable to cleanly terminate the daemons for this job.
>>>> Returned value Timeout instead of ORTE_SUCCESS.
>>>>
>>>> --
>>>>
>>>> The program is runnable from each node alone (mpirun -np 2 main)
>>>>
>>>> My PathVariables:
>>>> $PATH
>>>> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib
>>>> $LD_LIBRARY_PATH
>>>> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib
>>>>
>>>> Passwordless ssh is up 'n running
>>>>
>>>> I walked through the FAQ and Mailing Lists but couldn't find any
>>>> solution for my problem.
>>>>
>>>> Thanks
>>>> Dino R.
>>>>
>>>>
> 



Re: [OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-09-27 Thread Dino Rossegger
Hi Jody,

Thanks for your help; it really is the case that neither PATH nor
LD_LIBRARY_PATH has the path to the libs set correctly. I'll try it out;
hope it works.

jody wrote:
> Hi Dino
> 
> Try
>  ssh saturn printenv | grep PATH
> from your host sun to see what your environment variables are when
> ssh is run without a shell.
> 
> 
> On 9/27/07, Dino Rossegger  wrote:
>> Hi,
>>
>> I have a problem running a simple program mpihello.cpp.
>>
>> Here is an excerpt of the error and the command
>> root@sun:~# mpirun -H sun,saturn main
>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 275
>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>> line 1164
>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>> [sun:25213] ERROR: A daemon on node saturn failed to start as expected.
>> [sun:25213] ERROR: There may be more information available from
>> [sun:25213] ERROR: the remote shell (see above).
>> [sun:25213] ERROR: The daemon exited unexpectedly with status 255.
>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 188
>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>> line 1196
>> --
>> mpirun was unable to cleanly terminate the daemons for this job.
>> Returned value Timeout instead of ORTE_SUCCESS.
>>
>> --
>>
>> The program is runnable from each node alone (mpirun -np 2 main)
>>
>> My PathVariables:
>> $PATH
>> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib
>> $LD_LIBRARY_PATH
>> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib
>>
>> Passwordless ssh is up 'n running
>>
>> I walked through the FAQ and Mailing Lists but couldn't find any
>> solution for my problem.
>>
>> Thanks
>> Dino R.
>>
>>
> 



[OMPI users] mpirun ERROR: The daemon exited unexpectedly with status 255.

2007-09-27 Thread Dino Rossegger
Hi,

I have a problem running a simple program mpihello.cpp.

Here is an excerpt of the error and the command
root@sun:~# mpirun -H sun,saturn main
[sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 275
[sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1164
[sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[sun:25213] ERROR: A daemon on node saturn failed to start as expected.
[sun:25213] ERROR: There may be more information available from
[sun:25213] ERROR: the remote shell (see above).
[sun:25213] ERROR: The daemon exited unexpectedly with status 255.
[sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 188
[sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1196
--
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.

--

The program is runnable from each node alone (mpirun -np 2 main)

My PathVariables:
$PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib
$LD_LIBRARY_PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib

Passwordless ssh is up 'n running

I walked through the FAQ and Mailing Lists but couldn't find any
solution for my problem.

Thanks
Dino R.