Re: [OMPI users] Basic problems with OpenMPI

2007-08-30 Thread Amit Kumar Saha
Hello,

On 8/29/07, Jeff Squyres wrote:
> Amit --
>
> I think you want to have a look at the "setup" FAQ -- many of the
> questions you have asked are answered there:
>
>  http://www.open-mpi.org/faq/?category=running


Thanks Jeff for pointing that out. I am sorry for not having looked up
the FAQ before asking the mailing list.

Regards
-- 
Amit Kumar Saha
[URL]:http://amitsaha.in.googlepages.com


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Jeff Squyres

Amit --

I think you want to have a look at the "setup" FAQ -- many of the  
questions you have asked are answered there:


http://www.open-mpi.org/faq/?category=running


On Aug 29, 2007, at 6:07 AM, Amit Kumar Saha wrote:


> Hi Gleb,
>
> > The above output shows that you have a problem on host ubuntu-desktop-2.
> > Have you set up login without a password from ubuntu-desktop-1 to
> > ubuntu-desktop-2?
>
> Thank you very much. It works!
>
> Regards
> --
> Amit Kumar Saha
> [URL]:http://amitsaha.in.googlepages.com



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Amit Kumar Saha
Hi Gleb,

> The above output shows that you have a problem on host ubuntu-desktop-2.
> Have you set up login without a password from ubuntu-desktop-1 to
> ubuntu-desktop-2?

Thank you very much. It works!

Regards
-- 
Amit Kumar Saha
[URL]:http://amitsaha.in.googlepages.com


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Gleb Natapov
On Wed, Aug 29, 2007 at 03:22:54PM +0530, Amit Kumar Saha wrote:
> Hi Gleb,
> 
> I am sending a sample trace of my program:
> 
> amit@ubuntu-desktop-1:~/mpi-exec$ mpirun --np 3 --hostfile mpi-host-file HellMPI
> 
> amit@debian-desktop-1's password: [ubuntu-desktop-1:28575] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
> [ubuntu-desktop-1:28575] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
> [ubuntu-desktop-1:28575] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
> [ubuntu-desktop-1:28575] ERROR: A daemon on node ubuntu-desktop-2 failed to start as expected.
> [ubuntu-desktop-1:28575] ERROR: There may be more information available from
> [ubuntu-desktop-1:28575] ERROR: the remote shell (see above).
> [ubuntu-desktop-1:28575] ERROR: The daemon exited unexpectedly with status 255.
> [ubuntu-desktop-1:28575] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
> [ubuntu-desktop-1:28575] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
> --
> mpirun was unable to cleanly terminate the daemons for this job.
> Returned value Timeout instead of ORTE_SUCCESS.
> 
> --
> 
> This is what I get when I run the program.
> 
> However, when I use "--np 2" it works perfectly, which of course means
> that it is not a problem with "debian-desktop-1", as the above output
> may suggest.
> 
The above output shows that you have a problem on host ubuntu-desktop-2.
Have you set up login without a password from ubuntu-desktop-1 to
ubuntu-desktop-2?


--
Gleb.
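
A minimal sketch of how one might check for the password prompt that stalls the daemon launch, assuming OpenSSH on every host. The names can_login_without_password and report_hosts are hypothetical helpers, and SSH_CMD is an assumed override that exists only so the check can be exercised without a real cluster.

```shell
# Return 0 if a non-interactive login to the host succeeds, non-zero otherwise.
# BatchMode=yes makes ssh fail instead of prompting for a password.
can_login_without_password() {
  "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Print one status line per host given on the command line.
report_hosts() {
  for host in "$@"; do
    if can_login_without_password "$host" 2>/dev/null; then
      echo "$host: ok"
    else
      echo "$host: needs key-based login"
    fi
  done
}
```

Run from ubuntu-desktop-1, this could be called as "report_hosts debian-desktop-1 ubuntu-desktop-2". For a host that still prompts, a common OpenSSH approach is ssh-keygen followed by ssh-copy-id for that host.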


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Amit Kumar Saha
Hi Gleb,

I am sending a sample trace of my program:

amit@ubuntu-desktop-1:~/mpi-exec$ mpirun --np 3 --hostfile mpi-host-file HellMPI

amit@debian-desktop-1's password: [ubuntu-desktop-1:28575] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[ubuntu-desktop-1:28575] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
[ubuntu-desktop-1:28575] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[ubuntu-desktop-1:28575] ERROR: A daemon on node ubuntu-desktop-2 failed to start as expected.
[ubuntu-desktop-1:28575] ERROR: There may be more information available from
[ubuntu-desktop-1:28575] ERROR: the remote shell (see above).
[ubuntu-desktop-1:28575] ERROR: The daemon exited unexpectedly with status 255.
[ubuntu-desktop-1:28575] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[ubuntu-desktop-1:28575] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
--
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.

--

This is what I get when I run the program.

However, when I use "--np 2" it works perfectly, which of course means
that it is not a problem with "debian-desktop-1", as the above output
may suggest.

Please refer to the attached host file as well. I am using the same
Open MPI version, 1.2.3, and compiled all the executables with it.

Waiting for your suggestions.

Thanks
-- 
Amit Kumar Saha
[URL]:http://amitsaha.in.googlepages.com


mpi-host-file
Description: Binary data


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Gleb Natapov
On Wed, Aug 29, 2007 at 02:49:35PM +0530, Amit Kumar Saha wrote:
> Hi Gleb,
> 
> 
> > Have you installed Open MPI at the same place on all nodes? What command
> > line are you using to run the app on more than one host?
> 
> This is a sample run:
> 
> amit@ubuntu-desktop-1:~/mpi-exec$ mpirun --np 2 --hostfile mpi-host-file HellMPI
> amit@ubuntu-desktop-2's password:
> HellMPI: error while loading shared libraries: liborte.so.0: cannot open shared object file: No such file or directory
> 
HellMPI was compiled with the Open MPI 1.1 mpicc. Version 1.2 has libopen-rte.so,
not liborte.so.

--
Gleb.
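
The library names in the reply above can be turned into a quick check. The following is only a sketch: ompi_series_for_lib is a hypothetical helper, and the mapping it encodes is just the one stated in this thread (liborte.so for 1.1, libopen-rte.so for 1.2).

```shell
# Map a line mentioning an Open MPI runtime library (a loader error or an
# ldd output line) to the release series named in the thread.
ompi_series_for_lib() {
  case "$1" in
    *libopen-rte.so*) echo "1.2" ;;
    *liborte.so*)     echo "1.1" ;;
    *)                echo "unknown" ;;
  esac
}

# The loader error reported for HellMPI points at the 1.1 series.
ompi_series_for_lib "HellMPI: error while loading shared libraries: liborte.so.0: cannot open shared object file"
```

Feeding it the lines of "ldd ./HellMPI" would show which series a binary was linked against. A binary that reports 1.1 would need to be rebuilt with the mpicc from the 1.2.3 installation.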


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Amit Kumar Saha
Hi Gleb,


> Have you installed Open MPI at the same place on all nodes? What command
> line are you using to run the app on more than one host?

This is a sample run:

amit@ubuntu-desktop-1:~/mpi-exec$ mpirun --np 2 --hostfile mpi-host-file HellMPI
amit@ubuntu-desktop-2's password:
HellMPI: error while loading shared libraries: liborte.so.0: cannot open shared object file: No such file or directory

I have them installed at the same place (I used the 'configure'
switch that you told me about earlier).

Hope that helps.

Thanks
-- 
Amit Kumar Saha
[URL]:http://amitsaha.in.googlepages.com


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Gleb Natapov
On Wed, Aug 29, 2007 at 02:32:58PM +0530, Amit Kumar Saha wrote:
> Hi all,
> 
> I have installed OpenMPI 1.2.3 on all my hosts (3).
> 
> Now when I try to start a simple demo program ("hello world") using
> ./a.out, I get the error below. When I run my program using "mpirun" on
> more than one host, it gives me a similar error:
> 
> error while loading shared libraries: libopen-rte.so.0: cannot open shared object file: No such file or directory
> 
> However, when I do "mpirun a.out", it gives me no error.
> 
> Please suggest
> 
Have you installed Open MPI at the same place on all nodes? What command
line are you using to run the app on more than one host?

--
Gleb.


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Amit Kumar Saha
Hi all,

I have installed OpenMPI 1.2.3 on all my hosts (3).

Now when I try to start a simple demo program ("hello world") using
./a.out, I get the error below. When I run my program using "mpirun" on
more than one host, it gives me a similar error:

error while loading shared libraries: libopen-rte.so.0: cannot open shared object file: No such file or directory

However, when I do "mpirun a.out", it gives me no error.

Please suggest

Thanks
Amit

-- 
Amit Kumar Saha
[URL]:http://amitsaha.in.googlepages.com


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Gleb Natapov
On Wed, Aug 29, 2007 at 01:03:30PM +0530, Amit Kumar Saha wrote:
> Also, is Open MPI 1.1 compatible with Open MPI 1.2.3? That is, can an
> MPI executable built with 1.1 be run by 1.2.3?
No. They are not compatible.

--
Gleb.


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Amit Kumar Saha
Hi Gleb,

On 8/29/07, Gleb Natapov wrote:
> Where have you installed it? If in /usr/local/ then try to run
> mpirun --prefix /usr/local/ --np 1 --hostfile hostfile ./a.out

Thanks again. It solves the problem.

>
> If this helps then you may want to re-run configure script with flag
> --enable-orterun-prefix-by-default and recompile.

Also, is Open MPI 1.1 compatible with Open MPI 1.2.3? That is, can an
MPI executable built with 1.1 be run by 1.2.3?

I am trying to run a 1.1-generated executable on a remote 1.2.3 host,
and I get the following:

amit@ubuntu-desktop-1:~/mpi-exec$ mpirun -np 3 --hostfile /home/amit/junk/mpi-codes/mpi-host-file --mca btl ^openib ./HellMPI

amit@debian-desktop-1's password: amit@ubuntu-desktop-2's password:
[ubuntu-desktop-1:13202] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[ubuntu-desktop-1:13202] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
[ubuntu-desktop-1:13202] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59

Note that the host "debian-desktop-1" runs 1.2.3 and the other two run 1.1.


Regards
-- 
Amit Kumar Saha
[URL]:http://amitsaha.in.googlepages.com


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Gleb Natapov
On Wed, Aug 29, 2007 at 12:26:54PM +0530, Amit Kumar Saha wrote:
> Hello all,
> 
> I have installed Open MPI 1.2.3 from source on Debian 4.0. I did the
> "make all install" using root privileges.
> 
> Now when I try to execute a simple program, I get the following:
> 
> debian-desktop-1:/home/amit/junk/mpi-codes# mpirun --np 1 --hostfile hostfile ./a.out
> ./a.out: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
> 
> I get the error whether I run it as a normal user or as root.
> 
> Please suggest.
> 
Where have you installed it? If in /usr/local/ then try to run
mpirun --prefix /usr/local/ --np 1 --hostfile hostfile ./a.out

If this helps then you may want to re-run configure script with flag
--enable-orterun-prefix-by-default and recompile.

--
Gleb.
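
Before choosing a value for --prefix, it can help to confirm where libmpi.so.0 actually landed. The sketch below assumes a few candidate directories; find_lib_dir is a hypothetical helper, and the paths other than /usr/local are guesses rather than anything stated in the thread.

```shell
# Print the first directory (from the arguments after the library name)
# that contains the given shared library; return 1 if none does.
find_lib_dir() {
  lib=$1
  shift
  for dir in "$@"; do
    if [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Candidate directories are assumptions; adjust them to the install prefix used.
find_lib_dir libmpi.so.0 /usr/local/lib /usr/lib /opt/openmpi/lib \
  || echo "libmpi.so.0 not found in the listed directories"
```

If the library turns up under /usr/local/lib, then "--prefix /usr/local/" as suggested above is the matching value, since the prefix is the directory that contains lib/ and bin/.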


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Amit Kumar Saha
Hello all,

I have installed Open MPI 1.2.3 from source on Debian 4.0. I did the
"make all install" using root privileges.

Now when I try to execute a simple program, I get the following:

debian-desktop-1:/home/amit/junk/mpi-codes# mpirun --np 1 --hostfile hostfile ./a.out
./a.out: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory

I get the error whether I run it as a normal user or as root.

Please suggest.

Thanks
-- 
Amit Kumar Saha
[URL]:http://amitsaha.in.googlepages.com


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Amit Kumar Saha
On 8/29/07, Gleb Natapov wrote:
> On Wed, Aug 29, 2007 at 11:42:29AM +0530, Amit Kumar Saha wrote:
> > hello all,
> >
> > I am just trying to get started with OpenMPI (version 1.1) on Linux.
> Version 1.1 is old and no longer supported.
>
> >
> > When I try to run a simple MPI "Hello World" program, here is what I get:
> >
> > amit@ubuntu-desktop-1:~/junk/mpi-codes$ mpirun -np 1 --hostfile mpi-host-file ./a.out
> > libibverbs: Fatal: couldn't read uverbs ABI version.
> > --
> > [0,1,0]: OpenIB on host ubuntu-desktop-1 was unable to find any HCAs.
> > Another transport will be used instead, although this may result in
> > lower performance.
> > --
> > Processor 0 of 1: Hello World!
> >
> > Please explain the statements above.
> Open MPI was built with the InfiniBand module, but no IB device was found
> on your host. Try adding "--mca btl ^openib" to your command
> line.
>
> >
> > Also, when I am trying to launch the above process on 2 processors,
> > instead of one, it gives me:
> >
> > Failed to find or execute the following executable:
> >
> > Host:   ubuntu-desktop-2
> > Executable: ./a.out
> >
> > Cannot continue.
> >
> > Does that mean I have to place a copy of the executable on the other
> > node as well? Where should I place the executable?
> >
> Yes. At the same location on each host.

Thank you very much Gleb. It works!

Regards
-- 
Amit Kumar Saha
[URL]:http://amitsaha.in.googlepages.com


Re: [OMPI users] Basic problems with OpenMPI

2007-08-29 Thread Gleb Natapov
On Wed, Aug 29, 2007 at 11:42:29AM +0530, Amit Kumar Saha wrote:
> hello all,
> 
> I am just trying to get started with OpenMPI (version 1.1) on Linux.
Version 1.1 is old and no longer supported.

> 
> When I try to run a simple MPI "Hello World" program, here is what I get:
> 
> amit@ubuntu-desktop-1:~/junk/mpi-codes$ mpirun -np 1 --hostfile mpi-host-file ./a.out
> libibverbs: Fatal: couldn't read uverbs ABI version.
> --
> [0,1,0]: OpenIB on host ubuntu-desktop-1 was unable to find any HCAs.
> Another transport will be used instead, although this may result in
> lower performance.
> --
> Processor 0 of 1: Hello World!
> 
> Please explain the statements above.
Open MPI was built with the InfiniBand module, but no IB device was found
on your host. Try adding "--mca btl ^openib" to your command
line.
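
As a rough illustration of where that string goes, the sketch below prints the adjusted command line rather than running it. The helper name mpirun_without_openib and the variable btl_selection are hypothetical; the remaining arguments are the ones from the question.

```shell
# The leading caret tells Open MPI to exclude the listed component,
# so "^openib" means "use any BTL except openib".
btl_selection='^openib'

# Hypothetical helper: print the mpirun command line with the exclusion added.
mpirun_without_openib() {
  printf 'mpirun --mca btl %s %s\n' "$btl_selection" "$*"
}

mpirun_without_openib -np 1 --hostfile mpi-host-file ./a.out
```

Some shells treat the caret specially, so quoting it as '^openib' is the safer habit. Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<name>, so exporting OMPI_MCA_btl='^openib' should have the same effect; treat that as something to confirm against the FAQ for the installed version.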

> 
> Also, when I am trying to launch the above process on 2 processors,
> instead of one, it gives me:
> 
> Failed to find or execute the following executable:
> 
> Host:   ubuntu-desktop-2
> Executable: ./a.out
> 
> Cannot continue.
> 
> Does that mean I have to place a copy of the executable on the other
> node as well? Where should I place the executable?
> 
Yes. At the same location on each host.

--
Gleb.
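
A sketch of one way to get the executable to the same location on each host, assuming scp is available and the hostfile lists one host per line (optionally followed by fields such as slots=2, with "#" starting a comment). The helper print_copy_commands is hypothetical and only prints the commands so they can be reviewed first.

```shell
# Print (not run) the scp commands that would place an executable at the
# same absolute path on every host named in a hostfile.
print_copy_commands() {
  exe=$1
  hostfile=$2
  # Absolute directory of the executable on the local host.
  dest_dir=$(cd "$(dirname "$exe")" && pwd)
  # Read the first field of each line; also handle a last line with no newline.
  while read -r host _rest || [ -n "$host" ]; do
    case "$host" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    printf 'scp %s %s:%s/\n' "$exe" "$host" "$dest_dir"
  done < "$hostfile"
}
```

Calling "print_copy_commands ./a.out mpi-host-file" lists one scp line per host, including the local host if it appears in the hostfile, where the copy is unnecessary. The destination directory must already exist on each remote host.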