Re: [OMPI users] Program hangs when run in the remote host ...

2009-10-06 Thread Ashley Pittman
On Tue, 2009-10-06 at 12:22 +0530, souvik bhattacherjee wrote:

> This implies that one has to copy the executables to the remote host
> each time one wants to run a program that is different from the
> previous one.

This is correct: the name of the executable is passed to each node, and
that executable is then executed locally.
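
In other words, the binary must be present at the same path on every
host. Using an absolute path makes this explicit (a sketch, reusing the
paths that appear elsewhere in this thread):

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 /home/souvik/software/openmpi-1.3.3/examples/hello_c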

> Is the implication correct, or is there some way around it?

Typically some kind of shared filesystem would be used, NFS for
example.
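
For example, a minimal sketch of the NFS route (the export path,
options, and service names here are assumptions, not from this thread):
on ict2, add a line such as

/home/souvik  ict1(rw,sync)

to /etc/exports and reload the NFS server; then on ict1:

# mount -t nfs ict2:/home/souvik /home/souvik

so both hosts see the executable at the same path. Alternatively, if
your mpirun supports it (check mpirun --help), the --preload-binary
option can push the executable out to the remote nodes before launch.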

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk



Re: [OMPI users] Program hangs when run in the remote host ...

2009-10-06 Thread souvik bhattacherjee
Finally, it seems I'm able to run my program on a remote host.

The problem was due to some firewall settings. Modifying the firewall
ACCEPT policy as shown below did the trick.

# /etc/init.d/ip6tables stop
Resetting built-in chains to the default ACCEPT policy: [  OK  ]
# /etc/init.d/iptables stop
Resetting built-in chains to the default ACCEPT policy: [  OK  ]
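
Stopping the firewall outright is the blunt fix. A narrower alternative,
sketched here but not tested in this thread, is to accept traffic from
the peer host instead, e.g. on ict2 (with the mirror-image rule on
ict1):

# iptables -I INPUT -s ict1 -j ACCEPT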

Another related query:

Let me mention once again that I had installed openmpi-1.3.3 separately
on two of my machines, ict1 and ict2. Now when I issue the following
command:
$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 hello_c
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not find
an executable:

Executable: hello_c
Node: ict1

while attempting to start process rank 1.
--------------------------------------------------------------------------

So I ran *make* in the examples directory on ict1 to generate the
executable. (One could also copy the executable from ict2 to ict1 into
the same directory.)
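
A sketch of that copy, assuming the same directory layout on both hosts:

$ scp hello_c ict1:/home/souvik/software/openmpi-1.3.3/examples/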

Now, it seems to run fine.

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 hello_c
Hello, world, I am 0 of 8
Hello, world, I am 2 of 8
Hello, world, I am 4 of 8
Hello, world, I am 6 of 8
Hello, world, I am 5 of 8
Hello, world, I am 3 of 8
Hello, world, I am 7 of 8
Hello, world, I am 1 of 8
$

This implies that one has to copy the executables to the remote host
each time one wants to run a program that is different from the previous
one.

Is the implication correct, or is there some way around it?

Thanks,



Re: [OMPI users] Program hangs when run in the remote host ...

2009-09-21 Thread souvik bhattacherjee
As Ralph suggested, I *reversed the order of my PATH settings*:

This is what it shows:

$ echo $PATH
/usr/local/openmpi-1.3.3/bin/:/usr/bin:/bin:/usr/local/bin:/usr/X11R6/bin/:/usr/games:/usr/lib/qt4/bin:/usr/bin:/opt/kde3/bin

$ echo $LD_LIBRARY_PATH
/usr/local/openmpi-1.3.3/lib/

Moreover, I checked that there were *NO* system-supplied versions of
OMPI previously installed. (I did install MPICH2 earlier, but I had
removed the binaries and the related files.) This is because

$ locate mpicc
/home/souvik/software/openmpi-1.3.3/build/ompi/contrib/vt/wrappers/mpicc-vt-wrapper-data.txt
/home/souvik/software/openmpi-1.3.3/build/ompi/tools/wrappers/mpicc-wrapper-data.txt
/home/souvik/software/openmpi-1.3.3/build/ompi/tools/wrappers/mpicc.1
/home/souvik/software/openmpi-1.3.3/contrib/platform/win32/ConfigFiles/mpicc-wrapper-data.txt.cmake
/home/souvik/software/openmpi-1.3.3/ompi/contrib/vt/wrappers/mpicc-vt-wrapper-data.txt
/home/souvik/software/openmpi-1.3.3/ompi/contrib/vt/wrappers/mpicc-vt-wrapper-data.txt.in
/home/souvik/software/openmpi-1.3.3/ompi/tools/wrappers/mpicc-wrapper-data.txt
/home/souvik/software/openmpi-1.3.3/ompi/tools/wrappers/mpicc-wrapper-data.txt.in
/usr/local/openmpi-1.3.3/bin/mpicc
/usr/local/openmpi-1.3.3/bin/mpicc-vt
/usr/local/openmpi-1.3.3/share/man/man1/mpicc.1
/usr/local/openmpi-1.3.3/share/openmpi/mpicc-vt-wrapper-data.txt
/usr/local/openmpi-1.3.3/share/openmpi/mpicc-wrapper-data.txt

does not show the occurrence of mpicc in any directory related to MPICH2.

The results are the same with mpirun:

$ locate mpirun
/home/souvik/software/openmpi-1.3.3/build/ompi/tools/ortetools/mpirun.1
/home/souvik/software/openmpi-1.3.3/ompi/runtime/mpiruntime.h
/usr/local/openmpi-1.3.3/bin/mpirun
/usr/local/openmpi-1.3.3/share/man/man1/mpirun.1

*These tests were done both on ict1 and ict2*.

I performed another test, which probably shows whether the required
files are being found on the remote host. The program was run from ict2.

$ cd /home/souvik/software/openmpi-1.3.3/examples/

$ mpirun -np 4 --host ict2,ict1 hello_c
bash: orted: command not found
--------------------------------------------------------------------------
A daemon (pid 28023) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
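
A quick way to see what a non-interactive remote shell finds (a sketch;
note that ssh here runs a non-login shell, which may not read
.bash_profile):

$ ssh ict1 which orted
$ ssh ict1 'echo $PATH; echo $LD_LIBRARY_PATH'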

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 hello_c

*This command as usual produces no output. On pressing Ctrl+C, the
following output appears*

^Cmpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
ict1 - daemon did not report back when launched

$

Also, *top* does not show any *mpirun* or *hello_c* process running on
either host. However, running hello_c on a single host, say ict2, does
show *mpirun* and *hello_c* in the process list.
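
Given that the daemon launches on ict1 but never reports back, one thing
worth checking is whether incoming TCP connections are being blocked (a
sketch; as the follow-up above shows, the firewall did turn out to be
the culprit):

# iptables -L INPUT -n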





Re: [OMPI users] Program hangs when run in the remote host ...

2009-09-19 Thread Ralph Castain
One thing that flags my attention: in your PATH definition, you put
$PATH ahead of your OMPI 1.3.3 installation. Thus, if there are any
system-supplied versions of OMPI hanging around (and there often are),
they will be executed instead of your new installation.


You might try reversing that order.
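
Concretely, a sketch of the reversed ordering in .bashrc (paths taken
from this thread):

export PATH=/usr/local/openmpi-1.3.3/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib:$LD_LIBRARY_PATH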


Re: [OMPI users] Program hangs when run in the remote host ...

2009-09-19 Thread souvik bhattacherjee
Hi Gus (and all OpenMPI users),

Thanks for your interest in my problem. However, it seems to me that I
had already taken care of the points you raised earlier in your mails. I
have listed them below point by point. Your comments are set off with
asterisks and my replies follow in plain text.

1) As you have mentioned: "*I would guess you only installed OpenMPI only on
ict1, not on ict2*". However, I had mentioned initially: "*I had installed
openmpi-1.3.3 separately on two of my machines ict1 and ict2*".

2) Next you said: "*I am guessing this, because you used a prefix under
/usr/local*". However, I had installed it there deliberately:

$ mkdir build
$ cd build
$ ../configure --prefix=/usr/local/openmpi-1.3.3/
# make all install

3) Next, as you pointed out: "*...not a typical name of an NFS mounted
directory. Using an NFS mounted directory is another way to make OpenMPI
visible to all nodes*".
Let me say once again that I am not going for an NFS installation, as
the first point in this list makes clear.

4) In your next mail: "*If you can ssh passwordless from ict1 to ict2
*and* vice versa*". Again, as I had mentioned earlier: "*As a
prerequisite, I can ssh between them without a password or passphrase (I
did not supply the passphrase at all).*"

5) Further, as you said: "*If your /etc/hosts file on *both* machines
lists ict1 and ict2 and their IP addresses*". Let me mention here that
these things are already well taken care of.

6) Finally, as you said: "*In case you have a /home directory on each
machine (i.e. /home is not NFS mounted): if your .bashrc files on *both*
machines set the PATH and LD_LIBRARY_PATH to point to the OpenMPI
directory.*"

Again, as I had mentioned previously, both .bash_profile and .bashrc had
the following lines written into them:

PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
As an additional bit of information (which might assist the
investigation), I use *Mandriva 2009.1* on all of my systems.

Hope this helps. Eagerly awaiting a response.

Thanks,


Re: [OMPI users] Program hangs when run in the remote host ...

2009-09-18 Thread Gus Correa

Hi Souvik

Also worth checking:

1) If you can ssh passwordless from ict1 to ict2 *and* vice versa.
2) If your /etc/hosts file on *both* machines lists ict1 and ict2
and their IP addresses.
3) In case you have a /home directory on each machine (i.e. /home is
not NFS mounted): if your .bashrc files on *both* machines set the PATH
and LD_LIBRARY_PATH to point to the OpenMPI directory.
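
A minimal sketch of those checks (hostnames from this thread; the key
type and other details are assumptions):

$ ssh-keygen -t rsa             (on each host, if no key exists yet)
$ ssh-copy-id ict2              (on ict1, and vice versa on ict2)
$ ssh ict2 hostname             (should return with no password prompt)
$ grep ict /etc/hosts           (should list both hosts and their IPs)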

Gus Correa


Re: [OMPI users] Program hangs when run in the remote host ...

2009-09-18 Thread Gus Correa

Hi Souvik

I would guess you installed OpenMPI only on ict1, not on ict2.
If that is the case you won't have the required OpenMPI libraries
on ict2:/usr/local, and the job won't run on ict2.

I am guessing this because you used a prefix under /usr/local, which
tends to be a "per machine" directory, not a typical name for an
NFS-mounted directory. Using an NFS-mounted directory is another way
to make OpenMPI visible to all nodes.
See this FAQ:
http://www.open-mpi.org/faq/?category=building#where-to-install

I hope this helps,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-




[OMPI users] Program hangs when run in the remote host ...

2009-09-18 Thread souvik bhattacherjee
Dear all,

I am quite new to Open MPI. Recently, I installed openmpi-1.3.3
separately on two of my machines, ict1 and ict2. These machines are
dual-socket quad-core (Intel Xeon E5410), i.e. each has 8 processor
cores, and they are connected by a Gigabit Ethernet switch. As a
prerequisite, I can ssh between them without a password or passphrase
(I did not supply a passphrase at all). Thereafter,

$ cd openmpi-1.3.3
$ mkdir build
$ cd build
$ ../configure --prefix=/usr/local/openmpi-1.3.3/

Then as a root user,

# make all install
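
A quick sanity check, on each machine, that the install is the one you
expect (a sketch using Open MPI's ompi_info tool):

$ /usr/local/openmpi-1.3.3/bin/ompi_info | grep "Open MPI:"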

Also .bash_profile and .bashrc had the following lines written into them:

PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/

--------------------------------------------------------------------------


$ cd ../examples/
$ make
$ mpirun -np 2 --host ict1 hello_c
   hello_c: error while loading shared libraries: libmpi.so.0: cannot open
shared object file: No such file or directory
   hello_c: error while loading shared libraries: libmpi.so.0: cannot open
shared object file: No such file or directory
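
That error means the runtime loader cannot find libmpi.so.0; a quick
check (a sketch) is:

$ ldd hello_c | grep libmpi     ("not found" here means LD_LIBRARY_PATH
is not taking effect)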

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
   Hello, world, I am 1 of 2
   Hello, world, I am 0 of 2

But the program hangs when 

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hello_c

This statement does not produce any output. Running top on either
machine does not show any hello_c process. However, when I press Ctrl+C
the following output appears:

^Cmpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
ict2 - daemon did not report back when launched

$

The same thing repeats itself when hello_c is run from ict2. Since the
program does not produce any error, it is difficult to locate where I
might have gone wrong.
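
One way to get more detail on the launch phase (a sketch; the MCA
verbosity parameter below is a standard Open MPI knob, though the exact
output varies by version):

$ mpirun --mca plm_base_verbose 5 --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hello_c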

Did any of you encounter this problem or anything similar? Any help
would be much appreciated.

Thanks,

-- 

Souvik