Re: [OMPI users] Non-homogeneous Cluster Implementation

2010-02-02 Thread Lee Manko
Thanks, I'll give it a try!
Lee Manko




Re: [OMPI users] Non-homogeneous Cluster Implementation

2010-02-02 Thread Ralph Castain
Probably the easiest solution is to tell OMPI not to use the second NIC. For
example, if that NIC is eth1, then you could do this:

mpirun -mca oob_tcp_if_exclude eth1 -mca btl_tcp_if_exclude eth1 ...

This tells both the MPI layer and the RTE to ignore the eth1 interface.
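A minimal sketch of the same idea from the other direction, assuming the cluster-facing NIC is eth0 (a hypothetical name; check your own interface names first). Instead of excluding the DHCP interface, you can list only the interface Open MPI should use. The include and exclude forms of a parameter are mutually exclusive, and because setting btl_tcp_if_exclude replaces its default value, it is usually wise to keep the loopback interface (lo) in that list. The hostfile and executable names are taken from the original post; the commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Interface names: eth1 comes from the thread, eth0 is an assumption.
CLUSTER_IF="eth0"   # static NIC on the private cluster network (assumed)
CORP_IF="eth1"      # DHCP NIC on the corporate network

# Variant 1: exclude the corporate NIC; lo stays excluded for the TCP BTL.
EXCLUDE_ARGS="-mca oob_tcp_if_exclude $CORP_IF -mca btl_tcp_if_exclude lo,$CORP_IF"

# Variant 2: allow only the cluster NIC.
INCLUDE_ARGS="-mca oob_tcp_if_include $CLUSTER_IF -mca btl_tcp_if_include $CLUSTER_IF"

# Print both command lines instead of running them.
echo "mpirun $EXCLUDE_ARGS -np 4 -hostfile mpi-hostfile ./MPI_Example"
echo "mpirun $INCLUDE_ARGS -np 4 -hostfile mpi-hostfile ./MPI_Example"
```

Use one variant or the other, not both in the same command.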




Re: [OMPI users] Non-homogeneous Cluster Implementation

2010-02-02 Thread Lee Manko
Thank you Jody and Ralph.  Your suggestions got me up and running (well, sort
of).  I have run into another issue that I was wondering whether you have had
any experience with.  My server has one NIC with a static address and a second
that uses DHCP on a corporate network (the only way to get to the outside
world).  My scatter/gather process does not work when the second NIC is
plugged in, but does work when it is unplugged.  It appears to have something
to do with DHCP Discovery.

Any suggestions?

Lee Manko




Re: [OMPI users] Non-homogeneous Cluster Implementation

2010-01-28 Thread Lee Manko
See, it was a simple thing.  Thank you for the information.  I am trying it
now.  I have to recompile and reinstall Open MPI for a heterogeneous network.

Now, knowing what to search for, I found that I can set the configuration of
the cluster in a file that mpirun and mpiexec can read.

mpirun --app my_appfile


where the appfile contains the same --host information.  This makes
customizing the cluster for certain applications very easy.
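For illustration, a minimal sketch of such an appfile, assuming two placeholder hostnames (ps3-node1 and dell-host, which are not from the thread) and the per-architecture directories from the original post. Each line of an Open MPI appfile holds the arguments for one application context, written the same way they would appear between the colons on an mpirun command line.

```shell
#!/bin/sh
# Write a two-line appfile; the hostnames are placeholders.
cat > my_appfile <<'EOF'
-np 1 --host ps3-node1 /shared_volume/PS3/MPI_Example
-np 1 --host dell-host /shared_volume/Dell/MPI_Example
EOF

# Show the result and the command that would consume it.
cat my_appfile
echo "Run with: mpirun --app my_appfile"
```

Keeping one appfile per application or per cluster layout means the launch command itself never changes.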

Thanks for the guidance to this MPI newbie.

Lee





Re: [OMPI users] Non-homogeneous Cluster Implementation

2010-01-28 Thread Ralph Castain
Also, did you remember to configure with --enable-heterogeneous?
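A minimal sketch of what that rebuild might look like, assuming the Open MPI 1.4.1 source tree is unpacked in the current directory and installed under a hypothetical prefix, /opt/openmpi-1.4.1. The switch in Open MPI's configure script is spelled --enable-heterogeneous, and the build has to be repeated on every architecture in the cluster (here, the PS3s and the Dell). Afterwards, ompi_info should report whether heterogeneous support was compiled in.

```shell
#!/bin/sh
PREFIX="/opt/openmpi-1.4.1"   # hypothetical install prefix

# Command to run from the top of the Open MPI source tree on each machine type.
CONFIGURE_CMD="./configure --prefix=$PREFIX --enable-heterogeneous"
echo "$CONFIGURE_CMD && make all install"

# If an Open MPI installation is already on PATH, check how it was built.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i "heterogeneous" || echo "no heterogeneous line in ompi_info output"
fi
```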





Re: [OMPI users] Non-homogeneous Cluster Implementation

2010-01-28 Thread jody
Hi
I'm not sure I completely understood.
Is it the case that an application compiled on the Dell will not work
on the PS3 and vice versa?

If this is the case, you could try this:
  shell$ mpirun -np 1 --host a app_ps3 : -np 1 --host b app_dell
where app_ps3 is your application compiled on the PS3 and a is your PS3 host,
and app_dell is your application compiled on the Dell and b is your Dell host.
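Applied to the setup in the original post, the command might look like the sketch below. The hostnames ps3-node1, ps3-node2, and dell-host are placeholders (not from the thread), and the executables are assumed to live in the per-architecture directories /shared_volume/PS3 and /shared_volume/Dell mentioned there. The colon separates application contexts, so each architecture gets its own binary within a single MPI job.

```shell
#!/bin/sh
PS3_HOSTS="ps3-node1,ps3-node2"   # placeholder hostnames
DELL_HOST="dell-host"             # placeholder hostname

PS3_APP="/shared_volume/PS3/MPI_Example"
DELL_APP="/shared_volume/Dell/MPI_Example"

# One rank on the Dell, two on the PS3s; printed for review rather than executed.
MPMD_CMD="mpirun -np 1 --host $DELL_HOST $DELL_APP : -np 2 --host $PS3_HOSTS $PS3_APP"
echo "$MPMD_CMD"
```

Ranks should be assigned in the order the contexts are listed, so putting the Dell first is one way to keep the master (rank 0) on the host.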

Check the MPI FAQs
  http://www.open-mpi.org/faq/?category=running#mpmd-run
  http://www.open-mpi.org/faq/?category=running#mpirun-host

Hope this helps
  Jody




[OMPI users] Non-homogeneous Cluster Implementation

2010-01-27 Thread Lee Manko
OK, so please stop me if you have heard this before, but I couldn’t find
anything in the archives that addressed my situation.



I have a Beowulf cluster where all the nodes are PS3s running Yellow Dog
Linux 6.2 and a host (server) that is a Dell i686 quad-core running Fedora
Core 12.  After a failed attempt at letting yum install openmpi, I
downloaded v1.4.1, then compiled and installed it on all machines (PS3s and
Dell).  I have an NFS shared directory on the host where the application
resides after building.  All nodes have access to the shared volume and can
see any files in it.



I wrote a very simple master/slave application where the slave does a simple
computation and gets the processor name.  The slave returns both pieces of
information to the master, which then simply displays them in the terminal
window.  After the slaves work on 1024 such tasks, the master exits.



When I run on the host, without distributing to the nodes, I use the
command:



mpirun -np 4 ./MPI_Example



Compiling and running the application on the native hardware works perfectly
(i.e., compiled and run on the PS3, or compiled and run on the Dell).



However, when I went to scatter the tasks to the nodes, using the following
command,



mpirun -np 4 -hostfile mpi-hostfile ./MPI_Example



the application fails.  I’m surmising that the issue is with running code
that was compiled for the Dell on the PS3, since MPI_Init will launch the
application from the shared volume.



So, I took the source code and compiled it on both the Dell and the PS3 and
placed the executables in /shared_volume/Dell and /shared_volume/PS3 and
added the paths to the environment variable PATH.  I tried to run the
application from the host again using the following command,



mpirun -np 4 -hostfile mpi-hostfile -wdir /shared_volume/PS3 ./MPI_Example



I was hoping that -wdir would set the working directory at the time of the
call to MPI_Init() so that MPI_Init would launch the PS3 version of the
executable.



I get the error:

Could not execute the executable "./MPI_Example" : Exec format error

This could mean that your PATH or executable name is wrong, or that you do
not have the necessary permissions.  Please ensure that the executable is
able to be found and executed.
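"Exec format error" usually means the kernel was handed a binary built for a different architecture. Note that -wdir only changes the working directory; the same ./MPI_Example path is still launched on every host in the hostfile. One way to confirm a mismatch is to inspect the ELF header of each executable. The sketch below uses a made-up helper name, elf_summary, and reads only the class and byte-order bytes of the ELF identification; a build for the PowerPC-based PS3 should come out big-endian and the i686 Dell build 32-bit little-endian. Where the file utility is installed, running it on the executable reports the same information in more detail.

```shell
#!/bin/sh
# elf_summary FILE: print word size and byte order from the ELF header.
elf_summary() {
    # Bytes 0-3 are the magic number 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not an ELF file"
        return 1
    fi
    # Byte 4 is EI_CLASS (1 = 32-bit, 2 = 64-bit);
    # byte 5 is EI_DATA (1 = little-endian, 2 = big-endian).
    set -- $(od -An -tu1 -j4 -N2 "$1")
    case "$1" in 1) bits="32-bit" ;; 2) bits="64-bit" ;; *) bits="unknown-class" ;; esac
    case "$2" in 1) order="little-endian" ;; 2) order="big-endian" ;; *) order="unknown-order" ;; esac
    echo "$bits $order"
}

# Paths from the post; skipped silently on machines without the shared volume.
for exe in /shared_volume/Dell/MPI_Example /shared_volume/PS3/MPI_Example; do
    if [ -f "$exe" ]; then
        echo "$exe: $(elf_summary "$exe")"
    fi
done
```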



Now, I know I’m gonna get some heat for this, but all of these machines use
only the root account with full root privileges, so it’s not a permission
issue.





I am sure there is a simple solution to my problem.  Replacing the host with
a PS3 is not an option.  Does anyone have any suggestions?



Thanks.



PS: When I get to programming the Cell BE, then I’ll use the IBM Cell SDK
with its cross-compiler toolchain.