Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread Ethan Deneault

Prentice Bisbal wrote:

Ashley Pittman wrote:

This smacks of a firewall issue. I thought you'd said you weren't using one, but 
now that I read back through your emails I can't see anywhere where you say that.  Are you 
running a firewall or any iptables rules on any of the nodes?  It looks to me 
like you may have some set up on the worker nodes.

Ashley.



I agree with Ashley. To make sure it's not an iptables or SELinux
problem on one of the nodes, run these two commands on all the nodes and
then try again:

service iptables stop
setenforce 0




This fix worked. Delving deeper, it turns out that there was a typo in the iptables file for the 
nodes: they were accepting all traffic on eth1 instead of eth0. Only the master has an eth1 port. 
When I checked the tables earlier, I didn't notice the discrepancy.
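For reference, a hedged sketch of the kind of rule involved; the chain name below is the stock Scientific Linux 5 firewall chain, and the actual /etc/sysconfig/iptables in the node image may differ:

# Worker nodes only have eth0; a rule written for eth1 matches nothing there.
-A RH-Firewall-1-INPUT -i eth0 -j ACCEPT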


Thank you all so much!

Cheers,
Ethan



--
Dr. Ethan Deneault
Assistant Professor of Physics
SC-234
University of Tampa
Tampa, FL 33615
Office: (813) 257-3555


Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread Prentice Bisbal
Ashley Pittman wrote:
> This smacks of a firewall issue. I thought you'd said you weren't using one, 
> but now that I read back through your emails I can't see anywhere where you say that.  Are 
> you running a firewall or any iptables rules on any of the nodes?  It looks 
> to me like you may have some set up on the worker nodes.
> 
> Ashley.
> 

I agree with Ashley. To make sure it's not an iptables or SELinux
problem on one of the nodes, run these two commands on all the nodes and
then try again:

service iptables stop
setenforce 0
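(For reference, and not part of the original reply: two standard commands to inspect the current state rather than disable it are

iptables -L -n
getenforce

The first lists the active firewall rules; the second shows whether SELinux is enforcing.)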


-- 
Prentice


Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread Ashley Pittman

This smacks of a firewall issue. I thought you'd said you weren't using one, but 
now that I read back through your emails I can't see anywhere where you say that.  Are you 
running a firewall or any iptables rules on any of the nodes?  It looks to me 
like you may have some set up on the worker nodes.

Ashley.

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread Ethan Deneault

Rolf vandeVaart wrote:

Ethan:

Can you run just "hostname" successfully?  In other words, a non-MPI 
program.
If that does not work, then we know the problem is in the runtime. If 
it does work, then
there is something wrong with the way the MPI library is setting up its 
connections.


Interesting. I did not try this.

From the master:
$ mpirun -debug-daemons -host merope,asterope -np 2 hostname
asterope
merope

$ mpirun -host merope,asterope,electra -np 3 hostname
asterope
merope

(hangs)

$ mpirun -host electra,asterope,merope -np 3 hostname
asterope
electra

(hangs)

I cannot get 3 worker nodes to work together. Each node does work when paired with one other. I can get three 
-processes- to work if I include the master:


$ mpirun -host pleiades,electra,asterope -np 3 hostname
pleiades
electra
asterope

But 4 processes do not:

$ mpirun -host pleiades,electra,asterope,merope -np 4 hostname
pleiades
electra
asterope

(hangs)


Is there more than one interface on the nodes?


Each node only has eth0, and a static DHCP address.

Is there something wrong with the way I have the nodes set up? They boot via PXE from an image on the 
master, so they should all have the same basic filesystem.


Cheers,
Ethan









Rolf

On 09/21/10 14:41, Ethan Deneault wrote:

Prentice Bisbal wrote:



I'm assuming you already tested ssh connectivity and verified everything
is working as it should. (You did test all that, right?)


Yes. I am able to log in remotely to all nodes from the master, and to 
each node from each node without a password. Each node mounts the same 
/home directory from the master, so they have the same copy of all the 
ssh and rsh keys.



This sounds like a configuration problem on one of the nodes, or a problem
with ssh. I suspect it's not a problem with the number of processes, but
that whichever node is the 4th in your machinefile has a connectivity or
configuration issue:

I would try the following:

1. reorder the list of hosts in your machine file.

> 3. Change your machinefile to include 4 completely different hosts.

This does not seem to have any beneficial effect.

The test program run from the master (pleiades) with any combination 
of 3 other nodes hangs during communication. This includes not using 
--machinefile and using -host; i.e.


$ mpirun -host merope,electra,atlas -np 4 ./test.out (hangs)
$ mpirun -host merope,electra,atlas -np 3 ./test.out (hangs)
$ mpirun -host merope,electra -np 3 ./test.out
 node   1 : Hello world
 node   0 : Hello world
 node   2 : Hello world


2. Run the mpirun command from a different host. I'd try running it from
several different hosts.


The mpirun command does not seem to work when launched from one of the 
nodes. As an example:


Running on node asterope:

asterope$ mpirun -debug-daemons -host atlas,electra -np 4 ./test.out

Daemon was launched on atlas - beginning to initialize
Daemon was launched on electra - beginning to initialize
Daemon [[54956,0],1] checking in as pid 2716 on host atlas
Daemon [[54956,0],1] not using static ports
Daemon [[54956,0],2] checking in as pid 2741 on host electra
Daemon [[54956,0],2] not using static ports

(hangs)


I think someone else recommended that you should be specifying the
number of processes with -np. I second that.

If the above fails, you might want to post the machine file you're using.


The machine file is a simple list of hostnames, as an example:

m43
taygeta
asterope



Cheers,
Ethan







--
Dr. Ethan Deneault
Assistant Professor of Physics
SC-234
University of Tampa
Tampa, FL 33615
Office: (813) 257-3555


Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread Rolf vandeVaart

Ethan:

Can you run just "hostname" successfully?  In other words, a non-MPI 
program.
If that does not work, then we know the problem is in the runtime. If 
it does work, then
there is something wrong with the way the MPI library is setting up its 
connections.


Is there more than one interface on the nodes?

Rolf

On 09/21/10 14:41, Ethan Deneault wrote:

Prentice Bisbal wrote:



I'm assuming you already tested ssh connectivity and verified everything
is working as it should. (You did test all that, right?)


Yes. I am able to log in remotely to all nodes from the master, and to 
each node from each node without a password. Each node mounts the same 
/home directory from the master, so they have the same copy of all the 
ssh and rsh keys.



This sounds like a configuration problem on one of the nodes, or a problem
with ssh. I suspect it's not a problem with the number of processes, but
that whichever node is the 4th in your machinefile has a connectivity or
configuration issue:

I would try the following:

1. reorder the list of hosts in your machine file.

> 3. Change your machinefile to include 4 completely different hosts.

This does not seem to have any beneficial effect.

The test program run from the master (pleiades) with any combination 
of 3 other nodes hangs during communication. This includes not using 
--machinefile and using -host; i.e.


$ mpirun -host merope,electra,atlas -np 4 ./test.out (hangs)
$ mpirun -host merope,electra,atlas -np 3 ./test.out (hangs)
$ mpirun -host merope,electra -np 3 ./test.out
 node   1 : Hello world
 node   0 : Hello world
 node   2 : Hello world


2. Run the mpirun command from a different host. I'd try running it from
several different hosts.


The mpirun command does not seem to work when launched from one of the 
nodes. As an example:


Running on node asterope:

asterope$ mpirun -debug-daemons -host atlas,electra -np 4 ./test.out

Daemon was launched on atlas - beginning to initialize
Daemon was launched on electra - beginning to initialize
Daemon [[54956,0],1] checking in as pid 2716 on host atlas
Daemon [[54956,0],1] not using static ports
Daemon [[54956,0],2] checking in as pid 2741 on host electra
Daemon [[54956,0],2] not using static ports

(hangs)


I think someone else recommended that you should be specifying the
number of processes with -np. I second that.

If the above fails, you might want to post the machine file you're using.


The machine file is a simple list of hostnames, as an example:

m43
taygeta
asterope



Cheers,
Ethan





Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread Ethan Deneault

Prentice Bisbal wrote:



I'm assuming you already tested ssh connectivity and verified everything
is working as it should. (You did test all that, right?)


Yes. I am able to log in remotely to all nodes from the master, and to each node from each node 
without a password. Each node mounts the same /home directory from the master, so they have the same 
copy of all the ssh and rsh keys.



This sounds like a configuration problem on one of the nodes, or a problem
with ssh. I suspect it's not a problem with the number of processes, but
that whichever node is the 4th in your machinefile has a connectivity or
configuration issue:

I would try the following:

1. reorder the list of hosts in your machine file.

> 3. Change your machinefile to include 4 completely different hosts.

This does not seem to have any beneficial effect.

The test program run from the master (pleiades) with any combination of 3 other nodes hangs during 
communication. This includes not using --machinefile and using -host; i.e.


$ mpirun -host merope,electra,atlas -np 4 ./test.out (hangs)
$ mpirun -host merope,electra,atlas -np 3 ./test.out (hangs)
$ mpirun -host merope,electra -np 3 ./test.out
 node   1 : Hello world
 node   0 : Hello world
 node   2 : Hello world


2. Run the mpirun command from a different host. I'd try running it from
several different hosts.


The mpirun command does not seem to work when launched from one of the nodes. 
As an example:

Running on node asterope:

asterope$ mpirun -debug-daemons -host atlas,electra -np 4 ./test.out

Daemon was launched on atlas - beginning to initialize
Daemon was launched on electra - beginning to initialize
Daemon [[54956,0],1] checking in as pid 2716 on host atlas
Daemon [[54956,0],1] not using static ports
Daemon [[54956,0],2] checking in as pid 2741 on host electra
Daemon [[54956,0],2] not using static ports

(hangs)


I think someone else recommended that you should be specifying the
number of processes with -np. I second that.

If the above fails, you might want to post the machine file you're using.


The machine file is a simple list of hostnames, as an example:

m43
taygeta
asterope



Cheers,
Ethan

--
Dr. Ethan Deneault
Assistant Professor of Physics
SC-234
University of Tampa
Tampa, FL 33615
Office: (813) 257-3555


[OMPI users] multipath support for infiniband

2010-09-21 Thread Jens Domke

Hello,

the InfiniBand architecture has an LMC feature that assigns multiple virtual 
LIDs to one port and so provides multiple paths between two ports. Is 
there a method in Open MPI to enable message striping over these paths 
to increase bandwidth or avoid congestion?
(I don't mean the multirail feature, which splits traffic across two ports 
of one HCA.)
The only function I have found was one to enable automatic path migration 
over LMC, but this is only for failover, if I remember rightly.


Regards,
Jens


Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread Gus Correa


Prentice Bisbal wrote:

Ethan Deneault wrote:

All,

I am running Scientific Linux 5.5, with OpenMPI 1.4 installed into the
/usr/lib/openmpi/1.4-gcc/ directory. I know this is typically
/opt/openmpi, but Red Hat does things differently. I have my PATH and
LD_LIBRARY_PATH set correctly, because the test program does compile and
run.

The cluster consists of 10 Intel Pentium 4 diskless nodes. The master is
an AMD x86_64 machine, which serves the diskless node images and /home as
an NFS mount. I compile all of my programs as 32-bit.

My code is a simple hello world:
$ more test.f
  program test

  include 'mpif.h'
  integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)

  call MPI_INIT(ierror)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
  print*, 'node', rank, ': Hello world'
  call MPI_FINALIZE(ierror)
  end

If I run this program with:

$ mpirun --machinefile testfile ./test.out
 node   0 : Hello world
 node   2 : Hello world
 node   1 : Hello world

This is the expected output. Here, testfile contains the master node:
'pleiades', and two slave nodes: 'taygeta' and 'm43'

If I add another machine to testfile, say 'asterope', it hangs until I
ctrl-c it. I have tried every machine, and as long as I do not include
more than 3 hosts, the program will not hang.

I have run the debug-daemons flag with it as well, and I don't see what
is wrong specifically.



I'm assuming you already tested ssh connectivity and verified everything
is working as it should. (You did test all that, right?)

This sounds like a configuration problem on one of the nodes, or a problem
with ssh. I suspect it's not a problem with the number of processes, but
that whichever node is the 4th in your machinefile has a connectivity or
configuration issue:

I would try the following:

1. reorder the list of hosts in your machine file.

2. Run the mpirun command from a different host. I'd try running it from
several different hosts.

3. Change your machinefile to include 4 completely different hosts.

I think someone else recommended that you should be specifying the
number of processes with -np. I second that.

If the above fails, you might want to post the machine file you're using.



Hi Ethan

What your program prints is the process number, not the host name.
To make sure all nodes are responding, you can try this:

http://www.open-mpi.org/faq/?category=running#mpirun-host
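
As another quick check, here is a minimal sketch in C (not from the original post) that prints the host name along with the rank; the Fortran test program can get the same information with MPI_GET_PROCESSOR_NAME:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen;
    char procname[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(procname, &namelen);

    /* Printing the host as well as the rank makes a missing node easy to spot. */
    printf("node %d of %d on %s: Hello world\n", rank, size, procname);

    MPI_Finalize();
    return 0;
}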

For the hostfile/machinefile structure,
including the number of slots/cores/processors, see "man mpiexec".
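
For example, a hostfile might look like this (host names taken from the thread; the slot counts are assumptions):

pleiades slots=2
taygeta slots=1
m43 slots=1
asterope slots=1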

The Open MPI FAQ has answers for many of these initial setup questions.
It's worth taking a look.

I hope it helps,
Gus Correa



Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread Prentice Bisbal
Ethan Deneault wrote:
> All,
> 
> I am running Scientific Linux 5.5, with OpenMPI 1.4 installed into the
> /usr/lib/openmpi/1.4-gcc/ directory. I know this is typically
> /opt/openmpi, but Red Hat does things differently. I have my PATH and
> LD_LIBRARY_PATH set correctly, because the test program does compile and
> run.
> 
> The cluster consists of 10 Intel Pentium 4 diskless nodes. The master is
> an AMD x86_64 machine, which serves the diskless node images and /home as
> an NFS mount. I compile all of my programs as 32-bit.
> 
> My code is a simple hello world:
> $ more test.f
>   program test
> 
>   include 'mpif.h'
>   integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
> 
>   call MPI_INIT(ierror)
>   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>   print*, 'node', rank, ': Hello world'
>   call MPI_FINALIZE(ierror)
>   end
> 
> If I run this program with:
> 
> $ mpirun --machinefile testfile ./test.out
>  node   0 : Hello world
>  node   2 : Hello world
>  node   1 : Hello world
> 
> This is the expected output. Here, testfile contains the master node:
> 'pleiades', and two slave nodes: 'taygeta' and 'm43'
> 
> If I add another machine to testfile, say 'asterope', it hangs until I
> ctrl-c it. I have tried every machine, and as long as I do not include
> more than 3 hosts, the program will not hang.
> 
> I have run the debug-daemons flag with it as well, and I don't see what
> is wrong specifically.
> 

I'm assuming you already tested ssh connectivity and verified everything
is working as it should. (You did test all that, right?)

This sounds like a configuration problem on one of the nodes, or a problem
with ssh. I suspect it's not a problem with the number of processes, but
that whichever node is the 4th in your machinefile has a connectivity or
configuration issue:

I would try the following:

1. reorder the list of hosts in your machine file.

2. Run the mpirun command from a different host. I'd try running it from
several different hosts.

3. Change your machinefile to include 4 completely different hosts.

I think someone else recommended that you should be specifying the
number of processes with -np. I second that.

If the above fails, you might want to post the machine file you're using.

-- 
Prentice


[OMPI users] PathScale problems persist

2010-09-21 Thread Rafael Arco Arredondo
Hello,

In January, I reported a problem with Open MPI 1.4.1 and PathScale 3.2
about a simple Hello World that hung on initialization
( http://www.open-mpi.org/community/lists/users/2010/01/11863.php ).
Open MPI 1.4.2 does not show this problem.

However, now we are having trouble with 1.4.2, PathScale 3.2, and
the C++ bindings. The following code:

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]) {
  int node, size;

  MPI::Init(argc, argv);
  MPI::COMM_WORLD.Set_errhandler(MPI::ERRORS_THROW_EXCEPTIONS);

  try {
int rank = MPI::COMM_WORLD.Get_rank();
int size = MPI::COMM_WORLD.Get_size();

std::cout << "Hello world from process " << rank << " out of "
  << size << "!" << std::endl;
  }

  catch(MPI::Exception e) {
std::cerr << "MPI Error: " << e.Get_error_code()
  << " - " << e.Get_error_string() << std::endl;
  }

  MPI::Finalize();
  return 0;
}

generates the following output:

[host1:29934] *** An error occurred in MPI_Comm_set_errhandler
[host1:29934] *** on communicator MPI_COMM_WORLD
[host1:29934] *** MPI_ERR_COMM: invalid communicator
[host1:29934] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--
mpirun has exited due to process rank 2 with PID 29934 on
node host1 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
[host1:29931] 3 more processes have sent help message
help-mpi-errors.txt / mpi_errors_are_fatal
[host1:29931] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages

There are no problems when Open MPI 1.4.2 is built with GCC (GCC 4.1.2).
No problems are found with Open MPI 1.2.6 and PathScale either.

Best regards,

Rafa

-- 
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Campus de Fuentenueva - Edificio Mecenas
Universidad de Granada




Re: [OMPI users] Thread as MPI process

2010-09-21 Thread Ashley Pittman

On 21 Sep 2010, at 09:54, Mikael Lavoie wrote:

> Hi,
> 
> Sorry, but I got lost in what I want to do. I have built a small home cluster 
> with Pelican_HPC, which uses Open MPI, and I was trying to find a way to get a 
> multithreaded program to work in a multiprocess way without taking the time to 
> learn MPI. My vision was a sort of wrapper that takes C POSIX app source 
> code and converts it from pthreads to a multiprocess MPI app. But the problem 
> is the remote memory access, which will only be implemented in MPI 3.0 (from 
> what I've read of it).
> 
> So, after 12 hours of intensive reading about MPI and POSIX, the best way to 
> deal with my problem (running a C pthreaded app on my cluster) is to convert 
> the source in an SPMD way.
> I didn't mention that, basically, my program opens a huge text file, takes each 
> string, processes it through lots of cryptographic iterations, and then saves 
> the result in an output.out-like file.
> So I will need to make the master process split the input file and then send 
> the pieces as input to the worker processes.
> 
> But perhaps you or someone else knows of a kind of interpreter-like program to run a 
> multithreaded C program and convert it logically to a master/worker 
> multiprocess MPI job that is sent by ssh to the interpreter on the worker 
> side and then launched.
> 
> This is what I tried to explain in the last message: a dream for the hobbyist 
> who wants to get the full power of a night-time cluster without having to 
> learn all the MPI syntax and structure.
> 
> If it doesn't exist, this would be a really great tool, I think.
> 
> Thank you for your reply, but I think I have answered my own question... 
> No pain, no gain...

What you are thinking of is, I believe, something more like ScaleMP or Mosix, 
neither of which I have first-hand experience of.  It's a hard problem to solve 
and I don't believe there is any general solution available.

It sounds like your application would be a fairly easy conversion to MPI, but to 
do that you will need to re-code areas of your application; it almost sounds 
like you could get away with just using MPI_Init, MPI_Scatter and MPI_Gather.  
Typically you would use the head node to launch the job but not do any 
computation; rank 0 in the job would then do the marshalling of data, and all 
ranks would be started simultaneously.  You'll find this easier than having one 
single-rank job spawn more ranks as required.
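
As a rough illustration of that pattern (a minimal sketch, not the poster's program: the fixed record length, the fabricated input and the placeholder "crypto" step are all assumptions):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REC_LEN 64                        /* fixed length per input record */

static void process_record(char *rec)     /* stand-in for the real crypto work */
{
    size_t n = strlen(rec);
    for (size_t i = 0; i < n; i++)
        rec[i] ^= 0x5a;
}

int main(int argc, char *argv[])
{
    int rank, size;
    char *all = NULL, mine[REC_LEN];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Rank 0 would read the input file here; records are fabricated. */
        all = malloc((size_t)size * REC_LEN);
        for (int i = 0; i < size; i++)
            snprintf(all + (size_t)i * REC_LEN, REC_LEN, "input line %d", i);
    }

    /* One record per rank goes out, processed results come back to rank 0. */
    MPI_Scatter(all, REC_LEN, MPI_CHAR, mine, REC_LEN, MPI_CHAR,
                0, MPI_COMM_WORLD);
    process_record(mine);
    MPI_Gather(mine, REC_LEN, MPI_CHAR, all, REC_LEN, MPI_CHAR,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* Rank 0 would write output.out here. */
        free(all);
    }
    MPI_Finalize();
    return 0;
}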

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




Re: [OMPI users] Thread as MPI process

2010-09-21 Thread Reuti
Hi,

On 21.09.2010, at 10:54, Mikael Lavoie wrote:

> Sorry, but I got lost in what I want to do. I have built a small home cluster 
> with Pelican_HPC, which uses Open MPI, and I was trying to find a way to get a 
> multithreaded program to work in a multiprocess way without taking the time to 
> learn MPI. My vision was a sort of wrapper that takes C POSIX app source 
> code and converts it from pthreads to a multiprocess MPI app. But the problem 
> is the remote memory access, which will only be implemented in MPI 3.0 (from 
> what I've read of it).
> 
> So, after 12 hours of intensive reading about MPI and POSIX, the best way to 
> deal with my problem (running a C pthreaded app on my cluster) is to convert 
> the source in an SPMD way.
> I didn't mention that, basically, my program opens a huge text file, takes each 
> string, processes it through lots of cryptographic iterations, and then saves 
> the result in an output.out-like file.
> So I will need to make the master process split the input file and then send 
> the pieces as input to the worker processes.
> 
> But perhaps you or someone else knows of a kind of interpreter-like program to run a 
> multithreaded C program and convert it logically to a master/worker 
> multiprocess MPI job that is sent by ssh to the interpreter on the worker 
> side and then launched.

What about taking a step back and using PVM? Of course you have no shared memory 
access, but it will allow you to transfer information between nodes and start 
worker processes:

http://www.netlib.org/pvm3/book/node17.html

Looks like PVM is no longer included in Pelican_HPC by default, but you can 
compile it on your own.

-- Reuti


> This is what I tried to explain in the last message: a dream for the hobbyist 
> who wants to get the full power of a night-time cluster without having to 
> learn all the MPI syntax and structure.
> 
> If it doesn't exist, this would be a really great tool, I think.
> 
> Thank you for your reply, but I think I have answered my own question... 
> No pain, no gain...
> 
> On 20 Sep 2010, at 22:24, Mikael Lavoie wrote:
> > I want to know if there is an implementation that permits running a single host 
> > process on the master of the cluster, which will then spawn 1 process per 
> > -np X defined thread on the hosts specified in the host list. The host will 
> > then act as a synchronized sender/collector of the work done.
> 
> I don't fully understand your explanation either, but I may be able to help 
> clear up what you are asking for:
> 
> If you mean "pthreads" or "linux threads" then no, you cannot have different 
> threads on different nodes under any programming paradigm.
> 
> However if you mean "execution threads" or in MPI parlance "ranks" then yes, 
> under OpenMPI each "rank" will be a separate process on one of the nodes in 
> the host list, as Jody says look at MPI_Comm_Spawn for this.
> 
> Ashley,
> 
> --
> 
> Ashley Pittman, Bath, UK.
> 
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
> 
> 




Re: [OMPI users] Thread as MPI process

2010-09-21 Thread Reuti
On 21.09.2010, at 10:19, Ashley Pittman wrote:

> On 20 Sep 2010, at 22:24, Mikael Lavoie wrote:
>> I want to know if there is an implementation that permits running a single host 
>> process on the master of the cluster, which will then spawn 1 process per -np 
>> X defined thread on the hosts specified in the host list. The host will then 
>> act as a synchronized sender/collector of the work done.
> 
> I don't fully understand your explanation either, but I may be able to help 
> clear up what you are asking for:
> 
> If you mean "pthreads" or "linux threads" then no, you cannot have different 
> threads on different nodes under any programming paradigm.

There are some efforts like http://www.kerrighed.org/wiki/index.php/Main_Page, 
but for the current release the thread migration is indeed disabled.

-- Reuti


> However if you mean "execution threads" or in MPI parlance "ranks" then yes, 
> under OpenMPI each "rank" will be a separate process on one of the nodes in 
> the host list, as Jody says look at MPI_Comm_Spawn for this.
> 
> Ashley,
> 
> -- 
> 
> Ashley Pittman, Bath, UK.
> 
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
> 
> 




Re: [OMPI users] Thread as MPI process

2010-09-21 Thread Mikael Lavoie
Hi,

Sorry, but I got lost in what I want to do. I have built a small home cluster
with Pelican_HPC, which uses Open MPI, and I was trying to find a way to get a
multithreaded program to work in a multiprocess way without taking the time to
learn MPI. My vision was a sort of wrapper that takes C POSIX app source
code and converts it from pthreads to a multiprocess MPI app. But the problem
is the remote memory access, which will only be implemented in MPI 3.0 (from
what I've read of it).

So, after 12 hours of intensive reading about MPI and POSIX, the best way to
deal with my problem (running a C pthreaded app on my cluster) is to convert
the source in an SPMD way.
I didn't mention that, basically, my program opens a huge text file, takes each
string, processes it through lots of cryptographic iterations, and then saves
the result in an output.out-like file.
So I will need to make the master process split the input file and then send
the pieces as input to the worker processes.

But perhaps you or someone else knows of a kind of interpreter-like program to run a
multithreaded C program and convert it logically to a master/worker
multiprocess MPI job that is sent by ssh to the interpreter on the worker
side and then launched.

This is what I tried to explain in the last message: a dream for the hobbyist
who wants to get the full power of a night-time cluster without having to
learn all the MPI syntax and structure.

If it doesn't exist, this would be a really great tool, I think.

Thank you for your reply, but I think I have answered my own question...
No pain, no gain...

>
> On 20 Sep 2010, at 22:24, Mikael Lavoie wrote:
> > I want to know if there is an implementation that permits running a single
> host process on the master of the cluster, which will then spawn 1 process
> per -np X defined thread on the hosts specified in the host list. The host
> will then act as a synchronized sender/collector of the work done.
>
> I don't fully understand your explanation either, but I may be able to help
> clear up what you are asking for:
>
> If you mean "pthreads" or "linux threads" then no, you cannot have
> different threads on different nodes under any programming paradigm.
>
> However if you mean "execution threads" or in MPI parlance "ranks" then
> yes, under OpenMPI each "rank" will be a separate process on one of the
> nodes in the host list, as Jody says look at MPI_Comm_Spawn for this.
>
> Ashley,
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
>
>


Re: [OMPI users] Thread as MPI process

2010-09-21 Thread Ashley Pittman

On 20 Sep 2010, at 22:24, Mikael Lavoie wrote:
> I want to know if there is an implementation that permits running a single host 
> process on the master of the cluster, which will then spawn 1 process per -np 
> X defined thread on the hosts specified in the host list. The host will then 
> act as a synchronized sender/collector of the work done.

I don't fully understand your explanation either, but I may be able to help clear 
up what you are asking for:

If you mean "pthreads" or "linux threads" then no, you cannot have different 
threads on different nodes under any programming paradigm.

However if you mean "execution threads" or in MPI parlance "ranks" then yes, 
under OpenMPI each "rank" will be a separate process on one of the nodes in the 
host list, as Jody says look at MPI_Comm_Spawn for this.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




Re: [OMPI users] Thread as MPI process

2010-09-21 Thread jody
Hi
I don't know if I correctly understand what you need, but have you
already tried MPI_Comm_spawn?

Jody
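
For reference, a minimal sketch (not from the thread) of what an MPI_Comm_spawn call looks like in C; the "./worker" executable name and the count of 4 are placeholders:

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm workers;

    MPI_Init(&argc, &argv);

    /* Start 4 copies of a separate worker executable; the parent talks to
       them through the 'workers' intercommunicator. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);

    /* ... distribute work and collect results over 'workers' here ... */

    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}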

On Mon, Sep 20, 2010 at 11:24 PM, Mikael Lavoie  wrote:
> Hi,
>
> I want to know if there is an implementation that permits running a single host
> process on the master of the cluster, which will then spawn 1 process per -np
> X defined thread on the hosts specified in the host list. The host will then
> act as a synchronized sender/collector of the work done.
>
> It would really be the saint-graal of the MPI implementation to me, for the
> use i wanna make of it.
>
> So i wait your answer, hoping that this exist,
>
> Mikael Lavoie
>
>


Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

2010-09-21 Thread ETHAN DENEAULT
David, 

I did try that after I sent the original mail, but the -np 4 flag doesn't fix 
the problem; the program still hangs. I've also double-checked the iptables for 
the image and for the master node, and all ports are set to accept. 

Cheers, 
Ethan

--
Dr. Ethan Deneault
Assistant Professor of Physics
SC 234
University of Tampa
Tampa, FL 33606



-Original Message-
From: users-boun...@open-mpi.org on behalf of David Zhang
Sent: Mon 9/20/2010 9:58 PM
To: Open MPI Users
Subject: Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or 
more nodes.
 
I don't know if this will help, but try
mpirun --machinefile testfile -np 4 ./test.out
for running 4 processes

On Mon, Sep 20, 2010 at 3:00 PM, Ethan Deneault  wrote:

> All,
>
> I am running Scientific Linux 5.5, with OpenMPI 1.4 installed into the
> /usr/lib/openmpi/1.4-gcc/ directory. I know this is typically /opt/openmpi,
> but Red Hat does things differently. I have my PATH and LD_LIBRARY_PATH set
> correctly, because the test program does compile and run.
>
> The cluster consists of 10 Intel Pentium 4 diskless nodes. The master is an
> AMD x86_64 machine, which serves the diskless node images and /home as an NFS
> mount. I compile all of my programs as 32-bit.
>
> My code is a simple hello world:
> $ more test.f
>  program test
>
>  include 'mpif.h'
>  integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
>
>  call MPI_INIT(ierror)
>  call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>  print*, 'node', rank, ': Hello world'
>  call MPI_FINALIZE(ierror)
>  end
>
> If I run this program with:
>
> $ mpirun --machinefile testfile ./test.out
>  node   0 : Hello world
>  node   2 : Hello world
>  node   1 : Hello world
>
> This is the expected output. Here, testfile contains the master node:
> 'pleiades', and two slave nodes: 'taygeta' and 'm43'
>
> If I add another machine to testfile, say 'asterope', it hangs until I
> ctrl-c it. I have tried every machine, and as long as I do not include more
> than 3 hosts, the program will not hang.
>
> I have run the debug-daemons flag with it as well, and I don't see what is
> wrong specifically.
>
> Working output: pleiades (master) and 2 nodes.
>
> $ mpirun --debug-daemons --machinefile testfile ./test.out
> Daemon was launched on m43 - beginning to initialize
> Daemon was launched on taygeta - beginning to initialize
> Daemon [[46344,0],2] checking in as pid 2140 on host m43
> Daemon [[46344,0],2] not using static ports
> [m43:02140] [[46344,0],2] orted: up and running - waiting for commands!
> [pleiades:19178] [[46344,0],0] node[0].name pleiades daemon 0 arch ffca0200
> [pleiades:19178] [[46344,0],0] node[1].name taygeta daemon 1 arch ffca0200
> [pleiades:19178] [[46344,0],0] node[2].name m43 daemon 2 arch ffca0200
> [pleiades:19178] [[46344,0],0] orted_cmd: received add_local_procs
> [m43:02140] [[46344,0],2] node[0].name pleiades daemon 0 arch ffca0200
> [m43:02140] [[46344,0],2] node[1].name taygeta daemon 1 arch ffca0200
> [m43:02140] [[46344,0],2] node[2].name m43 daemon 2 arch ffca0200
> [m43:02140] [[46344,0],2] orted_cmd: received add_local_procs
> Daemon [[46344,0],1] checking in as pid 2317 on host taygeta
> Daemon [[46344,0],1] not using static ports
> [taygeta:02317] [[46344,0],1] orted: up and running - waiting for commands!
> [taygeta:02317] [[46344,0],1] node[0].name pleiades daemon 0 arch ffca0200
> [taygeta:02317] [[46344,0],1] node[1].name taygeta daemon 1 arch ffca0200
> [taygeta:02317] [[46344,0],1] node[2].name m43 daemon 2 arch ffca0200
> [taygeta:02317] [[46344,0],1] orted_cmd: received add_local_procs
> [pleiades:19178] [[46344,0],0] orted_recv: received sync+nidmap from local
> proc [[46344,1],0]
> [m43:02140] [[46344,0],2] orted_recv: received sync+nidmap from local proc
> [[46344,1],2]
> [taygeta:02317] [[46344,0],1] orted_recv: received sync+nidmap from local
> proc [[46344,1],1]
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [m43:02140] [[46344,0],2] orted_cmd: received collective data cmd
> [taygeta:02317] [[46344,0],1] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received message_local_procs
> [taygeta:02317] [[46344,0],1] orted_cmd: received message_local_procs
> [m43:02140] [[46344,0],2] orted_cmd: received message_local_procs
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [m43:02140] [[46344,0],2] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received message_local_procs
> [taygeta:02317] [[46344,0],1] orted_cmd: