Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
Prentice Bisbal wrote:
> Ashley Pittman wrote:
>> This smacks of a firewall issue. I thought you'd said you weren't using one, but now that I read back through your emails I can't see anywhere where you say that. Are you running a firewall or any iptables rules on any of the nodes? It looks to me like you may have some setup issue on the worker nodes.
>>
>> Ashley.
>
> I agree with Ashley. To make sure it's not an iptables or SELinux problem on one of the nodes, run these two commands on all the nodes and then try again:
>
> service iptables stop
> setenforce 0

This fix worked. Delving in deeper, it turns out that there was a typo in the iptables file for the nodes: they were accepting all traffic on eth1 instead of eth0. Only the master has an eth1 port. When I checked the tables earlier, I didn't notice the discrepancy.

Thank you all so much!

Cheers,
Ethan

--
Dr. Ethan Deneault
Assistant Professor of Physics
SC-234, University of Tampa
Tampa, FL 33615
Office: (813) 257-3555
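For anyone hitting the same symptom, the one-word typo described above would have looked something like the following. The original iptables file wasn't posted, so this is a hypothetical reconstruction using the default RHEL/Scientific Linux 5 chain name:

```
# /etc/sysconfig/iptables on the worker nodes (hypothetical reconstruction)

# Broken: the workers have no eth1, so this rule matched nothing
# and inter-node MPI traffic fell through to the default REJECT
-A RH-Firewall-1-INPUT -i eth1 -j ACCEPT

# Fixed: the workers' only interface is eth0
-A RH-Firewall-1-INPUT -i eth0 -j ACCEPT
```

After editing the file, `service iptables restart` reloads the rules; `iptables -L -v -i` style inspection (or `iptables-save`) shows which interface each ACCEPT rule is actually bound to, which is the discrepancy that was missed on the first check.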
Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
Ashley Pittman wrote:
> This smacks of a firewall issue. I thought you'd said you weren't using one, but now that I read back through your emails I can't see anywhere where you say that. Are you running a firewall or any iptables rules on any of the nodes? It looks to me like you may have some setup issue on the worker nodes.
>
> Ashley.

I agree with Ashley. To make sure it's not an iptables or SELinux problem on one of the nodes, run these two commands on all the nodes and then try again:

service iptables stop
setenforce 0

--
Prentice
Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
This smacks of a firewall issue. I thought you'd said you weren't using one, but now that I read back through your emails I can't see anywhere where you say that. Are you running a firewall or any iptables rules on any of the nodes? It looks to me like you may have some setup issue on the worker nodes.

Ashley.

--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
Rolf vandeVaart wrote:
> Ethan: Can you run just "hostname" successfully? In other words, a non-MPI program. If that does not work, then we know the problem is in the runtime. If it does work, then there is something with the way the MPI library is setting up its connections.

Interesting. I did not try this. From the master:

$ mpirun -debug-daemons -host merope,asterope -np 2 hostname
asterope
merope

$ mpirun -host merope,asterope,electra -np 3 hostname
asterope
merope
(hangs)

$ mpirun -host electra,asterope,merope -np 3 hostname
asterope
electra
(hangs)

I cannot get 3 nodes to work together. Each node does work if in a pair of two. I can get three *processes* to work, if I include the master:

$ mpirun -host pleiades,electra,asterope -np 3 hostname
pleiades
electra
asterope

But 4 processes does not:

$ mpirun -host pleiades,electra,asterope,merope -np 4 hostname
pleiades
electra
asterope
(hangs)

> Is there more than one interface on the nodes?

Each node only has eth0, and a static DHCP address. Is there something in the way that I have the nodes set up? They boot via PXE from an image on the master, so they should all have the same basic filesystem.

Cheers,
Ethan
Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
Ethan:

Can you run just "hostname" successfully? In other words, a non-MPI program. If that does not work, then we know the problem is in the runtime. If it does work, then there is something with the way the MPI library is setting up its connections.

Is there more than one interface on the nodes?

Rolf
Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
Prentice Bisbal wrote:
> I'm assuming you already tested ssh connectivity and verified everything is working as it should. (You did test all that, right?)

Yes. I am able to log in remotely to all nodes from the master, and to each node from each node without a password. Each node mounts the same /home directory from the master, so they have the same copy of all the ssh and rsh keys.

> This sounds like a configuration problem on one of the nodes, or a problem with ssh. I suspect it's not a problem with the number of processes, but whichever node is the 4th in your machinefile has a connectivity or configuration issue. I would try the following:
>
> 1. Reorder the list of hosts in your machine file.
> 3. Change your machinefile to include 4 completely different hosts.

This does not seem to have any beneficial effect. The test program run from the master (pleiades) with any combination of 3 other nodes hangs during communication. This includes not using --machinefile and using -host; i.e.

$ mpirun -host merope,electra,atlas -np 4 ./test.out
(hangs)

$ mpirun -host merope,electra,atlas -np 3 ./test.out
(hangs)

$ mpirun -host merope,electra -np 3 ./test.out
node 1 : Hello world
node 0 : Hello world
node 2 : Hello world

> 2. Run the mpirun command from a different host. I'd try running it from several different hosts.

The mpirun command does not seem to work when launched from one of the nodes. As an example, running on node asterope:

asterope$ mpirun -debug-daemons -host atlas,electra -np 4 ./test.out
Daemon was launched on atlas - beginning to initialize
Daemon was launched on electra - beginning to initialize
Daemon [[54956,0],1] checking in as pid 2716 on host atlas
Daemon [[54956,0],1] not using static ports
Daemon [[54956,0],2] checking in as pid 2741 on host electra
Daemon [[54956,0],2] not using static ports
(hangs)

> I think someone else recommended that you should be specifying the number of processes with -np. I second that. If the above fails, you might want to post the machine file you're using.

The machine file is a simple list of hostnames, as an example:

m43
taygeta
asterope

Cheers,
Ethan
[OMPI users] multipath support for infiniband
Hello,

the InfiniBand architecture has an LMC feature to assign multiple virtual LIDs to one port, and so provides multiple paths between two ports. Is there a method in Open MPI to enable message striping over these paths to increase bandwidth or avoid congestion? (I don't mean the multirail feature, which splits traffic across two ports of one HCA.) The only function I have found was to enable automatic path migration over LMC, but this is only for failover, if I remember rightly.

Regards,
Jens
Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
Hi Ethan

What your program prints is the process number, not the host name. To make sure all nodes are responding, you can try this:

http://www.open-mpi.org/faq/?category=running#mpirun-host

For the hostfile/machinefile structure, including the number of slots/cores/processors, see "man mpiexec". The Open MPI FAQ has answers for many of these initial setup questions. Worth taking a look.

I hope it helps,
Gus Correa
Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
Ethan Deneault wrote:
> All,
>
> I am running Scientific Linux 5.5, with OpenMPI 1.4 installed into the
> /usr/lib/openmpi/1.4-gcc/ directory. I know this is typically
> /opt/openmpi, but Red Hat does things differently. I have my PATH and
> LD_LIBRARY_PATH set correctly, because the test program does compile and
> run.
>
> The cluster consists of 10 Intel Pentium 4 diskless nodes. The master is
> an AMD x86_64 machine which serves the diskless node images and /home as
> an NFS mount. I compile all of my programs as 32-bit.
>
> My code is a simple hello world:
> $ more test.f
>       program test
>
>       include 'mpif.h'
>       integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
>
>       call MPI_INIT(ierror)
>       call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>       call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>       print*, 'node', rank, ': Hello world'
>       call MPI_FINALIZE(ierror)
>       end
>
> If I run this program with:
>
> $ mpirun --machinefile testfile ./test.out
> node 0 : Hello world
> node 2 : Hello world
> node 1 : Hello world
>
> This is the expected output. Here, testfile contains the master node:
> 'pleiades', and two slave nodes: 'taygeta' and 'm43'.
>
> If I add another machine to testfile, say 'asterope', it hangs until I
> ctrl-c it. I have tried every machine, and as long as I do not include
> more than 3 hosts, the program will not hang.
>
> I have run the debug-daemons flag with it as well, and I don't see what
> is wrong specifically.

I'm assuming you already tested ssh connectivity and verified everything is working as it should. (You did test all that, right?)

This sounds like a configuration problem on one of the nodes, or a problem with ssh. I suspect it's not a problem with the number of processes, but whichever node is the 4th in your machinefile has a connectivity or configuration issue. I would try the following:

1. Reorder the list of hosts in your machine file.
2. Run the mpirun command from a different host. I'd try running it from several different hosts.
3. Change your machinefile to include 4 completely different hosts.

I think someone else recommended that you should be specifying the number of processes with -np. I second that.

If the above fails, you might want to post the machine file you're using.

--
Prentice
[OMPI users] PathScale problems persist
Hello,

In January, I reported a problem with Open MPI 1.4.1 and PathScale 3.2 about a simple Hello World that hung on initialization ( http://www.open-mpi.org/community/lists/users/2010/01/11863.php ). Open MPI 1.4.2 does not show this problem. However, now we are having trouble with 1.4.2, PathScale 3.2, and the C++ bindings. The following code:

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]) {
  int node, size;

  MPI::Init(argc, argv);
  MPI::COMM_WORLD.Set_errhandler(MPI::ERRORS_THROW_EXCEPTIONS);

  try {
    int rank = MPI::COMM_WORLD.Get_rank();
    int size = MPI::COMM_WORLD.Get_size();
    std::cout << "Hello world from process " << rank << " out of " << size << "!" << std::endl;
  } catch(MPI::Exception e) {
    std::cerr << "MPI Error: " << e.Get_error_code() << " - " << e.Get_error_string() << std::endl;
  }

  MPI::Finalize();
  return 0;
}

generates the following output:

[host1:29934] *** An error occurred in MPI_Comm_set_errhandler
[host1:29934] *** on communicator MPI_COMM_WORLD
[host1:29934] *** MPI_ERR_COMM: invalid communicator
[host1:29934] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--
mpirun has exited due to process rank 2 with PID 29934 on node host1 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--
[host1:29931] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[host1:29931] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

There are no problems when Open MPI 1.4.2 is built with GCC (GCC 4.1.2). No problems are found with Open MPI 1.2.6 and PathScale either.

Best regards,
Rafa

--
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Campus de Fuentenueva - Edificio Mecenas
Universidad de Granada
Re: [OMPI users] Thread as MPI process
On 21 Sep 2010, at 09:54, Mikael Lavoie wrote: > Hi, > > Sorry, but i get lost in what i wanna do, i have build a small home cluster > with Pelican_HPC, that user openMPI, and i was trying to find a way to get a > multithreaded program work in a multiprocess way without taking the time to > learn MPI. And my vision was a sort of wrapper that take C posix app src > code, and convert it from pthread to a multiprocessMPI app. But the problem > is the remote memory access, that will only be implemented in MPI 3.0(for > what i've read of it). > > So, after 12 hour of intensive reading about MPI and POSIX, the best way to > deal with my problem(running a C pthreaded app in my cluster) is to convert > the src in a SPMD way. > I didn't mentionned that basicly, my prog open huge text file, take each > string and process it through lot's of cryptographic iteration and then save > the result in an output.out like file. > So i will need to make the master process split the input file and then send > them as input for the worker process. > > But if you or someone else know a kind of interpretor like program to run a > multithreaded C program and convert it logically to a master/worker > multiprocess MPI that will be sended by ssh to the interpreter on the worker > side and then lunched. > > This is what i've tried to explain in the last msg. A dream for the hobyist > that want to get the full power of a night-time cluster, without having to > learn all the MPI syntax and structure. > > If it doesn't exist, this would be a really great tool i think. > > Thank you for your reply, but i think i have answered my question alone... No > Pain, No Gain... What you are thinking of is I believe something more like ScaleMP or Mosix, neither of which I have first-hand experience of. It's a hard problem to solve and I don't believe there is any general solution available. 
It sounds like your application would be a fairly easy conversion to MPI, but to do that you will need to re-code areas of your application. It almost sounds like you could get away with just using MPI_Init, MPI_Scatter and MPI_Gather. Typically you would use the head node to launch the job but not do any computation; rank 0 in the job would then do the marshalling of data, and all ranks would be started simultaneously. You'll find this easier than having one single-rank job spawn more ranks as required.

Ashley,

--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
Re: [OMPI users] Thread as MPI process
Hi, Am 21.09.2010 um 10:54 schrieb Mikael Lavoie: > Sorry, but i get lost in what i wanna do, i have build a small home cluster > with Pelican_HPC, that user openMPI, and i was trying to find a way to get a > multithreaded program work in a multiprocess way without taking the time to > learn MPI. And my vision was a sort of wrapper that take C posix app src > code, and convert it from pthread to a multiprocessMPI app. But the problem > is the remote memory access, that will only be implemented in MPI 3.0(for > what i've read of it). > > So, after 12 hour of intensive reading about MPI and POSIX, the best way to > deal with my problem(running a C pthreaded app in my cluster) is to convert > the src in a SPMD way. > I didn't mentionned that basicly, my prog open huge text file, take each > string and process it through lot's of cryptographic iteration and then save > the result in an output.out like file. > So i will need to make the master process split the input file and then send > them as input for the worker process. > > But if you or someone else know a kind of interpretor like program to run a > multithreaded C program and convert it logically to a master/worker > multiprocess MPI that will be sended by ssh to the interpreter on the worker > side and then lunched. what about taking a step back and use PVM? Of course you have no shared memory access, but it will allow you to transfer information between nodes and start worker processes: http://www.netlib.org/pvm3/book/node17.html Looks like PVM is no longer included in Pelican_HPC by default, but you can compile it on your own. -- Reuti > This is what i've tried to explain in the last msg. A dream for the hobyist > that want to get the full power of a night-time cluster, without having to > learn all the MPI syntax and structure. > > If it doesn't exist, this would be a really great tool i think. > > Thank you for your reply, but i think i have answered my question alone... No > Pain, No Gain... 
> > On 20 Sep 2010, at 22:24, Mikael Lavoie wrote: > > I wanna know if it exist a implementation that permit to run a single host > > process on the master of the cluster, that will then spawn 1 process per > > -np X defined thread at the host specified in the host list. The host will > > then act as a syncronized sender/collecter of the work done. > > I don't fully understand you explanation either but I may be able to help > clear up what you are asking for: > > If you mean "pthreads" or "linux threads" then no, you cannot have different > threads on different nodes under any programming paradigm. > > However if you mean "execution threads" or in MPI parlance "ranks" then yes, > under OpenMPI each "rank" will be a separate process on one of the nodes in > the host list, as Jody says look at MPI_Comm_Spawn for this. > > Ashley, > > -- > > Ashley Pittman, Bath, UK. > > Padb - A parallel job inspection tool for cluster computing > http://padb.pittman.org.uk > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Thread as MPI process
Am 21.09.2010 um 10:19 schrieb Ashley Pittman: > On 20 Sep 2010, at 22:24, Mikael Lavoie wrote: >> I wanna know if it exist a implementation that permit to run a single host >> process on the master of the cluster, that will then spawn 1 process per -np >> X defined thread at the host specified in the host list. The host will then >> act as a syncronized sender/collecter of the work done. > > I don't fully understand you explanation either but I may be able to help > clear up what you are asking for: > > If you mean "pthreads" or "linux threads" then no, you cannot have different > threads on different nodes under any programming paradigm. There are some efforts like http://www.kerrighed.org/wiki/index.php/Main_Page, but for the current release the thread migration is indeed disabled. -- Reuti > However if you mean "execution threads" or in MPI parlance "ranks" then yes, > under OpenMPI each "rank" will be a separate process on one of the nodes in > the host list, as Jody says look at MPI_Comm_Spawn for this. > > Ashley, > > -- > > Ashley Pittman, Bath, UK. > > Padb - A parallel job inspection tool for cluster computing > http://padb.pittman.org.uk > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Thread as MPI process
Hi,

Sorry, but I got lost in what I wanted to do. I have built a small home cluster with Pelican_HPC, which uses Open MPI, and I was trying to find a way to get a multithreaded program to work in a multiprocess way without taking the time to learn MPI. My vision was a sort of wrapper that takes C POSIX app source code and converts it from pthreads to a multiprocess MPI app. But the problem is the remote memory access, which will only be implemented in MPI 3.0 (from what I've read of it).

So, after 12 hours of intensive reading about MPI and POSIX, the best way to deal with my problem (running a C pthreaded app on my cluster) is to convert the source in an SPMD way. I didn't mention that basically my program opens a huge text file, takes each string, processes it through lots of cryptographic iterations, and then saves the result in an output.out-like file. So I will need to make the master process split the input file and then send the pieces as input to the worker processes.

But maybe you or someone else knows a kind of interpreter-like program that can run a multithreaded C program and convert it logically to a master/worker multiprocess MPI job that is sent by ssh to the interpreter on the worker side and then launched. This is what I tried to explain in the last message. A dream for the hobbyist who wants to get the full power of a night-time cluster without having to learn all the MPI syntax and structure.

If it doesn't exist, this would be a really great tool, I think.

Thank you for your reply, but I think I have answered my question alone... No Pain, No Gain...

> On 20 Sep 2010, at 22:24, Mikael Lavoie wrote:
>> I wanna know if it exist a implementation that permit to run a single host process on the master of the cluster, that will then spawn 1 process per -np X defined thread at the host specified in the host list. The host will then act as a syncronized sender/collecter of the work done.
>
> I don't fully understand your explanation either, but I may be able to help clear up what you are asking for:
>
> If you mean "pthreads" or "linux threads" then no, you cannot have different threads on different nodes under any programming paradigm.
>
> However if you mean "execution threads" or in MPI parlance "ranks" then yes, under Open MPI each "rank" will be a separate process on one of the nodes in the host list; as Jody says, look at MPI_Comm_spawn for this.
>
> Ashley,
>
> --
> Ashley Pittman, Bath, UK.
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
Re: [OMPI users] Thread as MPI process
On 20 Sep 2010, at 22:24, Mikael Lavoie wrote:
> I wanna know if it exist a implementation that permit to run a single host process on the master of the cluster, that will then spawn 1 process per -np X defined thread at the host specified in the host list. The host will then act as a syncronized sender/collecter of the work done.

I don't fully understand your explanation either, but I may be able to help clear up what you are asking for:

If you mean "pthreads" or "linux threads" then no, you cannot have different threads on different nodes under any programming paradigm.

However if you mean "execution threads" or in MPI parlance "ranks" then yes, under Open MPI each "rank" will be a separate process on one of the nodes in the host list; as Jody says, look at MPI_Comm_spawn for this.

Ashley,

--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
Re: [OMPI users] Thread as MPI process
Hi

I don't know if I correctly understand what you need, but have you already tried MPI_Comm_spawn?

Jody

On Mon, Sep 20, 2010 at 11:24 PM, Mikael Lavoie wrote:
> Hi,
>
> I wanna know if it exist a implementation that permit to run a single host process on the master of the cluster, that will then spawn 1 process per -np X defined thread at the host specified in the host list. The host will then act as a syncronized sender/collecter of the work done.
>
> It would really be the saint-graal of the MPI implementation to me, for the use i wanna make of it.
>
> So i wait your answer, hoping that this exist,
>
> Mikael Lavoie
Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.
David,

I did try that after I sent the original mail, but the -np 4 flag doesn't fix the problem; the program still hangs. I've also double-checked the iptables for the image and for the master node, and all ports are set to accept.

Cheers,
Ethan

--
Dr. Ethan Deneault
Assistant Professor of Physics
SC 234
University of Tampa
Tampa, FL 33606

-----Original Message-----
From: users-boun...@open-mpi.org on behalf of David Zhang
Sent: Mon 9/20/2010 9:58 PM
To: Open MPI Users
Subject: Re: [OMPI users] Test Program works on 1, 2 or 3 nodes. Hangs on 4 or more nodes.

I don't know if this will help, but try

mpirun --machinefile testfile -np 4 ./test.out

for running 4 processes

On Mon, Sep 20, 2010 at 3:00 PM, Ethan Deneault wrote:
> All,
>
> I am running Scientific Linux 5.5, with OpenMPI 1.4 installed into the /usr/lib/openmpi/1.4-gcc/ directory. I know this is typically /opt/openmpi, but Red Hat does things differently. I have my PATH and LD_LIBRARY_PATH set correctly, because the test program does compile and run.
>
> The cluster consists of 10 Intel Pentium 4 diskless nodes. The master is an AMD x86_64 machine which serves the diskless node images and /home as an NFS mount. I compile all of my programs as 32-bit.
>
> My code is a simple hello world:
> $ more test.f
>       program test
>
>       include 'mpif.h'
>       integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
>
>       call MPI_INIT(ierror)
>       call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>       call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>       print*, 'node', rank, ': Hello world'
>       call MPI_FINALIZE(ierror)
>       end
>
> If I run this program with:
>
> $ mpirun --machinefile testfile ./test.out
> node 0 : Hello world
> node 2 : Hello world
> node 1 : Hello world
>
> This is the expected output. Here, testfile contains the master node: 'pleiades', and two slave nodes: 'taygeta' and 'm43'.
>
> If I add another machine to testfile, say 'asterope', it hangs until I ctrl-c it. I have tried every machine, and as long as I do not include more than 3 hosts, the program will not hang.
>
> I have run the debug-daemons flag with it as well, and I don't see what is wrong specifically.
>
> Working output: pleiades (master) and 2 nodes.
>
> $ mpirun --debug-daemons --machinefile testfile ./test.out
> Daemon was launched on m43 - beginning to initialize
> Daemon was launched on taygeta - beginning to initialize
> Daemon [[46344,0],2] checking in as pid 2140 on host m43
> Daemon [[46344,0],2] not using static ports
> [m43:02140] [[46344,0],2] orted: up and running - waiting for commands!
> [pleiades:19178] [[46344,0],0] node[0].name pleiades daemon 0 arch ffca0200
> [pleiades:19178] [[46344,0],0] node[1].name taygeta daemon 1 arch ffca0200
> [pleiades:19178] [[46344,0],0] node[2].name m43 daemon 2 arch ffca0200
> [pleiades:19178] [[46344,0],0] orted_cmd: received add_local_procs
> [m43:02140] [[46344,0],2] node[0].name pleiades daemon 0 arch ffca0200
> [m43:02140] [[46344,0],2] node[1].name taygeta daemon 1 arch ffca0200
> [m43:02140] [[46344,0],2] node[2].name m43 daemon 2 arch ffca0200
> [m43:02140] [[46344,0],2] orted_cmd: received add_local_procs
> Daemon [[46344,0],1] checking in as pid 2317 on host taygeta
> Daemon [[46344,0],1] not using static ports
> [taygeta:02317] [[46344,0],1] orted: up and running - waiting for commands!
> [taygeta:02317] [[46344,0],1] node[0].name pleiades daemon 0 arch ffca0200
> [taygeta:02317] [[46344,0],1] node[1].name taygeta daemon 1 arch ffca0200
> [taygeta:02317] [[46344,0],1] node[2].name m43 daemon 2 arch ffca0200
> [taygeta:02317] [[46344,0],1] orted_cmd: received add_local_procs
> [pleiades:19178] [[46344,0],0] orted_recv: received sync+nidmap from local proc [[46344,1],0]
> [m43:02140] [[46344,0],2] orted_recv: received sync+nidmap from local proc [[46344,1],2]
> [taygeta:02317] [[46344,0],1] orted_recv: received sync+nidmap from local proc [[46344,1],1]
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [m43:02140] [[46344,0],2] orted_cmd: received collective data cmd
> [taygeta:02317] [[46344,0],1] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received message_local_procs
> [taygeta:02317] [[46344,0],1] orted_cmd: received message_local_procs
> [m43:02140] [[46344,0],2] orted_cmd: received message_local_procs
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [m43:02140] [[46344,0],2] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received collective data cmd
> [pleiades:19178] [[46344,0],0] orted_cmd: received message_local_procs
> [taygeta:02317] [[46344,0],1] orted_cmd: