[OMPI users] Naming MPI_Spawn children
Hi,

I'm running Open MPI on the Rackspace cloud over the Internet using MPI_Comm_spawn. That is, I run the parent on my PC and the children on Rackspace cloud machines. Rackspace gives the machines directly reachable public IP addresses (no NAT), which is why this is possible.

There is a communicator that contains only the children, and some of the communication happens only between children (all on the Rackspace cloud in this scenario). In our experiments, this child-to-child communication showed longer delays than we expected. My assumption is that Open MPI is using the public IP addresses from the hostfile, so the Rackspace children talk to each other over the Internet. What I want is for the Rackspace children to communicate internally, using the internal Rackspace hostnames.

Rackspace does provide internal IP addresses. However, if I put those in the hostfile on my home PC, the parent won't be able to reach the children (there is also a communicator containing the parent and the children). Is there any way to tell Open MPI to use the internal IP addresses of the Rackspace machines (perhaps through another hostfile) for the sub-group (communicator) that contains only the Rackspace children? I expect that would improve performance.

Thanks in advance for your suggestions.

Jaison
Australian National University
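Before deciding which network the children should use among themselves, it helps to know which of a machine's addresses are internal and which are public. The minimal sketch below assumes, as is common on cloud providers, that the internal Rackspace addresses fall in the RFC 1918 private ranges. That assumption should be checked against the addresses Rackspace actually assigns. `is_private_ipv4` is a hypothetical helper, not part of Open MPI.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: returns true if the dotted-quad IPv4 string lies in
 * one of the RFC 1918 private ranges (10/8, 172.16/12, 192.168/16).
 * Returns false for public addresses and for strings that do not parse. */
bool is_private_ipv4(const char *dotted)
{
    struct in_addr a;

    if (inet_pton(AF_INET, dotted, &a) != 1)
        return false;                        /* not a valid IPv4 address */

    uint32_t ip = ntohl(a.s_addr);           /* host byte order for masking */
    return (ip & 0xFF000000u) == 0x0A000000u /* 10.0.0.0/8     */
        || (ip & 0xFFF00000u) == 0xAC100000u /* 172.16.0.0/12  */
        || (ip & 0xFFFF0000u) == 0xC0A80000u;/* 192.168.0.0/16 */
}
```

You could run it over the addresses shown for each interface on a child machine. The result would indicate which interface sits on the internal network and which one carries the public address.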
[OMPI users] Accessing OpenMPI processes on EC2 machine over Internet using ssh
We have reported this before. We still can't do it fully, but we are now partially successful. We used a machine with a static IP address and changed the router settings to open all the ssh ports. The master runs on this machine and the slaves on EC2.

We can now run "Hello world" over the Internet using ssh. It starts the MPI executables on EC2 (we can see them in 'top') and prints "hello" back to our home/master machine. But send/recv doesn't work: it hangs between the master (home PC) and the slaves (EC2), in both directions.

What are the port settings for send/recv? Do we need to change anything?

Any help is very much appreciated.

Jaison
Australian National Uni
Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh
Ralph Castain (open-mpi.org) writes:
> This has come up before - I would suggest doing a quick search of "ec2" on our user list. Here is one solution:
>
> On Jun 14, 2011, at 10:50 AM, Barnet Wagman wrote:
>> I've put together a simple system for running OMPI on EC2 (Amazon's cloud computing service). If you're interested, see
>>
>> http://norbl.com/ppe-ompi.html

I have tried a little more. I set the MCA parameters as follows:

    mpirun -np 1 --mca btl tcp,self --mca btl_tcp_if_exclude lo,eth0 -hostfile hostinfo nbs-client -bynode

But it still failed, with the following error:

    Permission denied (publickey).
    --------------------------------------------------------------------------
    A daemon (pid 24744) died unexpectedly with status 255 while attempting to launch so we are aborting.

    There may be more information reported by the environment (see above).

    This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
    --------------------------------------------------------------------------
    mpirun: clean termination accomplished

I don't understand the "Permission denied (publickey)" error. I access the EC2 instance with password-less ssh as follows:

    ssh ubuntu@ec2-67-202-**-***.compute-1.amazonaws.com

So what went wrong? The hostinfo file is:

    [jmulerik@jaison Client]$ cat hostinfo
    localhost
    ubuntu@ec2-67-202-48-118.compute-1.amazonaws.com

Jaison
Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh
Jeff Squyres (cisco.com) writes:
> On Nov 30, 2011, at 6:03 AM, Jaison Paul wrote:
>> Yes, we have set up .ssh file on remote EC2 hosts. Is there anything else that we should be taking care of when dealing with EC2?
>
> I have heard that Open MPI's TCP latency on EC2 is horrid. I actually talked with some Amazon / EC2 folks about it at SC'11 a few weeks ago; we set a date to dive into it a bit deeper in December.
>
> No promises on when/if the TCP latency will improve, but it's definitely something that we're looking at.
>
> My first *guess* is that it might have something to do with specifying btl_tcp_if_include / oob_tcp_if_include improperly (or not at all) -- but that's a SWAG.

I have tried a little more. I set the MCA parameters as follows:

    mpirun -np 1 --mca btl tcp,self --mca btl_tcp_if_exclude lo,eth0 -hostfile hostinfo nbs-client -bynode

But it still failed, with the following error:

    Permission denied (publickey).
    --------------------------------------------------------------------------
    A daemon (pid 24744) died unexpectedly with status 255 while attempting to launch so we are aborting.

    There may be more information reported by the environment (see above).

    This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
    --------------------------------------------------------------------------
    mpirun: clean termination accomplished

I don't understand the "Permission denied (publickey)" error. I access the EC2 instance with password-less ssh as follows:

    ssh ubuntu@ec2-67-202-**-***.compute-1.amazonaws.com

So what went wrong? The hostinfo file is:

    [jmulerik@jaison Client]$ cat hostinfo
    localhost
    ubu...@ec2-67-202-48-118.compute-1.amazonaws.com

Jaison
Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh
Ralph Castain (open-mpi.org) writes:
> This has come up before - I would suggest doing a quick search of "ec2" on our user list. Here is one solution:
>
> On Jun 14, 2011, at 10:50 AM, Barnet Wagman wrote:
>> I've put together a simple system for running OMPI on EC2 (Amazon's cloud computing service). If you're interested, see
>>
>> http://norbl.com/ppe-ompi.html
>>
>> Barnet Wagman

Thank you, Barnet. We are currently using some scripts to configure EC2 nodes with OMPI easily. We will try this one. But this is for setting up a network of OMPI hosts within EC2, right? It doesn't support a client outside EC2 with the slaves inside EC2?

Jaison
Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh
Jeff Squyres (cisco.com) writes:
> On Nov 30, 2011, at 6:03 AM, Jaison Paul wrote:
>> Yes, we have set up .ssh file on remote EC2 hosts. Is there anything else that we should be taking care of when dealing with EC2?
>
> I have heard that Open MPI's TCP latency on EC2 is horrid. I actually talked with some Amazon / EC2 folks about it at SC'11 a few weeks ago; we set a date to dive into it a bit deeper in December.
>
> No promises on when/if the TCP latency will improve, but it's definitely something that we're looking at.
>
> My first *guess* is that it might have something to do with specifying btl_tcp_if_include / oob_tcp_if_include improperly (or not at all) -- but that's a SWAG.

Yes, Jeff, we are not setting --mca btl_tcp_if_include or --mca oob_tcp_if_include at all at the moment. What would be the best settings of btl_tcp_if_include / oob_tcp_if_include for accessing EC2 hosts over the Internet? I don't understand --mca very well.

Thanks,
Jaison
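The `btl_tcp_if_include` and `oob_tcp_if_include` MCA parameters are given interface names as a comma-separated list, in the same form as the `lo,eth0` used with `btl_tcp_if_exclude` above. Choosing values therefore starts with knowing what each machine's interfaces are called and which addresses they carry. `/sbin/ifconfig` or `ip addr` shows this from the shell. The minimal sketch below does the same from C with `getifaddrs()`, assuming a Linux or BSD host. `list_ipv4_interfaces` is a hypothetical diagnostic helper, not part of Open MPI.

```c
#define _DEFAULT_SOURCE /* expose getifaddrs() even with strict -std flags */
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>

/* Hypothetical helper: print every IPv4 interface name and address on this
 * host, one "name<TAB>address" pair per line, and return how many were
 * printed (or -1 if getifaddrs() fails). */
int list_ipv4_interfaces(FILE *out)
{
    struct ifaddrs *ifs, *p;
    int count = 0;

    if (getifaddrs(&ifs) != 0)
        return -1;

    for (p = ifs; p != NULL; p = p->ifa_next) {
        /* skip entries without an address and non-IPv4 families */
        if (p->ifa_addr == NULL || p->ifa_addr->sa_family != AF_INET)
            continue;

        char buf[INET_ADDRSTRLEN];
        const struct sockaddr_in *sin = (const struct sockaddr_in *)p->ifa_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf) != NULL) {
            fprintf(out, "%s\t%s\n", p->ifa_name, buf);
            count++;
        }
    }

    freeifaddrs(ifs); /* release the list allocated by getifaddrs() */
    return count;
}
```

Run it on the home PC and on an EC2 instance to see which interface names exist on each side. Only names that exist on a node can usefully appear in those parameters there.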
Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh
Ralph Castain (open-mpi.org) writes:
> On Nov 24, 2011, at 2:00 AM, Reuti wrote:
>> Hi,
>>
>> On 24.11.2011 at 05:26, Jaison Paul wrote:
>>
>>> I am trying to access OpenMPI processes over Internet using ssh and not quite successful, yet. I believe that I should be able to do it.
>>>
>>> I have to run one process on my PC and the rest on a remote cluster over internet. I have set the public keys (at .ssh/authorized_keys) to access remote nodes without a password.
>>>
>>> I use hostfile to run mpi. It will read something like:
>>> -
>>> localhost
>>> user@remotehost.com
>>
>> this is not a valid syntax for Open MPI.
>
> This isn't correct - we have long supported that syntax in a hostfile, and there is no issue with having a different user name at each node.
>
> Jaison: are you sure your nodes are setup for password-less ssh? In other words, have you setup your .ssh files on the remote nodes so they will allow us to ssh a process on them without providing a password? This is the typical problem we see.
>
>>> -
>>> But it fails.
>>>
>>> The issue seems to be the user! That is, the user on my PC is different to that of user at remotehosts. That's my assumption.
>>>
>>> Is this the problem? Is there any work-around to solve this issue? Do I need to have same username at all nodes to solve this issue?
>>
>> You can define nicknames for an ssh connection in a file ~/.ssh/config like:
>>
>> Host foobar
>>    User baz
>>    Hostname the.remote.server.demo
>>    Port 1234
>>
>> While this will work with any nickname for an ssh connection, in your case the nickname must match the one specified in the hostfile, as Open MPI won't use this lookup file:
>>
>> Host remotehost.com
>>    User user
>>
>> ssh should then use the entries therein to initiate the connection. For details you can have a look at `man ssh_config`.
>>
>> -- Reuti
>> ___
>> users mailing list
>> users@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

Thanks a lot to Ralph and Reuti. We are trying to use EC2 nodes as compute nodes and my local PC as the host node. I'm happy to know that it is OK to use user@somehost.com. We used that but it failed; we will try again. Yes, we have set up the .ssh files on the remote EC2 hosts. Is there anything else we should take care of when dealing with EC2?

Jaison
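Ralph confirms that a `user@host` hostfile entry is valid and that each node may use a different user name. Reuti's `~/.ssh/config` approach instead moves the user name out of the hostfile. To make the two forms concrete, the sketch below splits a hostfile host token into the login name and host that an ssh launch would use. Entries without a `user@` prefix fall back to the local login name, which is also ssh's default. `split_user_host` is hypothetical and is not Open MPI's hostfile parser. It also assumes that options such as `slots=N` have already been stripped from the line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT Open MPI's hostfile parser: split a hostfile
 * host token such as "user@remotehost.com" into login name and host.
 * Without a "user@" prefix (e.g. "localhost"), default_user is used,
 * mirroring ssh's behaviour of falling back to the local login name.
 * Returns 0 on success, -1 if a part is empty or a buffer is too small. */
int split_user_host(const char *entry, const char *default_user,
                    char *user, size_t user_len,
                    char *host, size_t host_len)
{
    const char *at = strchr(entry, '@');
    const char *u = at ? entry : default_user;
    size_t ulen = at ? (size_t)(at - entry) : strlen(default_user);
    const char *h = at ? at + 1 : entry;
    size_t hlen = strlen(h);

    /* reject empty parts and anything that would not fit with its NUL */
    if (ulen == 0 || hlen == 0 || ulen >= user_len || hlen >= host_len)
        return -1;

    memcpy(user, u, ulen);
    user[ulen] = '\0';
    memcpy(host, h, hlen + 1); /* copies the terminating NUL too */
    return 0;
}
```

For example, a `user@remotehost.com` line yields user `user` and host `remotehost.com`, while `localhost` keeps the local account name.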
[OMPI users] Accessing OpenMPI processes over Internet using ssh
Hi all,

I am trying to access Open MPI processes over the Internet using ssh, and have not been quite successful yet. I believe I should be able to do it. I have to run one process on my PC and the rest on a remote cluster over the Internet. I have set up the public keys (in .ssh/authorized_keys) so that I can access the remote nodes without a password.

I use a hostfile to run MPI. It reads something like:

    localhost
    u...@remotehost.com

But it fails. The issue seems to be the user! That is, the user name on my PC is different from the user name on the remote hosts. That's my assumption. Is this the problem? Is there any workaround? Do I need to have the same user name on all nodes to solve this?

Jaison, ANU
Re: [OMPI users] How to start MPI_Spawn child processes early?
Hi,

I am reposting my earlier query. If anyone can give me a hint, that would be great.

Thanks,
Jaison
ANU

Jaison Paul wrote:
> Hi All,
>
> I am trying to use MPI for scientific high-performance computing (HPC) applications. I use MPI_Comm_spawn to create child processes. Is there a way to start the child processes before the parent process, using MPI_Comm_spawn? I want this because my experiments showed that the time the parent takes to spawn the children is too long for HPC applications, which slows down the whole run. If the children were ready when the parent application process looks for them, that initial delay could be avoided. Is there a way to do that?
>
> Thanks in advance,
> Jaison
> Australian National University
[OMPI users] Can I start MPI_Spawn child processes early?
Hi All,

I am trying to use MPI for scientific high-performance computing (HPC) applications. I use MPI_Comm_spawn to create child processes. Is there a way to start the child processes before the parent process, using MPI_Comm_spawn? I want this because my experiments showed that the time the parent takes to spawn the children is too long for HPC applications, which slows down the whole run. If the children were ready when the parent application process looks for them, that initial delay could be avoided. Is there a way to do that?

Thanks in advance,
Jaison
Australian National University
Re: [OMPI users] Fails to run "MPI_Comm_spawn" on remote host
Hi Ralph,

Thank you so much for your reply. Your tips worked! The idea is to declare the hosts first and then pick one with the reserved 'host' key in MPI_Info. Great, thanks a ton.

I used the -host option of mpirun:

    mpirun --prefix /opt/mpi/ompi-1.3.2/ -np 1 -host myhost1,myhost2 spawner

and set the reserved MPI_Info key 'host' to select the remote host:

    MPI_Info hostinfo;
    MPI_Info_create(&hostinfo);
    MPI_Info_set(hostinfo, "host", "myhost2");
    MPI_Info_set(hostinfo, "wdir", "/home/jaison/mpi/advanced_MPI/spawn/lib");

Now I can run child processes on the remote host, myhost2. I shall also try the "add-hostfile" option. By the way, the MPI_Comm_spawn man page does not give the detailed information you have just given.

Jaison
http://cs.anu.edu.au/~Jaison.Mulerikkal/Home.html

On 16/09/2009, at 12:39 PM, Ralph Castain wrote:
> We don't support the ability to add a new host during a comm_spawn call in the 1.3 series. This is a feature that is being added for the upcoming new feature series release (tagged 1.5).
>
> There are two solutions to this problem in 1.3:
>
> 1. declare all hosts at the beginning of the job. You can then specify which one to use with the "host" key.
>
> 2. you -can- add a hostfile to the job during a comm_spawn. This is done with the "add-hostfile" key. All the hosts in the hostfile will be added to the job. You can then specify which host(s) to use for this particular comm_spawn with the "host" key.
>
> All of this is documented - you should see it with a "man MPI_Comm_spawn" command.
>
> If you need to dynamically add a host via "host" before then, you could try downloading a copy of the developer's trunk from the OMPI web site. It is implemented there at this time - and also documented via the man page.
>
> Ralph
>
> On Tue, Sep 15, 2009 at 5:14 PM, Jaison Paul wrote:
>> Hi All,
>>
>> I am still waiting for some input on my query. I just want to know whether I can run dynamic child processes with MPI_Comm_spawn on remote hosts (in Open MPI 1.3.2). Has anyone done that successfully? Or has Open MPI not implemented it yet? Please help.
>> [...]
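Ralph's first option also accounts for the original failure. In the 1.3 series, the "host" info key can only select among hosts that are already part of the job. Those are hosts given to mpirun via `-host` or a hostfile, or added through the "add-hostfile" key. That rule fits the "There are no allocated resources ... that match the requested mapping" error when `myhost` was requested without being declared. The check below is a hypothetical model of that rule, not Open MPI code. It assumes the allocation is the comma-separated list passed to `-host`.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the 1.3-series rule described above: the value of
 * the "host" info key given to MPI_Comm_spawn must name a host that is
 * already in the job, e.g. declared with "mpirun -host myhost1,myhost2".
 * Returns true if `wanted` matches a whole entry of the comma-separated
 * `allocated` list (so "myhost" does not match "myhost1"). */
bool host_in_allocation(const char *wanted, const char *allocated)
{
    size_t wlen = strlen(wanted);
    const char *p = allocated;

    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        if (len == wlen && strncmp(p, wanted, wlen) == 0)
            return true;             /* exact match on one entry */
        if (comma == NULL)
            break;                   /* that was the last entry */
        p = comma + 1;               /* move past the separator */
    }
    return false;
}
```

For example, with `-host myhost1,myhost2` on the mpirun line, asking for `myhost2` passes the check. Asking for `myhost` against an allocation containing only `headnode` does not.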
Re: [OMPI users] Fails to run "MPI_Comm_spawn" on remote host
Hi All,

I am still waiting for some input on my query. I just want to know whether I can run dynamic child processes with MPI_Comm_spawn on remote hosts (in Open MPI 1.3.2). Has anyone done that successfully? Or has Open MPI not implemented it yet? Please help.

Jaison
http://cs.anu.edu.au/~Jaison.Mulerikkal/Home.html

On 14/09/2009, at 8:45 AM, Jaison Paul wrote:
> Hi,
>
> I am trying to create a library using Open MPI for an SOA middleware for my PhD research. MPI_Comm_spawn is the call I need. I got a sample example working, but only on the local host. Whenever I try to run the spawned children on remote hosts, the parent cannot launch them, and I get the following error message:
>
> [...]
>
> I've set LD_LIBRARY_PATH correctly. Is MPI_Comm_spawn implemented in Open MPI (I am using version 1.3.2)? If so, where am I going wrong? Any input will be very much appreciated. Thanking you in advance.
>
> Jaison
> jmule...@cs.anu.edu.au
> http://cs.anu.edu.au/~Jaison.Mulerikkal/Home.html
[OMPI users] Fails to run "MPI_Comm_spawn" on remote host
Hi,

I am trying to create a library using Open MPI for an SOA middleware for my PhD research. MPI_Comm_spawn is the call I need. I got a sample example working, but only on the local host. Whenever I try to run the spawned children on remote hosts, the parent cannot launch them, and I get the following error message:

--BEGIN MPIRUN AND ERROR MSG--
    mpirun --prefix /opt/mpi/ompi-1.3.2/ --mca btl_tcp_if_include eth0 -np 1 /home/jaison/mpi/advanced_MPI/spawn/manager
    Manager code started - host headnode -- myid & world_size 0 1
    Host is: myhost
    WorkDir is: /home/jaison/mpi/advanced_MPI/spawn/lib
    --------------------------------------------------------------------------
    There are no allocated resources for the application
      /home/jaison/mpi/advanced_MPI/spawn//lib
    that match the requested mapping:

    Verify that you have mapped the allocated resources properly using the --host or --hostfile specification.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    A daemon (pid unknown) died unexpectedly on signal 1 while attempting to launch so we are aborting.

    There may be more information reported by the environment (see above).

    This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
    --------------------------------------------------------------------------
    mpirun: clean termination accomplished
--END OF ERROR MSG--

I use the reserved keys 'host' and 'wdir' in MPI_Info to set the remote host and the working directory. Here is the code snippet:

--BEGIN Code Snippet--
    MPI_Info hostinfo;
    MPI_Info_create(&hostinfo);
    MPI_Info_set(hostinfo, "host", "myhost");
    MPI_Info_set(hostinfo, "wdir", "/home/jaison/mpi/advanced_MPI/spawn/lib");

    // Checking for 'hostinfo'. The results are okay (see above)
    int test0 = MPI_Info_get(hostinfo, "host", valuelen, value, &flag);
    int test = MPI_Info_get(hostinfo, "wdir", valuelen, value1, &flag);
    printf("Host is: %s\n", value);
    printf("WorkDir is: %s\n", value1);

    sprintf(launched_program, "launched_program");
    MPI_Comm_spawn(launched_program, MPI_ARGV_NULL, number_to_spawn, hostinfo, 0,
                   MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);
--END OF Code Snippet--

I've set LD_LIBRARY_PATH correctly. Is MPI_Comm_spawn implemented in Open MPI (I am using version 1.3.2)? If so, where am I going wrong? Any input will be very much appreciated. Thanking you in advance.

Jaison
jmule...@cs.anu.edu.au
http://cs.anu.edu.au/~Jaison.Mulerikkal/Home.html