[OMPI users] OpenMPI in cloud environments
When using OpenMPI with Omni-Compiler (https://github.com/omni-compiler/omni-compiler), one gets errors when trying to run in a cloud environment because Open MPI "greedily uses up all IP addresses". Many cloud environments offer a number of network interfaces. The documentation on this is helpful:

  https://www.open-mpi.org/faq/?category=tcp#tcp-multi-network

but it may be worth making this information more prominent, or changing the default behaviour, since at runtime the library does not restrict itself to the IP addresses specified in the hostfile when using:

  mpirun -np 2 -hostfile ./hostfile ./C_hello

and one needs to use:

  mpirun -np 2 -hostfile ./hostfile --mca btl_tcp_if_include eth3 ./C_hello
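The same restriction can also be set through the environment, which is convenient for job scripts. A minimal sketch, assuming the standard OMPI_MCA_* environment-variable convention and the eth3 interface from the example above:

  # restrict Open MPI's TCP transport to a single interface
  export OMPI_MCA_btl_tcp_if_include=eth3
  mpirun -np 2 -hostfile ./hostfile ./C_hello

The inverse parameter, btl_tcp_if_exclude, can instead list interfaces to avoid; the out-of-band layer accepts the analogous oob_tcp_if_include parameter if the runtime itself picks the wrong interface.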
Re: [OMPI users] MPMD hostfile: executables on same hosts
Hi Ralph and Gilles,

Thanks for the suggestions. I get the following error when I use --mca rmaps seq in the mpirun command:

  $ mpirun -hostfile host-file --mca rmaps seq -np 6 ./a1.out : -np 3 ./a2.out
  --------------------------------------------------------------------------
  No nodes are available for this job, either due to a failure to allocate
  nodes to the job, or allocated nodes being marked as unavailable (e.g.,
  down, rebooting, or a process attempting to be relocated to another node
  when none are available).
  --------------------------------------------------------------------------

  $ cat host-file
  belenos568
  belenos568
  belenos569
  belenos569
  belenos570
  belenos570
  belenos568
  belenos569
  belenos570

I get the same error every time. If I remove the MCA parameter, it works, but not with the MPI process distribution that I want. I am using OpenMPI 4.0.2. Any suggestion?

Thanks,
Vineet

On Monday, 21 December 2020 at 15:57, Ralph Castain via users <users@lists.open-mpi.org> wrote:

You want to use the "sequential" mapper and then specify each proc's location, like this for your hostfile:

  host1
  host1
  host2
  host2
  host3
  host3
  host1
  host2
  host3

and then add "--mca rmaps seq" to your mpirun cmd line.

Ralph

On Dec 21, 2020, at 5:22 AM, Vineet Soni via users <users@lists.open-mpi.org> wrote:

Hello,

I'm having trouble using the MPMD hostfile, in which I want to place 2 executables on the same nodes.

For example, I can do this using Intel MPI by:

  $ mpirun -machine host-file -n 6 ./EXE1 : -n 3 ./EXE2
  $ cat host-file
  host1:2
  host2:2
  host3:2
  host1:1
  host2:1
  host3:1

This would place 2 MPI processes of EXE1 and 1 MPI process of EXE2 on host1.

However, I get an error if I define the same hostname twice in the hostfile of Open MPI:

  $ mpirun -hostfile host-file -np 6 ./EXE1 : -np 3 ./EXE2
  $ cat host-file
  host1 slots=2 max_slots=3
  host2 slots=2 max_slots=3
  host3 slots=2 max_slots=3
  host1 slots=1 max_slots=3
  host2 slots=1 max_slots=3
  host3 slots=1 max_slots=3

Is there a way to place both executables on the same hosts using a hostfile?

Thanks in advance.

Best,
Vineet
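If the sequential mapper keeps erroring, explicit rank placement is another route. The sketch below assumes Open MPI 4.x rankfile syntax; the file name my-rankfile and the slot numbers are illustrative, and in an MPMD launch the ranks are numbered across both executables, so ranks 0-5 would belong to a1.out and 6-8 to a2.out:

  $ cat my-rankfile
  rank 0=belenos568 slot=0
  rank 1=belenos568 slot=1
  rank 2=belenos569 slot=0
  rank 3=belenos569 slot=1
  rank 4=belenos570 slot=0
  rank 5=belenos570 slot=1
  rank 6=belenos568 slot=2
  rank 7=belenos569 slot=2
  rank 8=belenos570 slot=2
  $ mpirun --rankfile my-rankfile -np 6 ./a1.out : -np 3 ./a2.out

This would place two a1.out ranks plus one a2.out rank on each node, mirroring the Intel MPI layout above.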
Re: [OMPI users] [ORTE] Connecting back to parent - Forcing tcp port
On 18/12/2020 23:04, Josh Hursey wrote:
> Vincent,
>
> Thanks for the details on the bug. Indeed, this is a case that seems to
> have been a problem for a little while now when you use static ports with
> ORTE (the -mca oob_tcp_static_ipv4_ports option). It must have crept in
> when we refactored the internal regular-expression mechanism for the v4
> branches (and, now that I look, maybe as far back as v3.1). I just hit
> this same issue in the past day or so working with a different user.
>
> Though I do not have a suggestion for a workaround at this time (sorry),
> I did file a GitHub issue and am looking into it. With the holiday I
> don't know when I will have a fix, but you can watch the ticket for
> updates.
>
>   https://github.com/open-mpi/ompi/issues/8304
>
> In the meantime, you could try the v3.0 series release (which predates
> this change) or the current Open MPI master branch (which approaches this
> a little differently). The same command line should work in both. Both
> can be downloaded from the links below:
>
>   https://www.open-mpi.org/software/ompi/v3.0/
>   https://www.open-mpi.org/nightly/master/

Hello Josh,

Thank you for considering the problem. I will certainly keep watching the ticket. However, there is nothing really urgent (to me, anyway).

> Regarding your command line, it looks pretty good:
>
>   orterun --launch-agent /home/boubliki/openmpi/bin/orted -mca btl tcp \
>           --mca btl_tcp_port_min_v4 6706 --mca btl_tcp_port_range_v4 10 \
>           --mca oob_tcp_static_ipv4_ports 6705 \
>           -host node2:1 -np 1 /path/to/some/program arg1 .. argn
>
> I would suggest, while you are debugging this, that you use a program
> like /bin/hostname instead of a real MPI program. If /bin/hostname
> launches properly, then move on to an MPI program. That will assure you
> that the runtime wired up correctly (oob/tcp), and then we can focus on
> the MPI side of the communication (btl/tcp). You will want to change
> "-mca btl tcp" to at least "-mca btl tcp,self" (or better
> "-mca btl tcp,vader,self" if you want shared memory). 'self' is the
> loopback interface in Open MPI.

Yes, this is actually what I did; I just wanted to be generic and report the problem without too much flourish. But it is important that you mentioned this for new users, as it helps them understand the real purpose of each layer in an MPI implementation.

> Is there a reason that you are specifying the --launch-agent to the
> orted? Is it installed in a different path on the remote nodes? If Open
> MPI is installed in the same location on all nodes, then you shouldn't
> need that.
>
> Thanks,
> Josh

I recompiled the sources, activating --enable-orterun-prefix-by-default when running ./configure. Of course, it helps :)

Again, thank you.

Kind regards,
Vincent
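The layered test Josh describes can be run as two stages. A sketch, assuming the node2 host and the port numbers from the command line above; /path/to/some/program stands in for a real MPI binary:

  # stage 1: wire up the runtime only (oob/tcp); no MPI traffic is involved
  orterun --mca oob_tcp_static_ipv4_ports 6705 -host node2:1 -np 1 /bin/hostname

  # stage 2: once stage 1 prints the remote hostname, exercise the MPI
  # transport (btl/tcp) with shared memory and loopback enabled
  orterun --mca btl tcp,vader,self --mca btl_tcp_port_min_v4 6706 \
          --mca btl_tcp_port_range_v4 10 --mca oob_tcp_static_ipv4_ports 6705 \
          -host node2:1 -np 1 /path/to/some/program

Keeping the two stages separate makes it clear whether a hang comes from the runtime wire-up or from the MPI communication layer.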