[OMPI users] OpenMPI in cloud environments

2020-12-22 Thread Benson Muite via users
When using Open MPI with Omni-Compiler 
(https://github.com/omni-compiler/omni-compiler), one gets errors when 
trying to run in a cloud environment because Open MPI "greedily uses up 
all IP addresses". Many cloud environments offer a number of network 
interfaces. While the documentation on this is helpful:

https://www.open-mpi.org/faq/?category=tcp#tcp-multi-network

it may be worth making this information more prominent, or changing the 
default behaviour, since at runtime the library does not restrict itself 
to the IP addresses specified in the hostfile when using:


mpirun -np 2 -hostfile ./hostfile ./C_hello

and one needs to use:

mpirun -np 2 -hostfile ./hostfile --mca btl_tcp_if_include eth3 ./C_hello
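
The btl_tcp_if_include parameter also accepts CIDR notation, and the setting 
can be made persistent in the per-user MCA parameter file so it does not have 
to be repeated on every command line. A minimal sketch, assuming a 
192.168.1.0/24 subnet as a stand-in for the actual cloud network:

mpirun -np 2 -hostfile ./hostfile --mca btl_tcp_if_include 192.168.1.0/24 ./C_hello

# persistent alternative: put the setting in $HOME/.openmpi/mca-params.conf
btl_tcp_if_include = eth3
oob_tcp_if_include = eth3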



Re: [OMPI users] MPMD hostfile: executables on same hosts

2020-12-22 Thread Vineet Soni via users
Hi Ralph and Gilles, 

Thanks for the suggestions. 

I get the following error when I use --mca rmaps seq in the mpirun command. 

mpirun -hostfile host-file --mca rmaps seq -np 6 ./a1.out : -np 3 ./a2.out 
-- 
No nodes are available for this job, either due to a failure to 
allocate nodes to the job, or allocated nodes being marked 
as unavailable (e.g., down, rebooting, or a process attempting 
to be relocated to another node when none are available). 
-- 

$ cat host-file 
belenos568 
belenos568 
belenos569 
belenos569 
belenos570 
belenos570 
belenos568 
belenos569 
belenos570 

I get the same error every time. If I remove the MCA parameter, it works, but 
then the process distribution is not the one I want. 
I am using OpenMPI 4.0.2. 

Any suggestion? 

Thanks, 
Vineet 


From: "users"  
To: "users"  
Cc: "Ralph Castain"  
Sent: Monday, 21 December, 2020 15:57:41 
Subject: Re: [OMPI users] MPMD hostfile: executables on same hosts 

You want to use the "sequential" mapper and then specify each proc's location, 
like this for your hostfile: 

host1 
host1 
host2 
host2 
host3 
host3 
host1 
host2 
host3 

and then add "--mca rmaps seq" to your mpirun cmd line. 
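
As an illustrative sketch, assuming the sequential mapper assigns one process 
per hostfile line in order (with the first app context consuming the first 
lines), the command and resulting placement would look like: 

mpirun --mca rmaps seq -hostfile host-file -np 6 ./EXE1 : -np 3 ./EXE2 

EXE1 ranks 0-5 -> host1, host1, host2, host2, host3, host3 
EXE2 ranks 0-2 -> host1, host2, host3 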

Ralph 





On Dec 21, 2020, at 5:22 AM, Vineet Soni via users <users@lists.open-mpi.org> wrote: 

Hello, 

I'm having trouble using an MPMD hostfile in which I want to place 2 
executables on the same nodes. 

For example, I can do this using Intel MPI by: 
$ mpirun -machine host-file -n 6 ./EXE1 : -n 3 ./EXE2 
$ cat host-file 
host1:2 
host2:2 
host3:2 
host1:1 
host2:1 
host3:1 

This would place 2 MPI processes of EXE1 and 1 MPI process of EXE2 on host1. 

However, I get an error if I define the same hostname twice in the Open MPI 
hostfile: 
$ mpirun -hostfile host-file -np 6 ./EXE1 : -np 3 ./EXE2 
$ cat host-file 
host1 slots=2 max_slots=3 
host2 slots=2 max_slots=3 
host3 slots=2 max_slots=3 
host1 slots=1 max_slots=3 
host2 slots=1 max_slots=3 
host3 slots=1 max_slots=3 

Is there a way to place both executables on the same hosts using a hostfile? 

Thanks in advance. 

Best, 
Vineet 







Re: [OMPI users] [ORTE] Connecting back to parent - Forcing tcp port

2020-12-22 Thread Vincent via users

On 18/12/2020 23:04, Josh Hursey wrote:

Vincent,

Thanks for the details on the bug. Indeed this is a case that seems to 
have been a problem for a little while now when you use static ports 
with ORTE (-mca oob_tcp_static_ipv4_ports option). It must have crept 
in when we refactored the internal regular expression mechanism for 
the v4 branches (and now that I look maybe as far back as v3.1). I 
just hit this same issue in the past day or so working with a 
different user.


Though I do not have a suggestion for a workaround at this time 
(sorry), I did file a GitHub issue and am looking into it. With 
the holiday I don't know when I will have a fix, but you can watch the 
ticket for updates.

https://github.com/open-mpi/ompi/issues/8304

In the meantime, you could try the v3.0 series release (which predates 
this change) or the current Open MPI master branch (which approaches 
this a little differently). The same command line should work in both. 
Both can be downloaded from the links below:

https://www.open-mpi.org/software/ompi/v3.0/
https://www.open-mpi.org/nightly/master/

Hello Josh

Thank you for considering the problem. I will certainly keep watching 
the ticket. However, there is nothing really urgent (to me anyway).



Regarding your command line, it looks pretty good:
  orterun --launch-agent /home/boubliki/openmpi/bin/orted -mca btl tcp 
--mca btl_tcp_port_min_v4 6706 --mca btl_tcp_port_range_v4 10 --mca 
oob_tcp_static_ipv4_ports 6705 -host node2:1 -np 1 
/path/to/some/program arg1 .. argn


I would suggest, while you are debugging this, that you use a program 
like /bin/hostname instead of a real MPI program. If /bin/hostname 
launches properly then move on to an MPI program. That will assure you 
that the runtime wired up correctly (oob/tcp), and then we can focus 
on the MPI side of the communication (btl/tcp). You will want to 
change "-mca btl tcp" to at least "-mca btl tcp,self" (or better "-mca 
btl tcp,vader,self" if you want shared memory). 'self' is the loopback 
interface in Open MPI.
Yes. This is actually what I did. I just wanted to keep the report generic and 
describe the problem without too much flourish.
But it is a useful reminder for new users, as it helps them understand 
the real purpose of each layer in an MPI implementation.
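
Concretely, the wiring check described above amounts to something like the 
following (same launch agent, host and ports as in the earlier command, with 
/bin/hostname swapped in and the self and vader BTLs enabled):

orterun --launch-agent /home/boubliki/openmpi/bin/orted -mca btl tcp,vader,self 
--mca btl_tcp_port_min_v4 6706 --mca btl_tcp_port_range_v4 10 --mca 
oob_tcp_static_ipv4_ports 6705 -host node2:1 -np 1 /bin/hostname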




Is there a reason that you are specifying the --launch-agent to the 
orted? Is it installed in a different path on the remote nodes? If 
Open MPI is installed in the same location on all nodes then you 
shouldn't need that.
I recompiled the sources, activating --enable-orterun-prefix-by-default 
when running ./configure. Of course, it helps :)
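
For reference, a minimal sketch of that build step (the install prefix below 
is taken from the orted path earlier in the thread and is only an assumption 
about the actual layout):

./configure --prefix=/home/boubliki/openmpi --enable-orterun-prefix-by-default
make all
make install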


Again, thank you.

Kind regards

Vincent.




Thanks,
Josh