There are several kinds of communication:
- ssh from mpirun to the compute nodes, and also between compute nodes
(assuming you use a machine file and no supported batch manager), to spawn
the orted daemons
- oob/tcp connections between the orted daemons
- btl/tcp connections between MPI tasks

You can restrict the port ranges used by oob/tcp and btl/tcp and open only
those ports in the firewall, if you really want a firewall. (I strongly
suggest you try without a firewall first.)
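
A rough sketch of what restricting the ports could look like. The MCA
parameter names below (btl_tcp_port_min_v4, btl_tcp_port_range_v4 and
oob_tcp_dynamic_ipv4_ports) are the ones I believe apply, but they are an
assumption for your build, so please confirm them with ompi_info first. The
port numbers, the process count and ./exec_code are placeholders.

```shell
#!/bin/sh
# Hypothetical port choices; pick ranges that match your firewall rules.
BTL_PORT_MIN=10000        # first port btl/tcp may use
BTL_PORT_RANGE=100        # number of ports starting at BTL_PORT_MIN
OOB_PORTS="10100-10199"   # port range for oob/tcp between orted daemons

# Assumed parameter names; confirm them on your installation with:
#   ompi_info --param btl tcp --level 9
#   ompi_info --param oob tcp --level 9
MCA_ARGS="--mca btl_tcp_port_min_v4 ${BTL_PORT_MIN}"
MCA_ARGS="${MCA_ARGS} --mca btl_tcp_port_range_v4 ${BTL_PORT_RANGE}"
MCA_ARGS="${MCA_ARGS} --mca oob_tcp_dynamic_ipv4_ports ${OOB_PORTS}"

# Print the command instead of running it, so it can be reviewed first.
echo "mpirun ${MCA_ARGS} -n 4 ./exec_code"
```

With these example values the firewall on every node would have to accept
TCP ports 10000-10099 and 10100-10199 from the other nodes.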

Now, looking at the error message "No route to host":
That could mean there really is no route, in which case you should
include/exclude some subnets/interfaces, e.g.
mpirun --mca btl_tcp_if_include ... --mca oob_tcp_if_include ... ...
Or there might be a route, but the firewall reports otherwise.
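
To tell the two cases apart, something like the following could be run on
the node that reports the error. This is only a sketch: it assumes bash,
coreutils timeout and iproute2 are installed, probe_tcp is a helper name I
made up, and the address and port you pass are up to you.

```shell
#!/bin/sh
# Sketch: tell "no route" apart from "host answers, but the port is blocked".

# probe_tcp HOST PORT: attempt a TCP connect with a 3 second timeout.
# Returns 0 if the connection was established, non-zero otherwise.
# Relies on bash's /dev/tcp pseudo-device; on failure bash typically
# prints the reason (e.g. connection refused, no route to host) on stderr.
probe_tcp() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2"
}

# Only run the checks when an IP address and a port are given.
if [ "$#" -ge 2 ]; then
    host=$1
    port=$2
    # Show which interface and source address the kernel would pick.
    ip route get "$host"
    if probe_tcp "$host" "$port"; then
        echo "TCP connect to $host:$port succeeded"
    else
        echo "TCP connect to $host:$port failed"
    fi
fi
```

For example, saved as probe.sh (hypothetical name) and run on graphene-26
as "sh probe.sh 172.18.64.25 22". A "connection refused" answer at least
shows that a route exists, whereas "no route to host" on a port you know
is listening points at routing or at a firewall rejecting the packets.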

Cheers,

Gilles

On Thursday, April 13, 2017, Emin Nuriyev <emin.nuri...@ucdconnect.ie>
wrote:

> I cloned the latest version of Open MPI from GitHub on Grid5000.
>
> 128 nodes were reserved at the Nancy site. During execution of my MPI
> code I got the error message below:
>
> A process or daemon was unable to complete a TCP connection
> to another process:
>   Local host:    graphene-17
>   Remote host:   graphene-91
> This is usually caused by a firewall on the remote host. Please
> check that any firewall (e.g., iptables) has been disabled and
> try again.
>
> I deployed my own OS image. Everything is OK with the firewall. Since the
> same OS image was deployed on all reserved nodes, if Open MPI could connect
> to some of them and execute code, it means the firewall accepted input.
>
> There is no problem connecting to graphene-91 with ssh. But the command
> line below does not work:
>
> mpirun -host  graphene-91 -n 1 exec_code
>
> I get the same message, "Unable to complete a TCP connection".
>
>
> Sometimes I get this error:
>
> WARNING: There is at least non-excluded one OpenFabrics device found,
> but there are no active ports detected (or Open MPI was unable to use
> them).  This is most certainly not what you wanted.  Check your
> cables, subnet manager configuration, etc.  The openib BTL will be
> ignored for this job.
>
>   Local host: graphene-27
> --------------------------------------------------------------------------
> [graphene-26][[56971,1],52][btl_tcp_endpoint.c:796:mca_btl_tcp_endpoint_complete_connect] connect() to 172.18.64.25 failed: No route to host (113)
> [graphene-29][[56971,1],60][btl_tcp_endpoint.c:796:mca_btl_tcp_endpoint_complete_connect] connect() to 172.18.64.27 failed: No route to host (113)
> [graphene-14.nancy.grid5000.fr:02890] 15 more processes have sent help
> message help-mpi-btl-openib.txt / no active ports found
> [graphene-14.nancy.grid5000.fr:02890] Set MCA parameter
> "orte_base_help_aggregate" to 0 to see all help / error messages
>
> When I change the command line to select eth0 using an MCA parameter,
> there is another error. This is not a stable version; maybe that is why
> I get this kind of error?
>
> Yours faithfully,
> Emin Nuriyev
>
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
