There are several kinds of communication involved:
- ssh from mpirun to the compute nodes, and also between compute nodes
  (assuming you use a machine file and no supported batch manager), to
  spawn the orted daemons
- oob/tcp connections between the orted daemons
- btl/tcp connections between MPI tasks
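
To see which of these TCP paths actually works between two nodes, a small probe run outside of Open MPI can help. What follows is only a sketch in Python and not part of Open MPI: `probe` is a hypothetical helper name, and the port you pass has to be one where something is really listening on the remote node (for example 22 for sshd).

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timeout (packets may be silently dropped)"
    except ConnectionRefusedError:
        return "refused (host reachable, nothing listening on this port)"
    except OSError as exc:
        # EHOSTUNREACH is the errno behind "No route to host (113)" on Linux.
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route to host (routing problem or firewall reject)"
        return "error: %s" % exc


# Example, run from graphene-17: print(probe("graphene-91", 22))
```

If ssh connects but a probe against a port in the range used by oob/tcp or btl/tcp reports "no route to host", a firewall rule that rejects the connection is a plausible cause; if even port 22 fails on a given address, the interface or subnet selection is the more likely suspect.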

You can restrict the port ranges used by oob/tcp and btl/tcp and open
them in the firewall if you really want one.
(I strongly suggest you try without a firewall first.)

Now, looking at the error message "No route to host":
that could mean there really is no route, in which case you should
include or exclude some subnets/interfaces:
mpirun --mca btl_tcp_if_include ... --mca oob_tcp_if_include ... ...
Or there might be a route, but the firewall reports otherwise.

Cheers,

Gilles

On Thursday, April 13, 2017, Emin Nuriyev <emin.nuri...@ucdconnect.ie> wrote:

> I cloned the latest version of Open MPI from GitHub on grid5000.
>
> 128 nodes were reserved at the nancy site. During execution of my MPI code I
> got the error message below:
>
> A process or daemon was unable to complete a TCP connection
> to another process:
> Local host: graphene-17
> Remote host: graphene-91
> This is usually caused by a firewall on the remote host. Please
> check that any firewall (e.g., iptables) has been disabled and
> try again.
>
> I deployed my OS image. Everything is OK with the firewall. Since the same
> OS image was deployed on all reserved nodes, and Open MPI could connect to
> some of them and execute code, the firewall must be accepting incoming
> connections.
>
> There is no problem connecting to graphene-91 with ssh, but the command
> line below does not work:
>
> mpirun -host graphene-91 -n 1 exec_code
>
> I get the same message: "Unable to complete a TCP connection"
>
>
> Sometimes I get this error:
>
> WARNING: There is at least non-excluded one OpenFabrics device found,
> but there are no active ports detected (or Open MPI was unable to use
> them). This is most certainly not what you wanted. Check your
> cables, subnet manager configuration, etc. The openib BTL will be
> ignored for this job.
>
> Local host: graphene-27
> --------------------------------------------------------------------------
> [graphene-26][[56971,1],52][btl_tcp_endpoint.c:796:mca_btl_tcp_endpoint_complete_connect]
> connect() to 172.18.64.25 failed: No route to host (113)
> [graphene-29][[56971,1],60][btl_tcp_endpoint.c:796:mca_btl_tcp_endpoint_complete_connect]
> connect() to 172.18.64.27 failed: No route to host (113)
> [graphene-14.nancy.grid5000.fr:02890] 15 more processes have sent help
> message help-mpi-btl-openib.txt / no active ports found
> [graphene-14.nancy.grid5000.fr:02890] Set MCA parameter
> "orte_base_help_aggregate" to 0 to see all help / error messages
>
> When I change the command line to select eth0 with an MCA parameter, I get
> another error. This is not a stable version; maybe that is why I get this
> kind of error?
>
> Yours faithfully,
> Emin Nuriyev
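
Coming back to the two suggestions at the top of this reply (pin the interfaces, restrict the ports), the resulting command line could be assembled roughly as below. This is a sketch under assumptions, not a tested recipe: `build_mpirun_cmd` is a hypothetical helper, eth0 is taken from your message, the port numbers are arbitrary examples, and the port-related parameter names (`btl_tcp_port_min_v4`, `btl_tcp_port_range_v4`, `oob_tcp_dynamic_ipv4_ports`) are written as I remember them, so please check them against the output of ompi_info for your build.

```python
import shlex


def build_mpirun_cmd(hosts, nprocs, program, iface="eth0",
                     btl_port_min=46000, btl_port_range=100,
                     oob_ports="45000-45100"):
    """Assemble (but do not run) an mpirun command line (hypothetical helper)."""
    mca = [
        # Restrict both TCP layers to one interface (parameters named in the reply).
        ("btl_tcp_if_include", iface),
        ("oob_tcp_if_include", iface),
        # Assumed parameter names for the port ranges; verify with ompi_info.
        ("btl_tcp_port_min_v4", str(btl_port_min)),
        ("btl_tcp_port_range_v4", str(btl_port_range)),
        ("oob_tcp_dynamic_ipv4_ports", oob_ports),
    ]
    cmd = ["mpirun"]
    for name, value in mca:
        cmd += ["--mca", name, value]
    cmd += ["-host", ",".join(hosts), "-n", str(nprocs), program]
    return cmd


# Print the command so it can be inspected before running it by hand.
print(shlex.join(build_mpirun_cmd(["graphene-17", "graphene-91"], 2, "exec_code")))
```

With ranges fixed like this, the firewall on each node should only need to accept TCP on those ports in addition to ssh, which is easier to reason about than the default ephemeral ports.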
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel