I cloned the latest version of Open MPI from GitHub onto Grid5000. 128 nodes were reserved at the Nancy site. While running my MPI code I got the error message below:
A process or daemon was unable to complete a TCP connection to another process:
  Local host:  graphene-17
  Remote host: graphene-91
This is usually caused by a firewall on the remote host. Please check that any firewall (e.g., iptables) has been disabled and try again.

I deployed my own OS image, and everything is OK with the firewall. The same OS image was deployed on all reserved nodes, so if Open MPI can connect to some of them and execute the code, the firewall must be accepting incoming connections. There is no problem connecting to graphene-91 with ssh, but the command line below does not work:

  mpirun -host graphene-91 -n 1 exec_code

I get the same message, "Unable to complete a TCP connection".

Sometimes I get this error instead:

WARNING: There is at least non-excluded one OpenFabrics device found, but there are no active ports detected (or Open MPI was unable to use them). This is most certainly not what you wanted. Check your cables, subnet manager configuration, etc. The openib BTL will be ignored for this job.
  Local host: graphene-27
--------------------------------------------------------------------------
[graphene-26][[56971,1],52][btl_tcp_endpoint.c:796:mca_btl_tcp_endpoint_complete_connect] connect() to 172.18.64.25 failed: No route to host (113)
[graphene-29][[56971,1],60][btl_tcp_endpoint.c:796:mca_btl_tcp_endpoint_complete_connect] connect() to 172.18.64.27 failed: No route to host (113)
[graphene-14.nancy.grid5000.fr:02890] 15 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
[graphene-14.nancy.grid5000.fr:02890] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

When I change the command line to select eth0 with an MCA parameter, I get a different error. This is not a stable version of Open MPI; could that be why I get this kind of error?

Yours faithfully,
Emin Nuriyev
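
The "No route to host (113)" lines above correspond to errno 113 (EHOSTUNREACH on Linux), which is a different failure from a refused connection or a timeout. A minimal sketch of a check that could be run from one node against another to tell these cases apart is given below. The function name `probe`, the 3-second timeout, and the choice of port are assumptions for illustration only; nothing here is part of Open MPI, and the ports Open MPI actually uses depend on its configuration.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection to host:port and classify the outcome.

    Hypothetical helper for illustration; not part of Open MPI.
    """
    try:
        # create_connection resolves the name and performs connect()
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all within the timeout (packets possibly dropped)
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port
        return "refused"
    except socket.gaierror:
        # Host name could not be resolved
        return "unresolved"
    except OSError as exc:
        # errno 113 on Linux: the error reported in the log above
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no-route"
        return "error: " + str(exc)


# Example (assumed host name and port taken from the report above):
#   print(probe("graphene-91", 22))
```

A result of "open" on the ssh port together with "no-route" on another port or another address of the same node would suggest that the problem is specific to a port range or to one network interface rather than to the host as a whole.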
_______________________________________________ devel mailing list devel@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/devel