I cloned the latest version of Open MPI from GitHub and built it on Grid5000.

I reserved 128 nodes at the Nancy site. While running my MPI code, I got the
error message below:

A process or daemon was unable to complete a TCP connection
to another process:
  Local host:    graphene-17
  Remote host:   graphene-91
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.

I deployed my own OS image, and I do not think the firewall is the problem.
The same OS image was deployed on all reserved nodes, so if Open MPI can
connect to some of them and run the code, the firewall must be accepting
incoming connections.

There is no problem connecting to graphene-91 with ssh, but the command line
below does not work:

mpirun -host graphene-91 -n 1 exec_code

I get the same message: "Unable to complete a TCP connection".


Sometimes I get this error instead:

WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them).  This is most certainly not what you wanted.  Check your
cables, subnet manager configuration, etc.  The openib BTL will be
ignored for this job.

  Local host: graphene-27
--------------------------------------------------------------------------
[graphene-26][[56971,1],52][btl_tcp_endpoint.c:796:mca_btl_tcp_endpoint_complete_connect]
connect() to 172.18.64.25 failed: No route to host (113)
[graphene-29][[56971,1],60][btl_tcp_endpoint.c:796:mca_btl_tcp_endpoint_complete_connect]
connect() to 172.18.64.27 failed: No route to host (113)
[graphene-14.nancy.grid5000.fr:02890] 15 more processes have sent help
message help-mpi-btl-openib.txt / no active ports found
[graphene-14.nancy.grid5000.fr:02890] Set MCA parameter
"orte_base_help_aggregate" to 0 to see all help / error messages
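
To see which peers are affected, the unreachable addresses can be pulled out
of the mpirun output. The snippet below is only a sketch: failed_peers is a
hypothetical helper name, and it assumes the log lines have the
"connect() to <address> failed" form shown above.

```shell
# Hypothetical helper: read mpirun output on stdin and print each
# distinct address that a "connect() to ... failed" line refers to.
failed_peers() {
    sed -n 's/.*connect() to \([0-9.]*\) failed.*/\1/p' | sort -u
}

# Sample input: the two connect() lines from the output above.
failed_peers <<'EOF'
connect() to 172.18.64.25 failed: No route to host (113)
connect() to 172.18.64.27 failed: No route to host (113)
EOF
```

Each address printed can then be checked by hand (for example with ping) from
the node that reported the failure.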

When I change the command line to select eth0 with an MCA parameter, I get a
different error. Could these errors be happening because this is not a stable
version?
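
For reference, a command line that restricts Open MPI to eth0 can be sketched
as follows. btl_tcp_if_include and oob_tcp_if_include are Open MPI's
interface-selection MCA parameters; that these are the exact parameters used
here is an assumption, as is limiting the BTLs to tcp and self.

```shell
# Assumptions: eth0 is the interface to use, graphene-91 is the target host.
IFACE=eth0
HOST=graphene-91

# Restrict both MPI traffic (btl) and runtime traffic (oob) to one interface,
# and use only the tcp and self BTLs so that openib is not tried at all.
CMD="mpirun --mca btl tcp,self --mca btl_tcp_if_include $IFACE --mca oob_tcp_if_include $IFACE -host $HOST -n 1 exec_code"

# Print the command for inspection; run it with: eval "$CMD"
echo "$CMD"
```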

Yours faithfully,
Emin Nuriyev
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
