Ralph,

Yes, it failed.
Sorry, I had meant to include more of the output than I did (see below).

My Solaris systems were moved yesterday (the disks were physically relocated)
between what *should* have been essentially identical hardware.  At the moment
I am looking into the ssh message, though I am sure I should already have all
the host keys associated with the correct hostnames and IPs.
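
For reference, a minimal sketch of how non-interactive ssh access to the two
hosts could be checked independently of mpirun. The function name
check_ssh_hosts and the SSH_CMD override are hypothetical names made up for
this illustration (they are not part of Open MPI), and the sketch assumes an
ssh client that accepts the OpenSSH-style BatchMode and ConnectTimeout options.

```shell
#!/bin/sh
# Hypothetical helper: attempt a non-interactive ssh login to each host given.
# SSH_CMD is an illustrative override (defaults to "ssh"); it is an assumption
# of this sketch, not an Open MPI or ssh setting.
check_ssh_hosts() {
    rc=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail rather than prompt for a password or
        # a host-key confirmation; "true" is a no-op remote command.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            rc=1
        fi
    done
    return $rc
}

# Usage, with the host names from the mpirun command line:
#   check_ssh_hosts pcp-j-31 pcp-j-35
```

A host reported as FAILED here would point at the ssh setup itself rather
than at anything mpirun is doing.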

-Paul

full output:

$ mpirun -mca btl sm,self,verbs -np 2 -host pcp-j-31,pcp-j-35 examples/ring_c'
[pcp-j-35:01400]
[/shared/OMPI/openmpi-master-solaris11-x64-ib-ss12u3/openmpi-dev-1351-gccba8ce/orte/mca/oob/tcp/oob_tcp_common.c:103]
setsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol (99)
ssh_exchange_identification: Connection closed by remote host
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------




On Fri, Mar 20, 2015 at 7:13 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Hi Paul
>
> It should have kept running, albeit with that warning - did the program
> actually fail?
>
>
> On Mar 19, 2015, at 10:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> Seen earlier today with last night's master tarball:
>
> $ mpirun -mca btl sm,self,verbs -np 2 -host pcp-j-31,pcp-j-35 examples/ring_c'
> [pcp-j-35:01400]
> [/shared/OMPI/openmpi-master-solaris11-x64-ib-ss12u3/openmpi-dev-1351-gccba8ce/orte/mca/oob/tcp/oob_tcp_common.c:103]
> setsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol (99)
>
> -Paul
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>  _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/03/17138.php
>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/03/17139.php
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
