Ralph,

Requested output is attached.

I have a Linux/x86 system with the same network configuration and will soon
be able to determine if the problem is specific to Solaris.

-Paul


On Mon, Nov 3, 2014 at 7:11 PM, Ralph Castain <rhc.open...@gmail.com> wrote:

> Could you please rerun with -mca oob_base_verbose 20? I'm not sure why the
> connection is failing.
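>
> For example, the extra -mca pair can go anywhere on the same mpirun line you
> used before (full install path omitted here):
>
> $ mpirun -mca oob_base_verbose 20 -mca btl sm,self,openib -np 1 -host pcp-j-19 examples/ring_c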
>
> Thanks
> Ralph
>
> On Nov 3, 2014, at 5:56 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> Not clear if the following failure is Solaris-specific, but it *IS* a
> regression relative to 1.8.3.
>
> The system has two IPv4 interfaces:
>    Ethernet on 172.16.0.119/16
>    IPoIB on 172.18.0.119/16
>
> $ ifconfig bge0
> bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500
> index 2
>         inet 172.16.0.119 netmask ffff0000 broadcast 172.16.255.255
> $ ifconfig pFFFF.ibp0
> pFFFF.ibp0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU>
> mtu 2044 index 3
>         inet 172.18.0.119 netmask ffff0000 broadcast 172.18.255.255
>
> However, I get a message from mca/oob/tcp about not being able to
> communicate between these two interfaces ON THE SAME NODE:
>
> $ /shared/OMPI/openmpi-1.8.4rc1-solaris11-x86-ib-ss12u3/INST/bin/mpirun
> -mca btl sm,self,openib -np 1 -host pcp-j-19 examples/ring_c
> [pcp-j-19:00899] mca_oob_tcp_accept: accept() failed: Error 0 (0).
> ------------------------------------------------------------
> A process or daemon was unable to complete a TCP connection
> to another process:
>   Local host:    pcp-j-19
>   Remote host:   172.18.0.119
> This is usually caused by a firewall on the remote host. Please
> check that any firewall (e.g., iptables) has been disabled and
> try again.
> ------------------------------------------------------------
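>
> FWIW, "Error 0 (0)" looks like an errno string/value of 0, i.e. the accept()
> error path was taken while errno still said "no error".  For comparison, here
> is a minimal sketch of the usual non-blocking accept pattern (mine, not the
> actual OMPI code), which captures errno before anything else can clobber it:
>
> #include <errno.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/socket.h>
>
> /* Accept one pending connection from a non-blocking listening socket.
>  * Returns the new descriptor, or -1 if nothing was accepted. */
> int accept_once(int listen_sd)
> {
>     struct sockaddr_storage addr;
>     socklen_t addrlen = sizeof(addr);   /* must be (re)set before each accept() */
>
>     int sd = accept(listen_sd, (struct sockaddr *)&addr, &addrlen);
>     if (sd < 0) {
>         if (EAGAIN == errno || EWOULDBLOCK == errno || EINTR == errno) {
>             return -1;                  /* transient; retry from the event loop */
>         }
>         int save = errno;               /* capture errno immediately */
>         fprintf(stderr, "accept() failed: %s (%d)\n", strerror(save), save);
>         return -1;
>     }
>     return sd;                          /* hand the connected socket to the handler */
> }
>
> If errno is only examined after other calls have run, the original failure
> reason could easily be masked, which might explain why 0 shows up here.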
>
> Let me know what sort of verbose options I should use to gather any
> additional info you may need.
>
> -Paul
>
> On Fri, Oct 31, 2014 at 7:14 PM, Ralph Castain <rhc.open...@gmail.com>
> wrote:
>
>> Hi folks
>>
>> I know 1.8.4 isn't entirely complete just yet, but I'd like to get a head
>> start on the testing so we can release by Fri Nov 7th. So please take a
>> little time and test the current tarball:
>>
>> http://www.open-mpi.org/software/ompi/v1.8/
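>>
>> In case it helps, the usual sequence is just the following (adjust the prefix
>> and any configure options to taste; the tarball name assumes the bzip2 archive):
>>
>> $ tar xjf openmpi-1.8.4rc1.tar.bz2
>> $ cd openmpi-1.8.4rc1
>> $ ./configure --prefix=$HOME/INST/openmpi-1.8.4rc1
>> $ make -j4 all install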
>>
>> Thanks
>> Ralph
>>
>>
>>
>
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

[pcp-j-19:01003] mca: base: components_register: registering oob components
[pcp-j-19:01003] mca: base: components_register: found loaded component tcp
[pcp-j-19:01003] mca: base: components_register: component tcp register function successful
[pcp-j-19:01003] mca: base: components_open: opening oob components
[pcp-j-19:01003] mca: base: components_open: found loaded component tcp
[pcp-j-19:01003] mca: base: components_open: component tcp open function successful
[pcp-j-19:01003] mca:oob:select: checking available component tcp
[pcp-j-19:01003] mca:oob:select: Querying component [tcp]
[pcp-j-19:01003] oob:tcp: component_available called
[pcp-j-19:01003] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[pcp-j-19:01003] [[26539,0],0] oob:tcp:init rejecting loopback interface lo0
[pcp-j-19:01003] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[pcp-j-19:01003] [[26539,0],0] oob:tcp:init adding 172.16.0.119 to our list of V4 connections
[pcp-j-19:01003] WORKING INTERFACE 3 KERNEL INDEX 3 FAMILY: V4
[pcp-j-19:01003] [[26539,0],0] oob:tcp:init adding 172.18.0.119 to our list of V4 connections
[pcp-j-19:01003] [[26539,0],0] TCP STARTUP
[pcp-j-19:01003] [[26539,0],0] attempting to bind to IPv4 port 0
[pcp-j-19:01003] [[26539,0],0] assigned IPv4 port 43391
[pcp-j-19:01003] mca:oob:select: Adding component to end
[pcp-j-19:01003] mca:oob:select: Found 1 active transports
[pcp-j-19:01003] [[26539,0],0]: set_addr to uri 1739259904.0;tcp://172.16.0.119,172.18.0.119:43391
[pcp-j-19:01003] [[26539,0],0]:set_addr peer [[26539,0],0] is me
[pcp-j-19:01004] mca: base: components_register: registering oob components
[pcp-j-19:01004] mca: base: components_register: found loaded component tcp
[pcp-j-19:01004] mca: base: components_register: component tcp register function successful
[pcp-j-19:01004] mca: base: components_open: opening oob components
[pcp-j-19:01004] mca: base: components_open: found loaded component tcp
[pcp-j-19:01004] mca: base: components_open: component tcp open function successful
[pcp-j-19:01004] mca:oob:select: checking available component tcp
[pcp-j-19:01004] mca:oob:select: Querying component [tcp]
[pcp-j-19:01004] oob:tcp: component_available called
[pcp-j-19:01004] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[pcp-j-19:01004] [[26539,1],0] oob:tcp:init rejecting loopback interface lo0
[pcp-j-19:01004] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[pcp-j-19:01004] [[26539,1],0] oob:tcp:init adding 172.16.0.119 to our list of V4 connections
[pcp-j-19:01004] WORKING INTERFACE 3 KERNEL INDEX 3 FAMILY: V4
[pcp-j-19:01004] [[26539,1],0] oob:tcp:init adding 172.18.0.119 to our list of V4 connections
[pcp-j-19:01004] [[26539,1],0] TCP STARTUP
[pcp-j-19:01004] [[26539,1],0] attempting to bind to IPv4 port 0
[pcp-j-19:01004] [[26539,1],0] assigned IPv4 port 56330
[pcp-j-19:01004] mca:oob:select: Adding component to end
[pcp-j-19:01004] mca:oob:select: Found 1 active transports
[pcp-j-19:01004] [[26539,1],0]: set_addr to uri 1739259904.0;tcp://172.16.0.119,172.18.0.119:43391
[pcp-j-19:01004] [[26539,1],0]:set_addr checking if peer [[26539,0],0] is reachable via component tcp
[pcp-j-19:01004] [[26539,1],0] oob:tcp: working peer [[26539,0],0] address tcp://172.16.0.119,172.18.0.119:43391
[pcp-j-19:01004] [[26539,1],0] PASSING ADDR 172.16.0.119 TO MODULE
[pcp-j-19:01004] [[26539,1],0]:tcp set addr for peer [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0] PASSING ADDR 172.18.0.119 TO MODULE
[pcp-j-19:01004] [[26539,1],0]:tcp set addr for peer [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0]: peer [[26539,0],0] is reachable via component tcp
[pcp-j-19:01004] [[26539,1],0] OOB_SEND: /shared/OMPI/openmpi-1.8.4rc1-solaris11-x86-ib-ss12u3/openmpi-1.8.4rc1/orte/mca/rml/oob/rml_oob_send.c:199
[pcp-j-19:01004] [[26539,1],0]:tcp:processing set_peer cmd
[pcp-j-19:01004] [[26539,1],0] SET_PEER ADDING PEER [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0] set_peer: peer [[26539,0],0] is listening on net 172.16.0.119 port 43391
[pcp-j-19:01004] [[26539,1],0]:tcp:processing set_peer cmd
[pcp-j-19:01004] [[26539,1],0] set_peer: peer [[26539,0],0] is listening on net 172.18.0.119 port 43391
[pcp-j-19:01004] [[26539,1],0] oob:base:send to target [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0] oob:tcp:send_nb to peer [[26539,0],0]:1
[pcp-j-19:01004] [[26539,1],0] tcp:send_nb to peer [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0]:[/shared/OMPI/openmpi-1.8.4rc1-solaris11-x86-ib-ss12u3/openmpi-1.8.4rc1/orte/mca/oob/tcp/oob_tcp.c:478] post send to [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0]:[/shared/OMPI/openmpi-1.8.4rc1-solaris11-x86-ib-ss12u3/openmpi-1.8.4rc1/orte/mca/oob/tcp/oob_tcp.c:415] processing send to peer [[26539,0],0]:1
[pcp-j-19:01004] [[26539,1],0]:[/shared/OMPI/openmpi-1.8.4rc1-solaris11-x86-ib-ss12u3/openmpi-1.8.4rc1/orte/mca/oob/tcp/oob_tcp.c:449] queue pending to [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0] tcp:send_nb: initiating connection to [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0]:[/shared/OMPI/openmpi-1.8.4rc1-solaris11-x86-ib-ss12u3/openmpi-1.8.4rc1/orte/mca/oob/tcp/oob_tcp.c:463] connect to [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0] orte_tcp_peer_try_connect: attempting to connect to proc [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0] oob:tcp:peer creating socket to [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0] orte_tcp_peer_try_connect: attempting to connect to proc [[26539,0],0] on socket 14
[pcp-j-19:01004] [[26539,1],0] orte_tcp_peer_try_connect: attempting to connect to proc [[26539,0],0] on 172.16.0.119:43391 - 0 retries
[pcp-j-19:01004] [[26539,1],0] orte_tcp_peer_try_connect: attempting to connect to proc [[26539,0],0] on 172.18.0.119:43391 - 0 retries
[pcp-j-19:01003] [[26539,0],0] mca_oob_tcp_listen_thread: new connection: (16, 0) 172.16.0.119:44249
[pcp-j-19:01003] mca_oob_tcp_accept: accept() failed: Error 0 (0).
[pcp-j-19:01003] [[26539,0],0] connection_handler: working connection (16, 11) 172.16.0.119:44249
[pcp-j-19:01003] [[26539,0],0] accept_connection: 172.16.0.119:44249
[pcp-j-19:01004] [[26539,1],0] tcp:failed_to_connect called for peer [[26539,0],0]
[pcp-j-19:01004] [[26539,1],0] tcp:failed_to_connect unable to reach peer [[26539,0],0]
[pcp-j-19:01003] [[26539,0],0]:tcp:recv:handler called
[pcp-j-19:01003] [[26539,0],0] RECV CONNECT ACK FROM UNKNOWN ON SOCKET 16
[pcp-j-19:01003] [[26539,0],0] waiting for connect ack from UNKNOWN
[pcp-j-19:01003] [[26539,0],0]-UNKNOWN tcp_peer_recv_blocking: peer closed connection: peer state 0
[pcp-j-19:01003] [[26539,0],0] unable to complete recv of connect-ack from UNKNOWN ON SOCKET 16
[pcp-j-19:01003] [[26539,0],0] TCP SHUTDOWN
[pcp-j-19:01003] mca: base: close: component tcp closed
[pcp-j-19:01003] mca: base: close: unloading component tcp
