On Mon, 26 Jul 2010, Hari Subramoni wrote:

> Hi,
>
> Yes, both cards are on the same IB subnet.
>
> The machines are down for maintenance now. We will send out the
> information you requested as soon as they are up.
>
> Thanks a lot,
> Hari.
>
> On Fri, 23 Jul 2010, Or Gerlitz wrote:
>
> > Hari Subramoni <subra...@cse.ohio-state.edu> wrote:
> >
> > > The nodes have LID's assigned to them and OpenSM is running fine.
> > > I've attached the configurations of the two hosts along with this e-mail.
> > > As Jonathan mentioned, we are able to ping between them.
> >
> > are the two HCAs on each of the nodes connected to the same IB subnet?
> >
> > > The issue is intermittent. It happens at times and at other times, things
> > > work fine. Please let us know if you need any more information.
> >
> > lets focus on rping, please use both -v -d flags with rping, also
> > when rping fails, please send the neighbours info (#ip neigh show)
> > from host .5
> >
> > Or.
> >
> > --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Hi Or,

Sorry for the delay. I was able to reproduce the error after a few attempts.
The details are given below. The systems have OFED-1.5.1. OpenSM is running
and the interfaces are up and active.

Host 1
======
[subra...@amd5 exp2-amd5-install]$ rping -vVd -s -C 1 -a 172.16.2.5
server
count 1
created cm_id 0x5ed7550
rdma_bind_addr successful
rdma_listen

[subra...@amd5 exp2-amd5-install]$ ping 172.16.2.6
PING 172.16.2.6 (172.16.2.6) 56(84) bytes of data.
64 bytes from 172.16.2.6: icmp_seq=1 ttl=64 time=0.169 ms
64 bytes from 172.16.2.6: icmp_seq=2 ttl=64 time=0.104 ms

--- 172.16.2.6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.104/0.136/0.169/0.034 ms

[subra...@amd5 exp2-amd5-install]$ ip neigh show
172.16.2.6 dev ib2 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:e4:4f REACHABLE
164.107.119.153 dev eth0 lladdr 00:15:17:0f:a8:28 REACHABLE
164.107.119.237 dev eth0 lladdr 00:30:48:d0:19:ca STALE
164.107.119.1 dev eth0 lladdr 00:21:59:85:b0:06 REACHABLE
[subra...@amd5 exp2-amd5-install]$

[subra...@amd5 exp2-amd5-install]$ ifconfig ib0
ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.1.5  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e387/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:75 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:29059 (28.3 KiB)  TX bytes:9176 (8.9 KiB)

[subra...@amd5 exp2-amd5-install]$ ifconfig ib2
ib2       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.2.5  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e453/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:36 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:10 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:11455 (11.1 KiB)  TX bytes:8001 (7.8 KiB)

[subra...@amd5 exp2-amd5-install]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
172.16.4.0      0.0.0.0         255.255.255.0   U         0 0          0 ib1
172.16.2.0      0.0.0.0         255.255.255.0   U         0 0          0 ib2
164.107.119.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         164.107.119.1   0.0.0.0         UG        0 0          0 eth0

Host 2
======
[subra...@amd6 ~]$ rping -vVdc -a 172.16.2.5 -C 1
client
count 1
created cm_id 0xe017550
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0xe017550 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0xe017550 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0xe017a10
created channel 0xe017a30
created cq 0xe017a50
created qp 0xe017b90
rping_setup_buffers called on cb 0xe011010
allocated & registered buffers...
cq_thread started.
cq completion failed status 5
cma_event type RDMA_CM_EVENT_REJECTED cma_id 0xe017550 (parent)
wait for CONNECTED state 10
connect error -1
rping_free_buffers called on cb 0xe011010
cma event RDMA_CM_EVENT_REJECTED, error 8

[subra...@amd6 ~]$ ping 172.16.2.5
PING 172.16.2.5 (172.16.2.5) 56(84) bytes of data.
64 bytes from 172.16.2.5: icmp_seq=1 ttl=64 time=3.04 ms
64 bytes from 172.16.2.5: icmp_seq=2 ttl=64 time=1.09 ms

--- 172.16.2.5 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 1.097/2.071/3.045/0.974 ms
[subra...@amd6 ~]$
[subra...@amd6 ~]$

[subra...@amd6 ~]$ ip neigh show
164.107.119.1 dev eth0 lladdr 00:21:59:85:b0:06 REACHABLE
172.16.2.5 dev ib2 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:e4:53 REACHABLE
164.107.119.236 dev eth0 lladdr 00:30:48:d0:19:be STALE
172.16.2.5 dev ib0 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:e4:53 STALE
172.16.1.5 dev ib0 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:e3:87 STALE
164.107.119.153 dev eth0 lladdr 00:15:17:0f:a8:28 REACHABLE
[subra...@amd6 ~]$

[subra...@amd6 ~]$ ifconfig ib0
ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.1.6  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e443/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:61 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:20049 (19.5 KiB)  TX bytes:8263 (8.0 KiB)

[subra...@amd6 ~]$ ifconfig ib2
ib2       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.2.6  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e44f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:1740 (1.6 KiB)  TX bytes:8341 (8.1 KiB)

[subra...@amd6 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
172.16.4.0      0.0.0.0         255.255.255.0   U         0 0          0 ib1
172.16.2.0      0.0.0.0         255.255.255.0   U         0 0          0 ib2
164.107.119.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         164.107.119.1   0.0.0.0         UG        0 0          0 eth0

Thx,
Hari.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html