Hi Or,

Thanks a lot for your quick response.

The nodes have LIDs assigned to them and OpenSM is running fine. The test does not seem to print the LIDs because it does not print those fields properly when RDMA_CM is used to establish connections. I've attached the configurations of the two hosts to this e-mail. As Jonathan mentioned, we are able to ping between them.

The issue is intermittent: it happens at times, and at other times things work fine.

Please let us know if you need any more information.

Thx,
Hari.

On Thu, 22 Jul 2010, Jonathan Perkins wrote:

> On Thu, Jul 22, 2010 at 3:15 AM, Or Gerlitz <ogerl...@voltaire.com> wrote:
> > Hari Subramoni wrote:
> >> [subra...@amd6 perftest]$ ./ib_rdma_bw -c 172.16.1.5
> >> 11928: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 |
> >> 11928: Local address: LID 0000, QPN 000000, PSN 0x5bfbba RKey 0x90042602 VAddr 0x002b27feabe000
> >> 11928: Remote address: LID 0000, QPN 000000, PSN 0x392fe6, RKey 0xf8042605 VAddr 0x002b9d5c93b000
> >
> > you can see the lid and qp numbers are zero, something is broken... when
> > you use the rdma-cm, the address to be provided to the utility should be
> > on an IPoIB subnet, is that what you're doing?
> >
> > Basically, I would suggest that you first use rping(1) provided by
> > librdmacm-utils to make sure things are working well in your
> > configuration and then move to the perftest utils.
>
> Thanks for the response Or. I'm posting some information below.
>
> Here is the output I get when running rping...
>
> [perki...@amd5 ~]$ rping -v -s -a 172.16.1.5
>
> [perki...@amd6 ~]$ rping -v -c -a 172.16.1.5
> cq completion failed status 5
> cma event RDMA_CM_EVENT_REJECTED, error 8
> wait for CONNECTED state 10
> connect error -1
> [perki...@amd6 ~]$ ping 172.16.1.5
> PING 172.16.1.5 (172.16.1.5) 56(84) bytes of data.
> 64 bytes from 172.16.1.5: icmp_seq=1 ttl=64 time=3.45 ms
> 64 bytes from 172.16.1.5: icmp_seq=2 ttl=64 time=1.00 ms
>
> We are able to ping the addresses but you can see that rping results
> in a failure.
>
> We have two interfaces exposed on each machine both on different
> subnets (172.16.1.0/24 and 172.16.2.0/24). We're using ofed-1.5.1 on
> these systems. Any idea of what could be going on?
>
> --
> Jonathan Perkins
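
Or's question about whether the address passed to the utility is on an IPoIB subnet can be checked mechanically. Below is a minimal Python sketch, assuming the two subnets Jonathan lists (172.16.1.0/24 and 172.16.2.0/24) are the IPoIB ones. The function name `on_ipoib_subnet` is hypothetical and not part of perftest or librdmacm.

```python
import ipaddress

# Assumption: these are the IPoIB subnets, taken from Jonathan's message.
IPOIB_SUBNETS = [
    ipaddress.ip_network("172.16.1.0/24"),
    ipaddress.ip_network("172.16.2.0/24"),
]


def on_ipoib_subnet(address, subnets=IPOIB_SUBNETS):
    """Return True if the address belongs to one of the given subnets."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in subnets)


if __name__ == "__main__":
    # 172.16.1.5 is the address given to ib_rdma_bw and rping in the thread.
    for candidate in ("172.16.1.5", "164.107.119.236"):
        print(candidate, on_ipoib_subnet(candidate))
```

In this thread the address used, 172.16.1.5, does fall inside the first subnet, so the check only rules out the simplest misconfiguration; it says nothing about why rping is rejected.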
[subra...@amd6 ~]$ ibstat
CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e442
        System image GUID: 0x0002c9030001e445
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 4
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e443
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e444
CA 'mlx4_1'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e44e
        System image GUID: 0x0002c9030001e451
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 6
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e44f
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e450
[subra...@amd6 ~]$
[subra...@amd6 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:30:48:D0:19:CA
          inet addr:164.107.119.237  Bcast:164.107.119.255  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fed0:19ca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:132741 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51091 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:25346771 (24.1 MiB)  TX bytes:18800740 (17.9 MiB)
          Base address:0xbc00 Memory:d7fe0000-d8000000

ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.1.6  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e443/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:121 errors:0 dropped:0 overruns:0 frame:0
          TX packets:66 errors:0 dropped:10 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:34885 (34.0 KiB)  TX bytes:13913 (13.5 KiB)

ib2       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.2.6  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e44f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:76 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48 errors:0 dropped:10 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:24775 (24.1 KiB)  TX bytes:15327 (14.9 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:7870 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7870 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:855574 (835.5 KiB)  TX bytes:855574 (835.5 KiB)

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:6175 (6.0 KiB)

[subra...@amd6 ~]$
[subra...@amd6 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
172.16.2.0      0.0.0.0         255.255.255.0   U         0 0          0 ib2
164.107.119.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         164.107.119.1   0.0.0.0         UG        0 0          0 eth0
[subra...@amd6 ~]$
[subra...@amd6 ~]$ ping 172.16.1.5
PING 172.16.1.5 (172.16.1.5) 56(84) bytes of data.
64 bytes from 172.16.1.5: icmp_seq=1 ttl=64 time=2.31 ms
64 bytes from 172.16.1.5: icmp_seq=2 ttl=64 time=0.109 ms
64 bytes from 172.16.1.5: icmp_seq=3 ttl=64 time=0.078 ms

--- 172.16.1.5 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.078/0.834/2.315/1.047 ms
[subra...@amd6 ~]$ ping 172.16.1.6
PING 172.16.1.6 (172.16.1.6) 56(84) bytes of data.
64 bytes from 172.16.1.6: icmp_seq=1 ttl=64 time=0.046 ms
64 bytes from 172.16.1.6: icmp_seq=2 ttl=64 time=0.013 ms
64 bytes from 172.16.1.6: icmp_seq=3 ttl=64 time=0.014 ms

--- 172.16.1.6 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.013/0.024/0.046/0.015 ms
[subra...@amd6 ~]$
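
To see which interface amd6 would pick for a given destination, the routing table above can be evaluated with a longest-prefix match. This is only a sketch of that lookup in Python, with the routes copied by hand from the `netstat -rn` output. The function name `outgoing_iface` is hypothetical, and the sketch ignores metrics and policy routing.

```python
import ipaddress

# (destination, genmask, interface) rows copied from amd6's `netstat -rn`.
ROUTES = [
    ("172.16.2.0", "255.255.255.0", "ib2"),
    ("164.107.119.0", "255.255.255.0", "eth0"),
    ("172.16.1.0", "255.255.255.0", "ib0"),
    ("192.168.122.0", "255.255.255.0", "virbr0"),
    ("169.254.0.0", "255.255.0.0", "eth0"),
    ("0.0.0.0", "0.0.0.0", "eth0"),  # default route
]


def outgoing_iface(dest, routes=ROUTES):
    """Return the interface of the most specific route matching dest."""
    addr = ipaddress.ip_address(dest)
    best = None  # (prefix length, interface) of the best match so far
    for network, mask, iface in routes:
        net = ipaddress.ip_network(f"{network}/{mask}")
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, iface)
    return best[1] if best else None


if __name__ == "__main__":
    for dest in ("172.16.1.5", "172.16.2.5"):
        print(dest, "->", outgoing_iface(dest))
```

With this table, 172.16.1.5 resolves to ib0 and 172.16.2.5 to ib2, which matches the two IPoIB interfaces shown in the ifconfig output.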
[subra...@amd5 ~]$ ibstat
CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e386
        System image GUID: 0x0002c9030001e389
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 12
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e387
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e388
CA 'mlx4_1'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e452
        System image GUID: 0x0002c9030001e455
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 7
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e453
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e454
[subra...@amd5 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:30:48:D0:19:BE
          inet addr:164.107.119.236  Bcast:164.107.119.255  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fed0:19be/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:238196 errors:0 dropped:0 overruns:0 frame:0
          TX packets:172491 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:62341710 (59.4 MiB)  TX bytes:94768875 (90.3 MiB)
          Base address:0xbc00 Memory:d7fe0000-d8000000

ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.1.5  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e387/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:121 errors:0 dropped:0 overruns:0 frame:0
          TX packets:78 errors:0 dropped:13 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:47421 (46.3 KiB)  TX bytes:21533 (21.0 KiB)

ib2       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.2.5  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e453/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:120 errors:0 dropped:0 overruns:0 frame:0
          TX packets:45 errors:0 dropped:13 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:40365 (39.4 KiB)  TX bytes:13567 (13.2 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8506 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8506 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:941478 (919.4 KiB)  TX bytes:941478 (919.4 KiB)

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:7969 (7.7 KiB)

[subra...@amd5 ~]$
[subra...@amd5 ~]$
[subra...@amd5 ~]$ ping 172.16.1.6
PING 172.16.1.6 (172.16.1.6) 56(84) bytes of data.
64 bytes from 172.16.1.6: icmp_seq=1 ttl=64 time=2.23 ms
64 bytes from 172.16.1.6: icmp_seq=2 ttl=64 time=0.111 ms

--- 172.16.1.6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.111/1.172/2.234/1.062 ms
[subra...@amd5 ~]$ ping 172.16.2.6
PING 172.16.2.6 (172.16.2.6) 56(84) bytes of data.
64 bytes from 172.16.2.6: icmp_seq=1 ttl=64 time=1.70 ms
64 bytes from 172.16.2.6: icmp_seq=2 ttl=64 time=0.104 ms
64 bytes from 172.16.2.6: icmp_seq=3 ttl=64 time=0.083 ms

--- 172.16.2.6 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.083/0.631/1.707/0.760 ms
[subra...@amd5 ~]$
[subra...@amd5 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
172.16.2.0      0.0.0.0         255.255.255.0   U         0 0          0 ib2
164.107.119.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         164.107.119.1   0.0.0.0         UG        0 0          0 eth0
[subra...@amd5 ~]$
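
The claim that the nodes have LIDs assigned can be confirmed from the `ibstat` output without reading it by eye. Below is a minimal Python sketch, assuming the field order shown in the attachments (State, Physical state, Rate, Base lid). The function name `active_ports` is hypothetical, and the sample is a trimmed copy of amd5's first adapter.

```python
import re

# Matches one port stanza up to its Base lid; \s* tolerates any line wrapping.
PORT_RE = re.compile(
    r"Port (\d+):\s*State: (\w+)\s*Physical state: (\w+)\s*"
    r"Rate: (\d+)\s*Base lid: (\d+)"
)


def active_ports(ibstat_text):
    """Return (ca_name, port, base_lid) for every port whose State is Active."""
    result = []
    # Splitting on a pattern with one group yields
    # [prefix, name1, body1, name2, body2, ...].
    parts = re.split(r"CA '([^']+)'", ibstat_text)
    for name, body in zip(parts[1::2], parts[2::2]):
        for port, state, _phys, _rate, lid in PORT_RE.findall(body):
            if state == "Active":
                result.append((name, int(port), int(lid)))
    return result


# Trimmed from the amd5 attachment.
sample = """
CA 'mlx4_0'
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 12
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
"""

if __name__ == "__main__":
    print(active_ports(sample))  # [('mlx4_0', 1, 12)]
```

A nonzero Base lid on every Active port is consistent with OpenSM having configured the fabric, which is why the zero LID printed by ib_rdma_bw looks like a reporting problem in the test rather than a missing LID.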