Hi Or,

Thanks a lot for your quick response.

The nodes have LIDs assigned to them and OpenSM is running fine. The
test does not seem to print the LID fields properly when RDMA_CM is used
to establish connections, which appears to be why the LIDs are missing
from the output. I've attached the configurations of the two hosts to
this e-mail. As Jonathan mentioned, we are able to ping between them.
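
As a rough way to confirm the LID assignment from the attached ibstat
output, the per-port fields can be pulled out with a small script. This is
only a sketch: the sample text is trimmed from the amd6 dump attached to
this mail, and the helper names parse_ibstat and active_ports_missing_lid
are made up for illustration.

```python
import re

# Trimmed from the ibstat output for amd6 attached to this mail.
IBSTAT_SAMPLE = """\
CA 'mlx4_0'
        Number of ports: 2
        Port 1:
                State: Active
                Physical state: LinkUp
                Base lid: 4
                SM lid: 1
        Port 2:
                State: Down
                Physical state: Polling
                Base lid: 0
                SM lid: 0
CA 'mlx4_1'
        Number of ports: 2
        Port 1:
                State: Active
                Physical state: LinkUp
                Base lid: 6
                SM lid: 1
"""


def parse_ibstat(text):
    """Return one dict per port with ca, port, state, base_lid and sm_lid."""
    ports = []
    ca = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '([^']+)'", line)
        if m:
            ca = m.group(1)
            current = None  # CA-level fields follow until the next "Port N:"
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            current = {"ca": ca, "port": int(m.group(1))}
            ports.append(current)
            continue
        if current is None or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "State":
            current["state"] = value
        elif key == "Base lid":
            current["base_lid"] = int(value)
        elif key == "SM lid":
            current["sm_lid"] = int(value)
    return ports


def active_ports_missing_lid(ports):
    """Active ports whose base LID is still 0."""
    return [p for p in ports
            if p.get("state") == "Active" and p.get("base_lid", 0) == 0]


ports = parse_ibstat(IBSTAT_SAMPLE)
for p in ports:
    print(p["ca"], p["port"], p.get("state"), p.get("base_lid"))
```

An empty result from active_ports_missing_lid(ports) would be consistent
with every active port having a LID, which points the suspicion at how the
test prints the fields rather than at the fabric setup.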

The issue is intermittent: sometimes the failure occurs, and at other
times things work fine. Please let us know if you need any more
information.

Thx,
Hari.

On Thu, 22 Jul 2010, Jonathan Perkins wrote:

> On Thu, Jul 22, 2010 at 3:15 AM, Or Gerlitz <ogerl...@voltaire.com> wrote:
> > Hari Subramoni wrote:
> >> [subra...@amd6 perftest]$ ./ib_rdma_bw -c 172.16.1.5
> >> 11928: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 |
> >> 11928: Local address:  LID 0000, QPN 000000, PSN 0x5bfbba RKey 0x90042602 VAddr 0x002b27feabe000
> >> 11928: Remote address: LID 0000, QPN 000000, PSN 0x392fe6, RKey 0xf8042605 VAddr 0x002b9d5c93b000
> >
> >
> > You can see that the LID and QP numbers are zero, so something is
> > broken. When you use the rdma-cm, the address provided to the utility
> > should be on an IPoIB subnet. Is that what you're doing?
> >
> > Basically, I would suggest that you first use rping(1), provided by
> > librdmacm-utils, to make sure things are working well in your
> > configuration, and then move to the perftest utils.
>
> Thanks for the response Or.  I'm posting some information below.
>
> Here is the output I get when running rping...
>
> [perki...@amd5 ~]$ rping -v -s -a 172.16.1.5
>
> [perki...@amd6 ~]$ rping -v -c -a 172.16.1.5
> cq completion failed status 5
> cma event RDMA_CM_EVENT_REJECTED, error 8
> wait for CONNECTED state 10
> connect error -1
> [perki...@amd6 ~]$ ping 172.16.1.5
> PING 172.16.1.5 (172.16.1.5) 56(84) bytes of data.
> 64 bytes from 172.16.1.5: icmp_seq=1 ttl=64 time=3.45 ms
> 64 bytes from 172.16.1.5: icmp_seq=2 ttl=64 time=1.00 ms
>
> We are able to ping the addresses but you can see that rping results
> in a failure.
>
> We have two interfaces exposed on each machine both on different
> subnets (172.16.1.0/24 and 172.16.2.0/24).  We're using ofed-1.5.1 on
> these systems.  Any idea of what could be going on?
>
> --
> Jonathan Perkins
>
[subra...@amd6 ~]$ ibstat
CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e442
        System image GUID: 0x0002c9030001e445
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 4
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e443
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e444
CA 'mlx4_1'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e44e
        System image GUID: 0x0002c9030001e451
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 6
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e44f
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e450
[subra...@amd6 ~]$
[subra...@amd6 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:30:48:D0:19:CA
          inet addr:164.107.119.237  Bcast:164.107.119.255  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fed0:19ca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:132741 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51091 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:25346771 (24.1 MiB)  TX bytes:18800740 (17.9 MiB)
          Base address:0xbc00 Memory:d7fe0000-d8000000

ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.1.6  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e443/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:121 errors:0 dropped:0 overruns:0 frame:0
          TX packets:66 errors:0 dropped:10 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:34885 (34.0 KiB)  TX bytes:13913 (13.5 KiB)

ib2       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.2.6  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e44f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:76 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48 errors:0 dropped:10 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:24775 (24.1 KiB)  TX bytes:15327 (14.9 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:7870 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7870 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:855574 (835.5 KiB)  TX bytes:855574 (835.5 KiB)

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:6175 (6.0 KiB)

[subra...@amd6 ~]$
[subra...@amd6 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
172.16.2.0      0.0.0.0         255.255.255.0   U         0 0          0 ib2
164.107.119.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         164.107.119.1   0.0.0.0         UG        0 0          0 eth0
[subra...@amd6 ~]$
[subra...@amd6 ~]$ ping 172.16.1.5
PING 172.16.1.5 (172.16.1.5) 56(84) bytes of data.
64 bytes from 172.16.1.5: icmp_seq=1 ttl=64 time=2.31 ms
64 bytes from 172.16.1.5: icmp_seq=2 ttl=64 time=0.109 ms
64 bytes from 172.16.1.5: icmp_seq=3 ttl=64 time=0.078 ms

--- 172.16.1.5 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.078/0.834/2.315/1.047 ms
[subra...@amd6 ~]$ ping 172.16.1.6
PING 172.16.1.6 (172.16.1.6) 56(84) bytes of data.
64 bytes from 172.16.1.6: icmp_seq=1 ttl=64 time=0.046 ms
64 bytes from 172.16.1.6: icmp_seq=2 ttl=64 time=0.013 ms
64 bytes from 172.16.1.6: icmp_seq=3 ttl=64 time=0.014 ms

--- 172.16.1.6 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.013/0.024/0.046/0.015 ms
[subra...@amd6 ~]$
[subra...@amd5 ~]$ ibstat
CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e386
        System image GUID: 0x0002c9030001e389
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 12
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e387
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e388
CA 'mlx4_1'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e452
        System image GUID: 0x0002c9030001e455
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 7
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e453
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e454
[subra...@amd5 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:30:48:D0:19:BE
          inet addr:164.107.119.236  Bcast:164.107.119.255  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fed0:19be/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:238196 errors:0 dropped:0 overruns:0 frame:0
          TX packets:172491 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:62341710 (59.4 MiB)  TX bytes:94768875 (90.3 MiB)
          Base address:0xbc00 Memory:d7fe0000-d8000000

ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.1.5  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e387/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:121 errors:0 dropped:0 overruns:0 frame:0
          TX packets:78 errors:0 dropped:13 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:47421 (46.3 KiB)  TX bytes:21533 (21.0 KiB)

ib2       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.2.5  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e453/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:120 errors:0 dropped:0 overruns:0 frame:0
          TX packets:45 errors:0 dropped:13 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:40365 (39.4 KiB)  TX bytes:13567 (13.2 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8506 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8506 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:941478 (919.4 KiB)  TX bytes:941478 (919.4 KiB)

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:7969 (7.7 KiB)

[subra...@amd5 ~]$
[subra...@amd5 ~]$
[subra...@amd5 ~]$ ping 172.16.1.6
PING 172.16.1.6 (172.16.1.6) 56(84) bytes of data.
64 bytes from 172.16.1.6: icmp_seq=1 ttl=64 time=2.23 ms
64 bytes from 172.16.1.6: icmp_seq=2 ttl=64 time=0.111 ms

--- 172.16.1.6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.111/1.172/2.234/1.062 ms
[subra...@amd5 ~]$ ping 172.16.2.6
PING 172.16.2.6 (172.16.2.6) 56(84) bytes of data.
64 bytes from 172.16.2.6: icmp_seq=1 ttl=64 time=1.70 ms
64 bytes from 172.16.2.6: icmp_seq=2 ttl=64 time=0.104 ms
64 bytes from 172.16.2.6: icmp_seq=3 ttl=64 time=0.083 ms

--- 172.16.2.6 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.083/0.631/1.707/0.760 ms
[subra...@amd5 ~]$
[subra...@amd5 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
172.16.2.0      0.0.0.0         255.255.255.0   U         0 0          0 ib2
164.107.119.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         164.107.119.1   0.0.0.0         UG        0 0          0 eth0
[subra...@amd5 ~]$
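
To cross-check the question of whether the address given to the utilities
sits on an IPoIB subnet, the interface data above can be tested with
Python's ipaddress module. This is a minimal sketch: the table is copied
by hand from the amd6 ifconfig output, the helper names interface_for and
is_ipoib are hypothetical, and treating any interface named ibN as IPoIB
is an assumption based on the naming seen in this thread.

```python
import ipaddress

# Address and netmask per interface, copied from the amd6 ifconfig output.
INTERFACES = {
    "eth0": ("164.107.119.237", "255.255.255.0"),
    "ib0": ("172.16.1.6", "255.255.255.0"),
    "ib2": ("172.16.2.6", "255.255.255.0"),
}


def interface_for(target, interfaces):
    """Return the interface whose directly connected subnet holds target."""
    addr = ipaddress.ip_address(target)
    for name, (ip, mask) in interfaces.items():
        # strict=False lets a host address stand in for its network.
        network = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
        if addr in network:
            return name
    return None


def is_ipoib(name):
    # Assumption: IPoIB interfaces use the ibN naming seen in this thread.
    return name is not None and name.startswith("ib")


for target in ("172.16.1.5", "172.16.2.5", "164.107.119.236"):
    name = interface_for(target, INTERFACES)
    print(target, name, is_ipoib(name))
```

The check only looks at directly connected subnets and ignores the default
route, which is enough here because both hosts share the 172.16.1.0/24 and
172.16.2.0/24 networks.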
