Hi all,
We have a Mellanox MT27500 Family [ConnectX-3] adapter installed in our
computer. I followed the wiki at https://wiki.debian.org/RDMA, and RDMA itself
seems to be working fine: as you can see below, the required modules are loaded,
port 1 is active, info about other clients on the network can be retrieved, and
`ibping` works.
We also use the IPoIB interface, though, which is up and running on the
other clients. For some reason, I can't bring up that interface on this
computer (see below, after the `ibping` command). Running `ip addr add`
assigns the address to the interface, but it remains DOWN, and `ifup
ibs3` says "unknown interface".
I also don't understand why `ib0` got renamed to `ibs3` and `ib1` to `ibs3d1`.
Further, lshw reports this network interface as DISABLED.
Oh my gosh you all... ok, after writing this whole report, I decided to try
deleting the address assigned by the `ip` command and defining the interface in
/etc/network/interfaces instead. Now `ifup ibs3` doesn't complain about an
unknown interface, the interface is reported as UP, and ping works to another
computer.
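
For reference, a stanza of roughly the following shape is one way to define the interface in /etc/network/interfaces. This is a minimal sketch, not necessarily the exact entry used here: the helper name `ipoib_stanza` is hypothetical, and the netmask is simply the /24 from the `ip addr add` command further down. The function only prints the stanza, so the output can be reviewed before it is added to the file.

```shell
#!/bin/sh
# Hypothetical helper: print an ifupdown stanza for a static IPoIB address.
# Usage: ipoib_stanza IFACE ADDRESS NETMASK
ipoib_stanza() {
    printf 'auto %s\n' "$1"
    printf 'iface %s inet static\n' "$1"
    printf '    address %s\n' "$2"
    printf '    netmask %s\n' "$3"
}

# Values taken from this report; review the output before adding it to
# /etc/network/interfaces.
ipoib_stanza ibs3 10.10.11.203 255.255.255.0
```

Once a stanza for `ibs3` exists in that file, `ifup ibs3` has a definition to work from, which would explain why the "unknown interface" error went away.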
So, I guess the statements on the wiki that follow the `ip addr add` command are
incorrect? ("The IP address should now respond to pings. If there are other hosts
configured with IPoIB, each interface's addresses should also be pingable.")
Maybe there is something missing from the `ip addr add` command?
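
One likely candidate for the missing step: `ip addr add` only assigns an address and does not change the administrative state of the link, and in the `ip addr` output below the flag list for `ibs3` is `<BROADCAST,MULTICAST>` with no `UP`. The sketch below checks a header line from `ip link` for the `UP` flag and suggests `ip link set dev ibs3 up` if it is absent. The function name `link_is_admin_up` is hypothetical, and running `ip link set` by hand was not tested in this report, so treat it as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) if the flag list in an
# `ip link` header line contains the administrative UP flag.
link_is_admin_up() {
    flags=${1#*<}       # drop everything up to and including '<'
    flags=${flags%%>*}  # drop everything from '>' onward
    case ",$flags," in
        *,UP,*) return 0 ;;   # exact flag match, so LOWER_UP alone does not count
        *)      return 1 ;;
    esac
}

# Header line for ibs3 as shown in this report.
line='4: ibs3: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256'

if ! link_is_admin_up "$line"; then
    echo "ibs3 is administratively down; as root, try: ip link set dev ibs3 up"
fi
```

On a live system the header line could be taken from `ip link show dev ibs3` instead of being pasted in.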
Best,
Chandler
# lsmod | grep '\(^ib\|^rdma\)'
ib_umad 36864 0
ib_ipoib 147456 0
rdma_ucm 32768 0
rdma_cm 131072 2 rpcrdma,rdma_ucm
ib_cm 135168 2 rdma_cm,ib_ipoib
ib_uverbs 167936 2 mlx4_ib,rdma_ucm
ib_core 413696 9 rdma_cm,ib_ipoib,rpcrdma,mlx4_ib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,ib_cm
# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.40.7000
    Hardware version: 1
    Node GUID: 0xe41d2d03006f8510
    System image GUID: 0xe41d2d03006f8513
    Port 1:
        *State: Active*
        *Physical state: LinkUp*
        Rate: 40 (FDR10)
        Base lid: 4
        LMC: 0
        SM lid: 13
        Capability mask: 0x02514868
        Port GUID: 0xe41d2d03006f8511
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02514868
        Port GUID: 0xe41d2d03006f8512
        Link layer: InfiniBand
# iblinkinfo
[prints info about other clients]
Switch: 0x0002c903008995b0 SwitchX - Mellanox Technologies:
[prints all the clients connected to each port on the switch]
8 35[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 4 1[ ] "Xba mlx4_0" ( )
[...]
CA: Xba mlx4_0:
0xe41d2d03006f8511 4 1[ ] ==( 4X 10.0 Gbps (FDR10) Active/ LinkUp)==> 8 35[ ] "SwitchX - Mellanox Technologies" ( )
#
On another computer on the network:
# ibhosts
Ca : 0xe41d2d03006f8510 ports 2 "Xba mlx4_0"
[...]
# ibping -G 0xe41d2d03006f8511
Pong from Xba.(none) (Lid 4): time 0.107 ms
Pong from Xba.(none) (Lid 4): time 0.125 ms
Pong from Xba.(none) (Lid 4): time 0.117 ms
Pong from Xba.(none) (Lid 4): time 0.104 ms
Pong from Xba.(none) (Lid 4): time 0.109 ms
Pong from Xba.(none) (Lid 4): time 0.108 ms
^C
--- Xba.(none) (Lid 4) ibping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5098 ms
rtt min/avg/max = 0.104/0.111/0.125 ms
Back on this computer:
# ip addr
[...]
4: ibs3: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:11 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0
5: ibs3d1: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256
    link/infiniband 80:00:02:09:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:12 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0d1
# ip link
[...]
4: ibs3: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:11 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0
5: ibs3d1: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256
    link/infiniband 80:00:02:09:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:12 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0d1
# ip addr add 10.10.11.203/24 dev ibs3
# ip addr
[...]
4: ibs3: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:11 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0
    inet 10.10.11.203/24 scope global ibs3
       valid_lft forever preferred_lft forever
5: ibs3d1: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN group default qlen 256
    link/infiniband 80:00:02:09:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:6f:85:12 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp1s0d1
# ifup ibs3
ifup: unknown interface ibs3
# dmesg -T|grep mlx4
[Sat Jan 7 22:12:31 2023] mlx4_core: Mellanox ConnectX core driver v4.0-0
[Sat Jan 7 22:12:31 2023] mlx4_core: Initializing 0000:01:00.0
[Sat Jan 7 22:12:38 2023] mlx4_core 0000:01:00.0: DMFS high rate steer mode is: disabled performance optimized steering
[Sat Jan 7 22:12:38 2023] mlx4_core 0000:01:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[Sat Jan 7 22:12:38 2023] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[Sat Jan 7 22:12:38 2023] <mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
[Sat Jan 7 22:12:38 2023] <mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0
[Thu Jan 12 03:42:35 2023] mlx4_core 0000:01:00.0 ibs3: renamed from ib0
[Thu Jan 12 03:42:35 2023] mlx4_core 0000:01:00.0 ibs3d1: renamed from ib1
# lshw -class network
*-network DISABLED
description: interface
product: MT27500 Family [ConnectX-3]
vendor: Mellanox Technologies
physical id: 0
bus info: pci@0000:01:00.0
logical name: ibs3
version: 00
serial: 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:00:00:00:00:00:00
width: 64 bits
clock: 33MHz
capabilities: pm vpd msix pciexpress bus_master cap_list rom physical
configuration: autonegotiation=on broadcast=yes driver=ib_ipoib driverversion=5.10.0-20-amd64 duplex=full firmware=2.40.7000 ip=10.10.11.203 latency=0 link=no multicast=yes
resources: irq:24 memory:fb100000-fb1fffff memory:fa800000-faffffff memory:fb000000-fb0fffff
[...]
# ip addr del 10.10.11.203/24 dev ibs3
# ping 10.10.11.100
PING 10.10.11.100 (10.10.11.100) 56(84) bytes of data.
64 bytes from 10.10.11.100: icmp_seq=1 ttl=64 time=1.50 ms
64 bytes from 10.10.11.100: icmp_seq=2 ttl=64 time=0.184 ms
64 bytes from 10.10.11.100: icmp_seq=3 ttl=64 time=0.178 ms
^C
--- 10.10.11.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2006ms
rtt min/avg/max/mdev = 0.178/0.619/1.495/0.619 ms
# lshw -class network
*-network
description: interface
product: MT27500 Family [ConnectX-3]
[...]
#