Hi, OVS teams,

This is a question about using OVS together with RoCEv2.

In a typical OVS setup, we create an OVS bridge, add a network device as a
port on the bridge, and then assign an IP address to the bridge's default
internal port for communication.

In a typical RoCEv2 setup, we assign an IP address to a network interface
(whose backing network device must be RDMA-capable) for RoCEv2
communication.

The problem is that once we attach the RDMA-capable network device to an
OVS bridge, RoCEv2 communication stops working, whereas with a Linux bridge
RoCEv2 works as expected. Let's look at how the Linux bridge and the OVS
bridge differ when used with RoCEv2.

* For the Linux bridge, as shown below, when we assign an IP address to the
bridge 'mybr0', four RDMA GIDs are created for 'mybr0', and GID index 3 can
be used for RoCEv2 communication.

[smartx@node213 10:11:19 ~]$sudo brctl addbr mybr0
[smartx@node213 10:11:21 ~]$sudo brctl addif mybr0 em2  <<< em2 is an RDMA-capable network interface
[smartx@node213 10:12:05 ~]$sudo ifconfig mybr0 19.19.67.213/24 up
[smartx@node213 10:12:07 ~]$show_gids
DEV PORT INDEX GID IPv4   VER DEV
--- ---- ----- --- ------------   --- ---
rocex506b4b0300c0cbf9 1 0 fe80:0000:0000:0000:526b:4bff:fec0:cbf9 v1 em2
rocex506b4b0300c0cbf9 1 1 fe80:0000:0000:0000:526b:4bff:fec0:cbf9 v2 em2
rocex506b4b0300c0cbf9 1 2 0000:0000:0000:0000:0000:ffff:1313:43d5 19.19.67.213   v1 mybr0
rocex506b4b0300c0cbf9 1 3 0000:0000:0000:0000:0000:ffff:1313:43d5 19.19.67.213   v2 mybr0  <<< this GID can be used for RoCEv2 communication
rocex506b4b0300c0cbf9 1 4 fe80:0000:0000:0000:526b:4bff:fec0:cbf9 v1 mybr0
rocex506b4b0300c0cbf9 1 5 fe80:0000:0000:0000:526b:4bff:fec0:cbf9 v2 mybr0
n_gids_found=6
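
For reference, the GID at index 2 and 3 above is the IPv4-mapped IPv6 form
(::ffff:a.b.c.d) of the address assigned to 'mybr0'. A minimal Python
sketch of that mapping follows; the function name ipv4_to_roce_gid is my
own and is not part of any RDMA tool.

```python
import ipaddress


def ipv4_to_roce_gid(ipv4: str) -> str:
    """Return the IPv4-mapped IPv6 address as a fully expanded GID string."""
    # Place the 32-bit IPv4 value behind the ::ffff:0:0/96 prefix.
    value = (0xFFFF << 32) | int(ipaddress.IPv4Address(ipv4))
    hex_str = f"{value:032x}"
    # Split into eight 16-bit groups, as show_gids prints them.
    return ":".join(hex_str[i:i + 4] for i in range(0, 32, 4))


if __name__ == "__main__":
    # 19 = 0x13, 67 = 0x43, 213 = 0xd5
    print(ipv4_to_roce_gid("19.19.67.213"))
    # prints 0000:0000:0000:0000:0000:ffff:1313:43d5
```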

We can also see how the RDMA GIDs are created. When the ib_core module is
initialized, it calls register_inetaddr_notifier to register the
'inetaddr_event' callback. When an IP address is added to a network
interface, the callback is invoked and checks whether the interface's
net_device is an upper device of any RDMA-capable net_device. If so, it
adds GIDs for the network interface (see is_eth_port_of_netdev_filter in
drivers/infiniband/core/roce_gid_mgmt.c).

 0xffffffffc040c290 : inetaddr_event+0x0/0x70 [ib_core]
 0xffffffff8c4d5097 : notifier_call_chain+0x47/0x70 [kernel]
 0xffffffff8c4d57be : blocking_notifier_call_chain+0x3e/0x60 [kernel]
 0xffffffff8cb9dfca : __inet_insert_ifa+0x1ea/0x2c0 [kernel]
 0xffffffff8cb9ee5d : devinet_ioctl+0x1ed/0x6d0 [kernel]
 0xffffffff8cba0c43 : inet_ioctl+0x143/0x220 [kernel]
 0xffffffff8cace553 : sock_do_ioctl+0x43/0x140 [kernel]
 0xffffffff8caceb18 : sock_ioctl+0x1a8/0x300 [kernel]
 0xffffffff8c6cd9a4 : do_vfs_ioctl+0xa4/0x630 [kernel]
 0xffffffff8c6cdf90 : ksys_ioctl+0x60/0x90 [kernel]
 0xffffffff8c6cdfd6 : __x64_sys_ioctl+0x16/0x20 [kernel]
 0xffffffff8c4041cb : do_syscall_64+0x5b/0x1b0 [kernel]
 0xffffffff8ce000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel]
 0xffffffff8ce000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel] (inexact)

 0xffffffffc04091df : add_modify_gid+0x12f/0x2b0 [ib_core]
 0xffffffffc040945d : __ib_cache_gid_add+0xfd/0x160 [ib_core]
 0xffffffffc040b942 : update_gid+0x72/0x90 [ib_core]
 0xffffffffc04072a6 : ib_enum_roce_netdev+0xd6/0xe0 [ib_core]
 0xffffffffc040732b : ib_enum_all_roce_netdevs+0x7b/0xd0 [ib_core]
 0xffffffffc040b7ca : update_gid_event_work_handler+0x2a/0x50 [ib_core]
 0xffffffff8c4cd537 : process_one_work+0x1a7/0x3b0 [kernel]
 0xffffffff8c4cdc50 : worker_thread+0x30/0x390 [kernel]
 0xffffffff8c4d34e2 : kthread+0x112/0x130 [kernel]
 0xffffffff8ce0023f : ret_from_fork+0x1f/0x40 [kernel]
 0xffffffff8ce0023f : ret_from_fork+0x1f/0x40 [kernel] (inexact)
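
The upper-device check described above can be illustrated with a small
user-space model. This is only a simplified sketch in Python, not the
kernel code: the NetDev class and both function names are hypothetical, and
the real filter in roce_gid_mgmt.c handles more cases than shown here.

```python
from dataclasses import dataclass, field


@dataclass
class NetDev:
    """Toy stand-in for a net_device; 'uppers' models upper-device links."""
    name: str
    uppers: list = field(default_factory=list)


def is_upper_of(candidate: NetDev, lower: NetDev) -> bool:
    """True if 'candidate' is reachable from 'lower' by following upper links."""
    stack = list(lower.uppers)
    seen = set()
    while stack:
        dev = stack.pop()
        if dev is candidate:
            return True
        if id(dev) in seen:
            continue
        seen.add(id(dev))
        stack.extend(dev.uppers)
    return False


def gids_added_for(event_dev: NetDev, rdma_dev: NetDev) -> bool:
    """Model of the decision: add GIDs only for the RDMA netdev or its uppers."""
    return event_dev is rdma_dev or is_upper_of(event_dev, rdma_dev)


if __name__ == "__main__":
    # Linux bridge case: mybr0 is linked as an upper device of em2.
    mybr0 = NetDev("mybr0")
    em2 = NetDev("em2", uppers=[mybr0])
    print(gids_added_for(mybr0, em2))

    # OVS case as described in this mail: myovsbr0 is not an upper of em2.
    myovsbr0 = NetDev("myovsbr0")
    em2_ovs = NetDev("em2")
    print(gids_added_for(myovsbr0, em2_ovs))
```

In this model the first call returns True and the second returns False,
which matches the GID tables observed for the two bridge types.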

* For the OVS bridge, when we assign an IP address to the internal port
'myovsbr0', no RDMA GIDs are created for 'myovsbr0', because 'myovsbr0' is
not an upper device of em2.

[smartx@node213 10:18:08 ~]$sudo ovs-vsctl add-br myovsbr0
[smartx@node213 10:18:18 ~]$sudo ovs-vsctl add-port myovsbr0 em2
[smartx@node213 10:18:58 ~]$sudo ifconfig myovsbr0 19.19.67.213/24 up
[smartx@node213 10:19:17 ~]$show_gids
DEV PORT INDEX GID IPv4   VER DEV
--- ---- ----- --- ------------   --- ---
rocex506b4b0300c0cbf9 1 0 fe80:0000:0000:0000:526b:4bff:fec0:cbf9 v1 em2
rocex506b4b0300c0cbf9 1 1 fe80:0000:0000:0000:526b:4bff:fec0:cbf9 v2 em2
n_gids_found=2
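
One way to confirm which upper devices the kernel has recorded for em2 is
to look at the upper_<name> links under /sys/class/net/<dev>/. The sketch
below assumes those sysfs links are present on the running kernel; the
helper name upper_devices and its sysfs_net parameter are my own.

```python
import os


def upper_devices(dev: str, sysfs_net: str = "/sys/class/net") -> list:
    """List the upper devices of 'dev' from its sysfs 'upper_<name>' entries."""
    dev_dir = os.path.join(sysfs_net, dev)
    if not os.path.isdir(dev_dir):
        return []
    prefix = "upper_"
    return sorted(
        entry[len(prefix):]
        for entry in os.listdir(dev_dir)
        if entry.startswith(prefix)
    )


if __name__ == "__main__":
    uppers = upper_devices("em2")
    print("upper devices of em2:", uppers)
    print("myovsbr0 is an upper device:", "myovsbr0" in uppers)
```

If 'myovsbr0' does not appear in the list, the address notifier has no
upper link from em2 to it to follow, which is consistent with the missing
GIDs.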

For this case, I think a possible fix is to make 'myovsbr0' an upper
device of 'em2' by calling netdev_upper_dev_link in the OVS datapath module.

Would you please share your opinions regarding this issue?

-- 
Thanks,
Jiewei Ke
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
