Hi OVS team,

I have run into an issue when using OVS together with RoCEv2.

In the usual OVS setup, we create an OVS bridge, add a network device as a port of the bridge, and then assign an IP to the bridge's default internal port for communication. In the usual RoCEv2 setup, we assign an IP to the network interface (whose backing network device must be RDMA-capable) for RoCEv2 communication.

The problem is that once we attach the RDMA-capable network device to the OVS bridge, RoCEv2 communication no longer works, whereas with a Linux bridge RoCEv2 works as expected.

Let's look at the difference between the Linux bridge and the OVS bridge when working with RoCEv2.

* For the Linux bridge, as shown below, when we assign an IP to the bridge 'mybr0', four RDMA GIDs are created for 'mybr0', and we can use GID index 3 for RoCEv2 communication.

    [smartx@node213 10:11:19 ~]$sudo brctl addbr mybr0
    [smartx@node213 10:11:21 ~]$sudo brctl addif mybr0 em2    <<< em2 is a RDMA-capable network interface
    [smartx@node213 10:12:05 ~]$sudo ifconfig mybr0 19.19.67.213/24 up
    [smartx@node213 10:12:07 ~]$show_gids
    DEV                    PORT  INDEX  GID                                      IPv4          VER  DEV
    ---                    ----  -----  ---                                      ------------  ---  ---
    rocex506b4b0300c0cbf9  1     0      fe80:0000:0000:0000:526b:4bff:fec0:cbf9                v1   em2
    rocex506b4b0300c0cbf9  1     1      fe80:0000:0000:0000:526b:4bff:fec0:cbf9                v2   em2
    rocex506b4b0300c0cbf9  1     2      0000:0000:0000:0000:0000:ffff:1313:43d5  19.19.67.213  v1   mybr0
    rocex506b4b0300c0cbf9  1     3      0000:0000:0000:0000:0000:ffff:1313:43d5  19.19.67.213  v2   mybr0    <<< this GID can be used for RoCEv2 communication
    rocex506b4b0300c0cbf9  1     4      fe80:0000:0000:0000:526b:4bff:fec0:cbf9                v1   mybr0
    rocex506b4b0300c0cbf9  1     5      fe80:0000:0000:0000:526b:4bff:fec0:cbf9                v2   mybr0
    n_gids_found=6

Here is how the RDMA GIDs are created. When the ib_core module is initialized, it uses register_inetaddr_notifier to register the 'inetaddr_event' callback.
When an IP is added to a network interface, the callback is invoked and checks whether the interface's net_device is an upper device of any RDMA-capable net_device. If so, it adds GIDs for the network interface (see is_eth_port_of_netdev_filter in drivers/infiniband/core/roce_gid_mgmt.c).

    0xffffffffc040c290 : inetaddr_event+0x0/0x70 [ib_core]
    0xffffffff8c4d5097 : notifier_call_chain+0x47/0x70 [kernel]
    0xffffffff8c4d57be : blocking_notifier_call_chain+0x3e/0x60 [kernel]
    0xffffffff8cb9dfca : __inet_insert_ifa+0x1ea/0x2c0 [kernel]
    0xffffffff8cb9ee5d : devinet_ioctl+0x1ed/0x6d0 [kernel]
    0xffffffff8cba0c43 : inet_ioctl+0x143/0x220 [kernel]
    0xffffffff8cace553 : sock_do_ioctl+0x43/0x140 [kernel]
    0xffffffff8caceb18 : sock_ioctl+0x1a8/0x300 [kernel]
    0xffffffff8c6cd9a4 : do_vfs_ioctl+0xa4/0x630 [kernel]
    0xffffffff8c6cdf90 : ksys_ioctl+0x60/0x90 [kernel]
    0xffffffff8c6cdfd6 : __x64_sys_ioctl+0x16/0x20 [kernel]
    0xffffffff8c4041cb : do_syscall_64+0x5b/0x1b0 [kernel]
    0xffffffff8ce000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel]
    0xffffffff8ce000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel] (inexact)

    0xffffffffc04091df : add_modify_gid+0x12f/0x2b0 [ib_core]
    0xffffffffc040945d : __ib_cache_gid_add+0xfd/0x160 [ib_core]
    0xffffffffc040b942 : update_gid+0x72/0x90 [ib_core]
    0xffffffffc04072a6 : ib_enum_roce_netdev+0xd6/0xe0 [ib_core]
    0xffffffffc040732b : ib_enum_all_roce_netdevs+0x7b/0xd0 [ib_core]
    0xffffffffc040b7ca : update_gid_event_work_handler+0x2a/0x50 [ib_core]
    0xffffffff8c4cd537 : process_one_work+0x1a7/0x3b0 [kernel]
    0xffffffff8c4cdc50 : worker_thread+0x30/0x390 [kernel]
    0xffffffff8c4d34e2 : kthread+0x112/0x130 [kernel]
    0xffffffff8ce0023f : ret_from_fork+0x1f/0x40 [kernel]
    0xffffffff8ce0023f : ret_from_fork+0x1f/0x40 [kernel] (inexact)

* For the OVS bridge, when we assign an IP to the internal port 'myovsbr0', no RDMA GIDs are created for 'myovsbr0', because 'myovsbr0' is not an upper device of em2.

    [smartx@node213 10:18:08 ~]$sudo ovs-vsctl add-br myovsbr0
    [smartx@node213 10:18:18 ~]$sudo ovs-vsctl add-port myovsbr0 em2
    [smartx@node213 10:18:58 ~]$sudo ifconfig myovsbr0 19.19.67.213/24 up
    [smartx@node213 10:19:17 ~]$show_gids
    DEV                    PORT  INDEX  GID                                      IPv4          VER  DEV
    ---                    ----  -----  ---                                      ------------  ---  ---
    rocex506b4b0300c0cbf9  1     0      fe80:0000:0000:0000:526b:4bff:fec0:cbf9                v1   em2
    rocex506b4b0300c0cbf9  1     1      fe80:0000:0000:0000:526b:4bff:fec0:cbf9                v2   em2
    n_gids_found=2

For this case, I think the fix could be to make 'myovsbr0' an upper device of 'em2' by calling netdev_upper_dev_link in the OVS datapath module.

Would you please share your opinions on this issue?

--
Thanks,
Jiewei Ke
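
P.S. For anyone who wants to check the upper-device relationship on their own machine, here is a minimal sketch in Python. It is only a simplified user-space model of the check described above, not the kernel code: it assumes the kernel exposes each adjacent upper device as an 'upper_<name>' entry under /sys/class/net/<dev>/, and it ignores the bonding and VLAN handling that the real filter may perform. The function names upper_devices and would_get_gids are made up for this illustration.

```python
import os
import sys


def upper_devices(netdev, sysfs_net="/sys/class/net"):
    """Return the names of all direct and indirect upper devices of netdev.

    Assumption: each adjacent upper device shows up as an entry named
    'upper_<name>' inside <sysfs_net>/<netdev>/.
    """
    seen = []
    stack = [netdev]
    while stack:
        dev = stack.pop()
        dev_dir = os.path.join(sysfs_net, dev)
        if not os.path.isdir(dev_dir):
            continue  # unknown device: nothing to walk
        for entry in sorted(os.listdir(dev_dir)):
            if entry.startswith("upper_"):
                upper = entry[len("upper_"):]
                if upper not in seen:
                    seen.append(upper)
                    stack.append(upper)  # follow the chain upwards
    return seen


def would_get_gids(ip_netdev, rdma_netdev, sysfs_net="/sys/class/net"):
    """Simplified model of the upper-device check.

    An IP added on ip_netdev is expected to produce GIDs on the RDMA
    device only if ip_netdev is the RDMA-capable netdev itself or one
    of its upper devices.
    """
    if ip_netdev == rdma_netdev:
        return True
    return ip_netdev in upper_devices(rdma_netdev, sysfs_net)


if __name__ == "__main__":
    # Usage: python3 check_upper.py <rdma_netdev> <ip_netdev>
    rdma_dev, ip_dev = sys.argv[1], sys.argv[2]
    print("upper devices of %s: %s" % (rdma_dev, upper_devices(rdma_dev)))
    print("GIDs expected for %s: %s" % (ip_dev, would_get_gids(ip_dev, rdma_dev)))
```

With the setups from this mail, it could be run as "python3 check_upper.py em2 mybr0" for the Linux bridge case and "python3 check_upper.py em2 myovsbr0" for the OVS case. The sysfs_net parameter exists only so the functions can be pointed at a test directory instead of the real /sys/class/net.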