Hi all,

I have a question about the main thread in ovs-vswitchd.c. Details follow; any comments would be appreciated.
In ovs-vswitchd, the netdev_linux_rxq_recv function [1] and the handle_flow_stats_request function [2] run on the same thread. The former polls the tap interface, while the latter handles the request sent by ovs-ofctl dump-flows. Because the two share a thread, a slow dump-flows request delays packet forwarding. For example, suppose several million flow entries are installed. When the ovs-ofctl dump-flows command is executed, handle_flow_stats_request takes several seconds, and netdev_linux_rxq_recv is not called during that time. As a result, packets from the tap interface are delayed by several seconds. The following uprobe trace shows this:

14:58:19 tid:3115 uprobe netdev_linux_rxq_recv
14:58:19 tid:3115 uprobe netdev_linux_rxq_recv
14:58:19 tid:3115 uprobe netdev_linux_rxq_recv
14:58:19 tid:3115 uprobe netdev_linux_rxq_recv
14:58:19 tid:3115 uprobe netdev_linux_rxq_recv
14:58:19 tid:3115 uprobe handle_flow_stats_request
# here, packet forwarding is delayed.
14:58:21 tid:3115 uretprobe handle_flow_stats_request
14:58:21 tid:3115 uprobe netdev_linux_rxq_recv
14:58:21 tid:3115 uprobe netdev_linux_rxq_recv
14:58:21 tid:3115 uprobe netdev_linux_rxq_recv
14:58:21 tid:3115 uprobe netdev_linux_rxq_recv
14:58:21 tid:3115 uprobe netdev_linux_rxq_recv

1. I think it would be better to separate these two functions into different threads. Is there any reason for running them on the same thread?
2. If my workload has to handle millions of flow entries, is there any way to deal with this problem?
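The length of the stall can also be read off the trace mechanically. The following is a minimal sketch, assuming trace lines in exactly the format of the excerpt above (HH:MM:SS timestamp, tid, probe type, function name); stall_seconds is a hypothetical helper name, not part of OVS or of any tracing tool. It prints the time between each uprobe and uretprobe of handle_flow_stats_request, limited to the trace's one-second resolution.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVS): reads uprobe trace lines of the form
#   HH:MM:SS tid:N uprobe|uretprobe FUNCTION
# on stdin and prints, for each call of handle_flow_stats_request, the number
# of seconds between its entry (uprobe) and its return (uretprobe).
# Resolution is whole seconds, as in the trace; calls spanning midnight are
# not handled.
stall_seconds() {
    awk '
    # Convert an HH:MM:SS timestamp into seconds since midnight.
    function secs(t,    p) {
        split(t, p, ":")
        return p[1] * 3600 + p[2] * 60 + p[3]
    }
    # Entry into the flow-stats handler: remember when it started.
    $3 == "uprobe" && $4 == "handle_flow_stats_request" {
        start = secs($1)
        open_call = 1
    }
    # Return from the handler: report how long the main thread was busy.
    $3 == "uretprobe" && $4 == "handle_flow_stats_request" && open_call {
        print secs($1) - start
        open_call = 0
    }
    '
}

# Example: analyze a saved trace.
#   stall_seconds < trace.log
```

Run against the excerpt above, it prints 2, the gap between 14:58:19 and 14:58:21 during which netdev_linux_rxq_recv was not called.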
Diagram
-------
+-------- Physical Server -----------------------------------------------------------+     +--- physical switch ---+
|                                                                                    |     |                       |
|                         +----- OvS/DPDK --------------+                            |     |                       |
| SRC of ping6            |                             |                            |     |   DST of ping6        |
| int1 (tap interface)----|---br-ext (actions=normal)---|---enp94s0f1 (physical nic)-|-----|---port1               |
|                         |                             |                            |     |                       |
|                         |   br-acl (1.6M dummy flows) |                            |     |                       |
|                         |                             |                            |     |                       |
|                         +-----------------------------+                            |     |                       |
+------------------------------------------------------------------------------------+     +-----------------------+

Setup
-----
# start ovs
ovs-ctl start

# change log level to debug
ovs-appctl vlog/set file::dbg
ovs-appctl vlog/list

# initialize dpdk
ovs-vsctl set Open_vSwitch . other_config:dpdk-init="true"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=f
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=f0
ovs-vsctl get Open_vSwitch . other_config
ovs-vsctl get Open_vSwitch . dpdk_version
ovs-vsctl get Open_vSwitch . dpdk_initialized
ovs-vswitchd --version
# ovs-vswitchd (Open vSwitch) 2.15.90, DPDK 21.05.0-rc0

# create 2 bridges
ovs-vsctl --may-exist add-br br-acl -- set Bridge br-acl datapath_type=netdev
ovs-vsctl --may-exist add-br br-ext -- set Bridge br-ext datapath_type=netdev

# br-ext: add mellanox nic
ovs-vsctl --may-exist add-port br-ext enp94s0f1 \
    -- set Interface enp94s0f1 type=dpdk options:dpdk-devargs=0000:5e:00.1

# br-ext: add internal port (int1)
ovs-vsctl add-port br-ext int1 -- set Interface int1 type=internal
ip link set int1 up

# br-ext: set actions=NORMAL
ovs-ofctl dump-flows br-ext
# actions=NORMAL

# br-acl: add dummy ports
ovs-vsctl add-port br-acl dummy1 -- set Interface dummy1 type=internal
ovs-vsctl add-port br-acl dummy2 -- set Interface dummy2 type=internal

# br-acl: add 1.6M dummy flows
ovs-ofctl replace-flows br-acl ./1.6M-dummy-flow.txt

# summary
ovs-vsctl show
#
# 8838f149-f719-4243-a7f2-5b9aa179cb7e
#     Bridge br-acl
#         datapath_type: netdev
#         Port dummy1
#             Interface dummy1
#                 type: internal
#         Port dummy2
#             Interface dummy2
#                 type: internal
#         Port br-acl
#             Interface br-acl
#                 type: internal
#     Bridge br-ext
#         datapath_type: netdev
#         Port br-ext
#             Interface br-ext
#                 type: internal
#         Port int1
#             Interface int1
#                 type: internal
#         Port enp94s0f1
#             Interface enp94s0f1
#                 type: dpdk
#                 options: {dpdk-devargs="0000:5e:00.1"}
#     ovs_version: "2.15.90"

Reproduction Steps
------------------
# send ICMP echo requests in the background
ping6 $(ip -6 neigh | awk '/int1/ {print $1}') -I int1 -D &

# while this command runs, the delay can be observed in the ping output
ovs-ofctl dump-flows br-acl | wc -l

Latencies
---------
For reference, here is the relationship between the number of flow entries in br-acl and the ping6 latency.

Flow entries in br-acl    Ping6 latency (ms)
                     8                 0.185
                    16                 0.1691
                    32                 0.1797
                    64                 0.1631
                   128                 0.1778
                   256                 0.1672
                   512                 0.1774
                  1024                 0.1672
                  2048                 0.176
                  4096                 0.171
                  8192                 4.4824
                 16384                16.3105
                 32768                40.4221
                 65536                89.9295
                131072               193.9651
                262144               404.6833
                524288               824.7896
               1048576              1647.5226

[1]: https://github.com/openvswitch/ovs/blob/f8be30acf2eb60d567bb7386b98f5cb58ddb9119/lib/netdev-linux.c#L1461
[2]: https://github.com/openvswitch/ovs/blob/f8be30acf2eb60d567bb7386b98f5cb58ddb9119/ofproto/ofproto.c#L4639

Best Regards,
Nobuhiro Miki
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss