Hi!
I work on the FRRouting project (https://frrouting.org) and have noticed
that when I have a full BGP feed on a system that is also running
ovs-vswitchd, ovs-vswitchd sits at 100% CPU:
top - 09:43:12 up 4 days, 22:53, 3 users, load average: 1.06, 1.08, 1.08
Tasks: 188 total, 3 running, 185 sleeping, 0 stopped, 0 zombie
%Cpu(s): 12.3 us, 14.7 sy, 0.0 ni, 72.8 id, 0.0 wa, 0.0 hi, 0.2 si,
0.0 st
MiB Mem : 7859.3 total, 2756.5 free, 2467.2 used, 2635.6 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 5101.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND
730 root 10 -10 146204 146048 11636 R 98.3 1.8 6998:13
ovs-vswitchd
169620 root 20 0 0 0 0 I 3.3 0.0 1:34.83
kworker/0:3-events
21 root 20 0 0 0 0 S 1.3 0.0 14:09.59
ksoftirqd/1
131734 frr 15 -5 2384292 609556 6612 S 1.0 7.6 21:57.51
zebra
131739 frr 15 -5 1301168 1.0g 7420 S 1.0 13.3 18:16.17
bgpd
When I turn off FRR (or turn off the BGP feed), ovs-vswitchd stops running
at 100%:
top - 09:48:12 up 4 days, 22:58, 3 users, load average: 0.08, 0.60, 0.89
Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.4 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.1 si,
0.0 st
MiB Mem : 7859.3 total, 4560.6 free, 663.1 used, 2635.6 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 6906.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND
179064 sharpd 20 0 11852 3816 3172 R 1.0 0.0 0:00.09
top
1037 zerotie+ 20 0 291852 113180 7408 S 0.7 1.4 19:09.17
zerotier-one
1043 Debian-+ 20 0 34356 21988 7588 S 0.3 0.3 22:04.42
snmpd
178480 root 20 0 0 0 0 I 0.3 0.0 0:01.21
kworker/1:2-events
178622 sharpd 20 0 14020 6364 4872 S 0.3 0.1 0:00.10
sshd
1 root 20 0 169872 13140 8272 S 0.0 0.2 2:33.26
systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.60
kthreadd
I do not have any particular OVS configuration on this box:
sharpd@janelle:~$ sudo ovs-vsctl show
c72d327c-61eb-4877-b4e7-dcf7e07e24fc
ovs_version: "2.13.8"
sharpd@janelle:~$ sudo ovs-vsctl list o .
_uuid : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
bridges : []
cur_cfg : 0
datapath_types : [netdev, system]
datapaths : {}
db_version : "8.2.0"
dpdk_initialized : false
dpdk_version : none
external_ids : {hostname=janelle, rundir="/var/run/openvswitch",
system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
iface_types : [erspan, geneve, gre, internal, ip6erspan, ip6gre,
lisp, patch, stt, system, tap, vxlan]
manager_options : []
next_cfg : 0
other_config : {}
ovs_version : "2.13.8"
ssl : []
statistics : {}
system_type : ubuntu
system_version : "20.04"
sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
ovs-vswitchd: no datapaths exist
ovs-vswitchd: datapath not found (Invalid argument)
ovs-appctl: ovs-vswitchd: server returned an error
Eli Britstein suggested I update Open vSwitch to the latest version, which
I did, and I saw the same behavior. When I pulled up the running code in a
debugger I saw that ovs-vswitchd is spending pretty much 100% of its time
in the loop below:
(gdb) f 4
#4 0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
133 nln_run(nln);
(gdb) l
128 OVS_EXCLUDED(route_table_mutex)
129 {
130 ovs_mutex_lock(&route_table_mutex);
131 if (nln) {
132 rtnetlink_run();
133 nln_run(nln);
134
135 if (!route_table_valid) {
136 route_table_reset();
137 }
(gdb) l
138 }
139 ovs_mutex_unlock(&route_table_mutex);
140 }
I pulled up where route_table_valid is set:
298 static void
299 route_table_change(const struct route_table_msg *change
OVS_UNUSED,
300 void *aux OVS_UNUSED)
301 {
302 route_table_valid = false;
303 }
If I am reading the code correctly, every RTM_NEWROUTE netlink message
that ovs-vswitchd receives sets the route_table_valid global variable to
false, causing route_table_reset() to be run.
This makes sense in the context of what FRR is doing: a full BGP feed
*always* has churn. So ovs-vswitchd receives an RTM_NEWROUTE message,
parses it, decides in route_table_change() that the route table is no
longer valid, and calls route_table_reset(), which re-dumps the entire
routing table to ovs-vswitchd. In this case there are ~115k IPv6 routes in
the Linux FIB.
I hesitate to make any changes here since I really don't understand what
the end goal is: ovs-vswitchd receives a single route change from the
kernel but responds by re-dumping the entire routing table. What should
the correct behavior be from ovs-vswitchd's perspective here?
As a note, I recompiled with line 302 above set to true, and CPU usage of
ovs-vswitchd stays at essentially 0% once the initial table read has been
done.
thanks!
donald
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss