On 11/1/22 10:50, Eli Britstein wrote:
> 
> 
>> -----Original Message-----
>> From: Ilya Maximets <[email protected]>
>> Sent: Monday, 31 October 2022 23:54
>> To: Donald Sharp <[email protected]>; ovs-
>> [email protected]; [email protected]; Eli Britstein
>> <[email protected]>
>> Cc: [email protected]
>> Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
>>
>> On 10/31/22 17:25, Donald Sharp via discuss wrote:
>>> Hi!
>>>
>>> I work on the FRRouting project (https://frrouting.org) and have
>>> noticed that when I have a full BGP feed on a system that is also
>>> running ovs-vswitchd, ovs-vswitchd sits at 100% CPU:
>>>
>>> top - 09:43:12 up 4 days, 22:53,  3 users,  load average: 1.06, 1.08, 1.08
>>> Tasks: 188 total,   3 running, 185 sleeping,   0 stopped,   0 zombie
>>> %Cpu(s): 12.3 us, 14.7 sy,  0.0 ni, 72.8 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
>>> MiB Mem :   7859.3 total,   2756.5 free,   2467.2 used,   2635.6 buff/cache
>>> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   5101.9 avail Mem
>>>
>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>     730 root      10 -10  146204 146048  11636 R  98.3   1.8   6998:13 ovs-vswitchd
>>>  169620 root      20   0       0      0      0 I   3.3   0.0   1:34.83 kworker/0:3-events
>>>      21 root      20   0       0      0      0 S   1.3   0.0  14:09.59 ksoftirqd/1
>>>  131734 frr       15  -5 2384292 609556   6612 S   1.0   7.6  21:57.51 zebra
>>>  131739 frr       15  -5 1301168   1.0g   7420 S   1.0  13.3  18:16.17 bgpd
>>>
>>> When I turn off FRR (or turn off the BGP feed), ovs-vswitchd stops running at 100%:
>>>
>>> top - 09:48:12 up 4 days, 22:58,  3 users,  load average: 0.08, 0.60, 0.89
>>> Tasks: 169 total,   1 running, 168 sleeping,   0 stopped,   0 zombie
>>> %Cpu(s):  0.2 us,  0.4 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
>>> MiB Mem :   7859.3 total,   4560.6 free,    663.1 used,   2635.6 buff/cache
>>> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   6906.1 avail Mem
>>>
>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>  179064 sharpd    20   0   11852   3816   3172 R   1.0   0.0   0:00.09 top
>>>    1037 zerotie+  20   0  291852 113180   7408 S   0.7   1.4  19:09.17 zerotier-one
>>>    1043 Debian-+  20   0   34356  21988   7588 S   0.3   0.3  22:04.42 snmpd
>>>  178480 root      20   0       0      0      0 I   0.3   0.0   0:01.21 kworker/1:2-events
>>>  178622 sharpd    20   0   14020   6364   4872 S   0.3   0.1   0:00.10 sshd
>>>       1 root      20   0  169872  13140   8272 S   0.0   0.2   2:33.26 systemd
>>>       2 root      20   0       0      0      0 S   0.0   0.0   0:00.60 kthreadd
>>>
>>> I do not have any particular ovs configuration on this box:
>>> sharpd@janelle:~$ sudo ovs-vsctl show
>>> c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>>>     ovs_version: "2.13.8"
>>>
>>>
>>> sharpd@janelle:~$ sudo ovs-vsctl list o .
>>> _uuid               : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>>> bridges             : []
>>> cur_cfg             : 0
>>> datapath_types      : [netdev, system]
>>> datapaths           : {}
>>> db_version          : "8.2.0"
>>> dpdk_initialized    : false
>>> dpdk_version        : none
>>> external_ids        : {hostname=janelle, rundir="/var/run/openvswitch", system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
>>> iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
>>> manager_options     : []
>>> next_cfg            : 0
>>> other_config        : {}
>>> ovs_version         : "2.13.8"
>>> ssl                 : []
>>> statistics          : {}
>>> system_type         : ubuntu
>>> system_version      : "20.04"
>>>
>>> sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
>>> ovs-vswitchd: no datapaths exist
>>> ovs-vswitchd: datapath not found (Invalid argument)
>>> ovs-appctl: ovs-vswitchd: server returned an error
>>>
>>> Eli Britstein suggested I update openvswitch to the latest version,
>>> and I did, but saw the same behavior.  When I pull up the running code
>>> in a debugger, I see that ovs-vswitchd is spending pretty much 100% of
>>> its time in the loop below:
>>>
>>> (gdb) f 4
>>> #4  0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
>>> 133                 nln_run(nln);
>>> (gdb) l
>>> 128             OVS_EXCLUDED(route_table_mutex)
>>> 129         {
>>> 130             ovs_mutex_lock(&route_table_mutex);
>>> 131             if (nln) {
>>> 132                 rtnetlink_run();
>>> 133                 nln_run(nln);
>>> 134
>>> 135                 if (!route_table_valid) {
>>> 136                     route_table_reset();
>>> 137                 }
>>> (gdb) l
>>> 138             }
>>> 139             ovs_mutex_unlock(&route_table_mutex);
>>> 140         }
>>>
>>> I pulled up where route_table_valid is set:
>>>
>>> 298         static void
>>> 299         route_table_change(const struct route_table_msg *change OVS_UNUSED,
>>> 300                            void *aux OVS_UNUSED)
>>> 301         {
>>> 302             route_table_valid = false;
>>> 303         }
>>>
>>>
>>> If I am reading the code correctly, every RTM_NEWROUTE netlink message
>>> that ovs-vswitchd receives sets the route_table_valid global variable
>>> to false and causes route_table_reset() to be run.
>>> This makes sense in the context of what FRR is doing.  A full BGP feed
>>> *always* has churn.  So ovs-vswitchd receives an RTM_NEWROUTE message,
>>> parses it, decides in route_table_change() that the route table is no
>>> longer valid, and calls route_table_reset(), which re-dumps the entire
>>> routing table to ovs-vswitchd.  In this case there are ~115k IPv6
>>> routes in the Linux FIB.
>>>
>>> I hesitate to make any changes here since I really don't understand
>>> what the end goal is.
>>> ovs-vswitchd receives a route change from the kernel but in turn
>>> re-dumps the entire routing table.  What should the correct behavior
>>> be from ovs-vswitchd's perspective here?
>>
>> Hi, Donald.
>>
>> Your analysis is correct.  On each netlink notification about route
>> changes, OVS invalidates the cached routing table and re-dumps it in
>> full on the next access.
>>
>> Looking back into the commit history, OVS used to maintain the cache
>> and incrementally add/remove only what was in each netlink message.
>> But that changed in 2011 with the following commit:
>>
>> commit f0e167f0dbadbe2a8d684f63ad9faf68d8cb9884
>> Author: Ethan J. Jackson <[email protected]>
>> Date:   Thu Jan 13 16:29:31 2011 -0800
>>
>>    route-table: Handle route updates more robustly.
>>
>>    The kernel does not broadcast rtnetlink route messages in all cases
>>    one would expect.  This can cause stale entries to end up in the
>>    route table which may cause incorrect results for
>>    route_table_get_ifindex() queries.  This commit causes rtnetlink
>>    route messages to dump the entire route table on the next
>>    route_table_get_ifindex() query.
>>
>> And indeed, looking at the history of different projects' attempts to
>> use route notifications, they all run into issues, and it seems that
>> none of them is able to handle all the notifications fully correctly,
>> simply because these notifications are notoriously unreliable.
>> In certain cases it is impossible to tell what exactly changed and how.
>> There can be duplicate or missing notifications.  And the code of
>> projects that try to maintain a route cache in userspace is insanely
>> complex and still doesn't handle 100% of cases.
>>
>> There were attempts to convince kernel developers to add unique
>> identifiers to routes, so userspace could tell them apart, but all of
>> them seem to have died, leaving the problem unresolved.
>>
>> These are some discussions/bugs that I found:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1337855
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1722728
>>
>> https://github.com/thom311/libnl/issues/226
>>
>> https://github.com/thom311/libnl/issues/224
>>
>> None of the bugs seems to be resolved; most were closed for
>> non-technical reasons.
>>
>> I suppose Ethan just decided not to deal with that horribly unreliable
>> kernel interface and to simply re-dump the route table on changes.
>>
>>
>> For your actual problem here, I'm not sure if we can fix it that easily.
>>
>> Is it necessary for OVS to know about these routes?
>> If not, it might be possible to isolate them in a separate network
>> namespace, so OVS would not receive all the route updates.
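>> For example, roughly like this (illustrative commands only; the
>> namespace name, the interface "eth1", and the FRR init path are
>> placeholders for your setup):
>>
>>     ip netns add bgp
>>     ip link set eth1 netns bgp                        # BGP-facing NIC
>>     ip netns exec bgp /usr/lib/frr/frrinit.sh start   # path may vary
>>
>> With the feed isolated like that, the rtnetlink socket ovs-vswitchd
>> opens in the default namespace would not see the route churn.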
>>
>> Do you know how long it takes to dump the route table once?
>> Maybe it's worth limiting that process to dump only once a second, or
>> once every few seconds.  That should alleviate the load if the actual
>> dump itself is relatively fast.
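>> As a rough sketch of that idea (illustrative only, not a tested patch;
>> the one-second threshold and the "last_reset_ms" variable are mine,
>> everything else is from lib/route-table.c as quoted above):
>>
>>     /* Hypothetical rate limit: reset the cached route table at most
>>      * once per second, no matter how many notifications arrive. */
>>     static long long int last_reset_ms;
>>
>>     static void
>>     route_table_run(void)
>>         OVS_EXCLUDED(route_table_mutex)
>>     {
>>         ovs_mutex_lock(&route_table_mutex);
>>         if (nln) {
>>             rtnetlink_run();
>>             nln_run(nln);
>>
>>             if (!route_table_valid
>>                 && time_msec() - last_reset_ms >= 1000) {
>>                 route_table_reset();
>>                 last_reset_ms = time_msec();
>>             }
>>         }
>>         ovs_mutex_unlock(&route_table_mutex);
>>     }
>>
>> Since route_table_run() is called from the main loop anyway, a reset
>> skipped by the limit would simply happen on a later iteration.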
> In this setup OVS just runs without being used.  There is no datapath
> (no bridges/ports) configured, so it is useless to run this mechanism
> at all.
> We could tie this mechanism to having at least one datapath configured
> (or even run it only when there is at least one tunnel configured).
> What do you think?
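> For illustration, the gating could look something like this (the
> "n_datapaths" check is invented here purely to sketch the idea; it is
> not an existing symbol in route-table.c):
>
>     /* Hypothetical: skip route-table processing entirely while OVS
>      * has no datapaths, so an idle daemon ignores route churn. */
>     static void
>     route_table_run(void)
>         OVS_EXCLUDED(route_table_mutex)
>     {
>         if (!n_datapaths) {
>             return;
>         }
>         ovs_mutex_lock(&route_table_mutex);
>         if (nln) {
>             rtnetlink_run();
>             nln_run(nln);
>             if (!route_table_valid) {
>                 route_table_reset();
>             }
>         }
>         ovs_mutex_unlock(&route_table_mutex);
>     }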

Hmm.  Why don't you just stop/disable the service then?

>>
>> Best regards, Ilya Maximets.

_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
