Re: [ovs-discuss] ovs-vswitchd running at 100% cpu

Roberto Bartzen Acosta via discuss Tue, 01 Nov 2022 04:56:57 -0700

Hey folks,

Thanks for bringing up this discussion. I'm interested in using FRR+BGP for
the DVR scenario (OVN+OVS), and I understand that tracking route events
is very hard work, but I don't see the need for a node running OVS to
work with BGP "full routing". Do you see any scenario that requires a full
BGP table?


Best regards,
Roberto

On Tue, 1 Nov 2022 at 07:39, Eli Britstein via discuss <[email protected]> wrote:

>
>
> >-----Original Message-----
> >From: Ilya Maximets <[email protected]>
> >Sent: Tuesday, 1 November 2022 12:23
> >To: Eli Britstein <[email protected]>; Donald Sharp
> ><[email protected]>; [email protected];
> >[email protected]
> >Cc: [email protected]
> >Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
> >
> >
> >On 11/1/22 10:50, Eli Britstein wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Ilya Maximets <[email protected]>
> >>> Sent: Monday, 31 October 2022 23:54
> >>> To: Donald Sharp <[email protected]>; ovs-
> >>> [email protected]; [email protected]; Eli Britstein
> >>> <[email protected]>
> >>> Cc: [email protected]
> >>> Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
> >>>
> >>>
> >>> On 10/31/22 17:25, Donald Sharp via discuss wrote:
> >>>> Hi!
> >>>>
> >>>> I work on the FRRouting project (https://frrouting.org) and am doing
> >>>> work with FRR. I have noticed that when I have a full BGP feed on a
> >>>> system that is also running ovs-vswitchd, ovs-vswitchd sits at 100% CPU:
> >>>>
> >>>> top - 09:43:12 up 4 days, 22:53,  3 users,  load average: 1.06, 1.08, 1.08
> >>>> Tasks: 188 total,   3 running, 185 sleeping,   0 stopped,   0 zombie
> >>>> %Cpu(s): 12.3 us, 14.7 sy,  0.0 ni, 72.8 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
> >>>> MiB Mem :   7859.3 total,   2756.5 free,   2467.2 used,   2635.6 buff/cache
> >>>> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   5101.9 avail Mem
> >>>>
> >>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>>>     730 root      10 -10  146204 146048  11636 R  98.3   1.8   6998:13 ovs-vswitchd
> >>>>  169620 root      20   0       0      0      0 I   3.3   0.0   1:34.83 kworker/0:3-events
> >>>>      21 root      20   0       0      0      0 S   1.3   0.0  14:09.59 ksoftirqd/1
> >>>>  131734 frr       15  -5 2384292 609556   6612 S   1.0   7.6  21:57.51 zebra
> >>>>  131739 frr       15  -5 1301168   1.0g   7420 S   1.0  13.3  18:16.17 bgpd
> >>>>
> >>>> When I turn off FRR (or turn off the BGP feed), ovs-vswitchd stops
> >>>> running at 100%:
> >>>>
> >>>> top - 09:48:12 up 4 days, 22:58,  3 users,  load average: 0.08, 0.60, 0.89
> >>>> Tasks: 169 total,   1 running, 168 sleeping,   0 stopped,   0 zombie
> >>>> %Cpu(s):  0.2 us,  0.4 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
> >>>> MiB Mem :   7859.3 total,   4560.6 free,    663.1 used,   2635.6 buff/cache
> >>>> MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.   6906.1 avail Mem
> >>>>
> >>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >>>>  179064 sharpd    20   0   11852   3816   3172 R   1.0   0.0   0:00.09 top
> >>>>    1037 zerotie+  20   0  291852 113180   7408 S   0.7   1.4  19:09.17 zerotier-one
> >>>>    1043 Debian-+  20   0   34356  21988   7588 S   0.3   0.3  22:04.42 snmpd
> >>>>  178480 root      20   0       0      0      0 I   0.3   0.0   0:01.21 kworker/1:2-events
> >>>>  178622 sharpd    20   0   14020   6364   4872 S   0.3   0.1   0:00.10 sshd
> >>>>       1 root      20   0  169872  13140   8272 S   0.0   0.2   2:33.26 systemd
> >>>>       2 root      20   0       0      0      0 S   0.0   0.0   0:00.60 kthreadd
> >>>>
> >>>> I do not have any particular ovs configuration on this box:
> >>>> sharpd@janelle:~$ sudo ovs-vsctl show
> >>>> c72d327c-61eb-4877-b4e7-dcf7e07e24fc
> >>>>     ovs_version: "2.13.8"
> >>>>
> >>>>
> >>>> sharpd@janelle:~$ sudo ovs-vsctl list o .
> >>>> _uuid               : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
> >>>> bridges             : []
> >>>> cur_cfg             : 0
> >>>> datapath_types      : [netdev, system]
> >>>> datapaths           : {}
> >>>> db_version          : "8.2.0"
> >>>> dpdk_initialized    : false
> >>>> dpdk_version        : none
> >>>> external_ids        : {hostname=janelle, rundir="/var/run/openvswitch", system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
> >>>> iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
> >>>> manager_options     : []
> >>>> next_cfg            : 0
> >>>> other_config        : {}
> >>>> ovs_version         : "2.13.8"
> >>>> ssl                 : []
> >>>> statistics          : {}
> >>>> system_type         : ubuntu
> >>>> system_version      : "20.04"
> >>>>
> >>>> sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
> >>>> ovs-vswitchd: no datapaths exist
> >>>> ovs-vswitchd: datapath not found (Invalid argument)
> >>>> ovs-appctl: ovs-vswitchd: server returned an error
> >>>>
> >>>> Eli Britstein suggested I update Open vSwitch to the latest version; I
> >>>> did and saw the same behavior.  When I pulled up the running code in a
> >>>> debugger, I saw that ovs-vswitchd is running in the loop below pretty
> >>>> much 100% of the time:
> >>>>
> >>>> (gdb) f 4
> >>>> #4  0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
> >>>> 133                 nln_run(nln);
> >>>> (gdb) l
> >>>> 128             OVS_EXCLUDED(route_table_mutex)
> >>>> 129         {
> >>>> 130             ovs_mutex_lock(&route_table_mutex);
> >>>> 131             if (nln) {
> >>>> 132                 rtnetlink_run();
> >>>> 133                 nln_run(nln);
> >>>> 134
> >>>> 135                 if (!route_table_valid) {
> >>>> 136                     route_table_reset();
> >>>> 137                 }
> >>>> (gdb) l
> >>>> 138             }
> >>>> 139             ovs_mutex_unlock(&route_table_mutex);
> >>>> 140         }
> >>>>
> >>>> I pulled up where route_table_valid is set:
> >>>>
> >>>> 298         static void
> >>>> 299         route_table_change(const struct route_table_msg *change OVS_UNUSED,
> >>>> 300                            void *aux OVS_UNUSED)
> >>>> 301         {
> >>>> 302             route_table_valid = false;
> >>>> 303         }
> >>>>
> >>>>
> >>>> If I am reading the code correctly, every RTM_NEWROUTE netlink
> >>>> message that ovs-vswitchd receives sets the route_table_valid global
> >>>> variable to false and causes route_table_reset() to be run.
> >>>> This makes sense in the context of what FRR is doing.  A full BGP feed
> >>>> *always* has churn.  So ovs-vswitchd receives an RTM_NEWROUTE
> >>>> message, parses it, decides in route_table_change() that the route
> >>>> table is no longer valid, and calls route_table_reset(), which
> >>>> re-dumps the entire routing table to ovs-vswitchd.  In this case there
> >>>> are ~115k IPv6 routes in the Linux FIB.
> >>>>
> >>>> I hesitate to make any changes here since I really don't understand
> >>>> what the end goal is.
> >>>> ovs-vswitchd receives a route change from the kernel, but that in
> >>>> turn causes it to re-dump the entire routing table.  What should the
> >>>> correct behavior be from ovs-vswitchd's perspective here?
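The invalidate-then-redump behavior described above can be modeled in a few lines of C. This is a hypothetical sketch, not OVS code: `on_route_notification()`, `full_redump()` and `run_once()` only stand in for route_table_change(), route_table_reset() and route_table_run(), and the counters are invented for illustration. It shows that any run pass following at least one notification copies the whole FIB, no matter how many notifications arrived, so a constantly churning feed costs roughly one full dump per pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not OVS code: these mirror the roles of
 * route_table_valid, route_table_change() and route_table_reset(). */
static bool table_valid = true;
static size_t routes_in_fib;        /* routes currently in the kernel FIB */
static unsigned long routes_copied; /* total routes copied by all dumps */

/* Like route_table_change(): any notification only marks the cache stale. */
static void
on_route_notification(void)
{
    table_valid = false;
}

/* Like route_table_reset(): re-dump every route from the kernel. */
static void
full_redump(void)
{
    routes_copied += routes_in_fib;
    table_valid = true;
}

/* Like the body of route_table_run(): re-dump if anything changed. */
static void
run_once(void)
{
    if (!table_valid) {
        full_redump();
    }
}
```

With the ~115k routes mentioned above, ten run passes that each follow a notification copy 1,150,000 routes in this model, while two notifications arriving before the same pass still cost only one dump.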
> >>>
> >>> Hi, Donald.
> >>>
> >>> Your analysis is correct.  On each netlink notification about route
> >>> changes, OVS invalidates the cached routing table and re-dumps it in
> >>> full on the next access.
> >>>
> >>> Looking back into the commit history, OVS used to maintain the cache
> >>> and only incrementally add/remove what was in the netlink message.
> >>> But that changed in 2011 with the following commit:
> >>>
> >>> commit f0e167f0dbadbe2a8d684f63ad9faf68d8cb9884
> >>> Author: Ethan J. Jackson <[email protected]>
> >>> Date:   Thu Jan 13 16:29:31 2011 -0800
> >>>
> >>>    route-table: Handle route updates more robustly.
> >>>
> >>>    The kernel does not broadcast rtnetlink route messages in all cases
> >>>    one would expect.  This can cause stale entires to end up in the
> >>>    route table which may cause incorrect results for
> >>>    route_table_get_ifindex() queries.  This commit causes rtnetlink
> >>>    route messages to dump the entire route table on the next
> >>>    route_table_get_ifindex() query.
> >>>
> >>> And indeed, looking at the history of attempts by different projects
> >>> to use route notifications, they all face issues, and it seems that
> >>> none of them is actually able to handle all the notifications fully
> >>> correctly, simply because these notifications are notoriously bad.
> >>> In certain cases it seems impossible to tell what exactly changed and
> >>> how.  There can be duplicate or missing notifications.  And the code
> >>> of projects that try to maintain a route cache in userspace is
> >>> insanely complex and doesn't handle 100% of cases anyway.
> >>>
> >>> There were attempts to convince kernel developers to add unique
> >>> identifiers to routes, so userspace can tell them apart, but all of
> >>> them seem to have died, leaving the problem unresolved.
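To see why Ethan's commit preferred re-dumping, consider a hypothetical incremental cache (not OVS code). It is keyed only by destination prefix, a deliberate simplification: real kernel routes are also distinguished by attributes such as table and metric, which is part of what makes notifications hard to interpret. If a delete notification is lost, the incremental cache keeps a stale entry indefinitely; a full resync, which is what route_table_reset() amounts to, repairs it at the cost of copying the whole table. All names below are invented for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ROUTES 16
#define PREFIX_LEN 44   /* room for an IPv6 CIDR string */

/* Hypothetical userspace route cache, keyed only by destination prefix. */
struct route_cache {
    char prefix[MAX_ROUTES][PREFIX_LEN];
    int ifindex[MAX_ROUTES];
    int n;
};

static int
cache_find(const struct route_cache *c, const char *prefix)
{
    for (int i = 0; i < c->n; i++) {
        if (strcmp(c->prefix[i], prefix) == 0) {
            return i;
        }
    }
    return -1;
}

/* Apply an RTM_NEWROUTE-style event incrementally (add or replace). */
static bool
cache_add(struct route_cache *c, const char *prefix, int ifindex)
{
    int i = cache_find(c, prefix);
    if (i < 0) {
        if (c->n == MAX_ROUTES) {
            return false;              /* cache full */
        }
        i = c->n++;
        strncpy(c->prefix[i], prefix, PREFIX_LEN - 1);
        c->prefix[i][PREFIX_LEN - 1] = '\0';
    }
    c->ifindex[i] = ifindex;
    return true;
}

/* Apply an RTM_DELROUTE-style event incrementally. */
static void
cache_del(struct route_cache *c, const char *prefix)
{
    int i = cache_find(c, prefix);
    if (i >= 0) {
        c->n--;
        if (i != c->n) {               /* fill the hole with the last entry */
            memcpy(c->prefix[i], c->prefix[c->n], PREFIX_LEN);
            c->ifindex[i] = c->ifindex[c->n];
        }
    }
}

/* Full resync: discard the cache and rebuild it from a complete dump. */
static void
cache_resync(struct route_cache *c, const char *const prefixes[],
             const int ifindexes[], int count)
{
    c->n = 0;
    for (int i = 0; i < count; i++) {
        cache_add(c, prefixes[i], ifindexes[i]);
    }
}
```

The incremental path is cheap per event but only as correct as the event stream; the resync path is always correct with respect to the dump but scales with table size, which is exactly the trade-off this thread runs into.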
> >>>
> >>> These are some discussions/bugs that I found:
> >>>
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1337855
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1722728
> >>> https://github.com/thom311/libnl/issues/226
> >>> https://github.com/thom311/libnl/issues/224
> >>>
> >>> None of the bugs seems to be resolved.  Most are closed for
> >>> non-technical reasons.
> >>>
> >>> I suppose Ethan just decided not to deal with that horribly unreliable
> >>> kernel interface and to simply re-dump the route table on changes.
> >>>
> >>>
> >>> For your actual problem here, I'm not sure we can fix it that easily.
> >>>
> >>> Is it necessary for OVS to know about these routes?
> >>> If no, it might be possible to isolate them in a separate network
> >>> namespace, so OVS will not receive all the route updates?
> >>>
> >>> Do you know how long it takes to dump the route table once?
> >>> Maybe it's worth limiting that process to dump only once a second or
> >>> once every few seconds.  That should alleviate the load if the actual
> >>> dump is relatively fast.
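The rate-limiting idea above can be sketched as a small throttle that coalesces invalidations and performs at most one full dump per interval. This is a hypothetical illustration, not a proposed OVS patch: `struct dump_throttle`, `throttle_invalidate()` and `throttle_run()` are invented names, and the current time is passed in explicitly so the logic is testable. Real OVS code would presumably take the time from its own clock helper (time_msec()).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical throttle, not OVS code: coalesce invalidations and re-dump
 * at most once per min_interval_ms. */
struct dump_throttle {
    bool valid;              /* is the cached table up to date? */
    int64_t last_dump_ms;    /* when the last full dump happened */
    int64_t min_interval_ms; /* minimum spacing between full dumps */
    unsigned long dumps;     /* number of full dumps performed */
};

/* Called for every route notification: just mark the cache stale. */
static void
throttle_invalidate(struct dump_throttle *t)
{
    t->valid = false;
}

/* Called on each run pass.  Returns true if a full dump was performed;
 * a stale cache stays stale until the interval has elapsed. */
static bool
throttle_run(struct dump_throttle *t, int64_t now_ms)
{
    if (t->valid || now_ms - t->last_dump_ms < t->min_interval_ms) {
        return false;
    }
    t->dumps++;              /* stand-in for route_table_reset() */
    t->last_dump_ms = now_ms;
    t->valid = true;
    return true;
}
```

With a notification and a run pass every millisecond for five seconds and a 1000 ms interval, this model performs 5 dumps instead of one per pass. The cost is that lookups may use a table up to one interval stale, and inside the OVS main loop a deferred dump would also need a scheduled wakeup (for example via poll_timer_wait_until()) so it still happens when no further notification arrives.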
> >> In this setup OVS just runs without being used.  There is no datapath
> >> (no bridges/ports) configured, so it is useless to run this mechanism
> >> at all.
> >> We could enable this mechanism only when at least one datapath is
> >> configured (or even only when at least one tunnel is configured).
> >> What do you think?
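Eli's suggestion amounts to reference-counting the users of route information and keeping the listener active only while that count is non-zero. The sketch below is hypothetical and not OVS code: `struct route_monitor` and its functions are invented names, and "subscribed" stands in for registering or unregistering the netlink route listener, which in a real implementation would stop the kernel from delivering route notifications at all.

```c
#include <stdbool.h>

/* Hypothetical gating, not OVS code: keep route monitoring active only
 * while something (a datapath, or a tunnel port) needs route lookups. */
struct route_monitor {
    unsigned int users;             /* datapaths/tunnels needing routes */
    bool subscribed;                /* is the route listener registered? */
    unsigned long events_processed; /* notifications actually handled */
};

/* First user: stand-in for registering the listener (and one dump). */
static void
monitor_ref(struct route_monitor *m)
{
    if (m->users++ == 0) {
        m->subscribed = true;
    }
}

/* Last user gone: stand-in for unregistering the listener. */
static void
monitor_unref(struct route_monitor *m)
{
    if (m->users > 0 && --m->users == 0) {
        m->subscribed = false;
    }
}

/* Per route notification: with no listener, nothing is processed. */
static void
monitor_event(struct route_monitor *m)
{
    if (m->subscribed) {
        m->events_processed++;
    }
}
```

On a box like the one in this thread, with no bridges configured, the count stays at zero and a full BGP feed would generate no route-table work in ovs-vswitchd under this scheme.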
> >
> >Hmm.  Why don't you just stop/disable the service then?
> Indeed, that's possible. It's just turned on by default on this system
> (Debian), and Donald noticed the CPU consumption.
> >
> >>>
> >>> Best regards, Ilya Maximets.
>
> _______________________________________________
> discuss mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>


