On 3/20/24 19:47, Ilya Maximets wrote:
> Currently, ovs-vswitchd is subscribed to all the routing changes in the
> kernel. On each change, it marks the internal routing table cache as
> invalid, then resets it and dumps all the routes from the kernel from
> scratch. The reason for that is kernel routing updates not being
> reliable in a sense that it's hard to tell which route is getting
> removed or modified. Userspace application has to track the order in
> which route entries are dumped from the kernel. Updates can get lost
> or even duplicated and the kernel doesn't provide a good mechanism to
> distinguish one route from another. To my knowledge, dumping all the
> routes from a kernel after each change is the only way to keep the
> cache consistent. Some more info can be found in the following never
> addressed issues:
> https://bugzilla.redhat.com/1337860
> https://bugzilla.redhat.com/1337855
>
> It seems to be believed that NetworkManager "mostly" does incremental
> updates right. But it is still not completely correct, will re-dump
> the whole table in certain cases, and it takes a huge amount of very
> complicated code to do the accounting and route comparisons.
>
> Going back to ovs-vswitchd, it currently dumps routes from all the
> routing tables. If it will get conflicting routes from multiple
> tables, the cache will not be useful. The routing cache in userspace
> is primarily used for checking the egress port for tunneled traffic
> and this way also detecting link state changes for a tunnel port.
> For userspace datapath it is used for actual routing of the packet
> after sending to a native tunnel.
> With kernel datapath we don't really have a mechanism to know which
> routing table will actually be used by the kernel after encapsulation,
> so our lookups on a cache may be incorrect because of this as well.
>
> So, unless all the relevant routes are in the standard tables, the
> lookup in userspace route cache is unreliable.
>
> Luckily, most setups are not using any complicated routing in
> non-standard tables that OVS has to be aware of.
>
> It is possible, but unlikely, that standard routing tables are
> completely empty while some other custom table is not, and all the OVS
> tunnel traffic is directed to that table. That would be the only
> scenario where dumping non-standard tables would make sense. But it
> seems like this kind of setup will likely need a way to tell OVS from
> which table the routes should be taken, or we'll need to dump routing
> rules and keep a separate cache for each table, so we can first match
> on rules and then lookup correct routes in a specific table. I'm not
> sure if trying to implement all that is justified.
>
> For now, stop considering routes from non-standard tables to avoid
> mixing different tables together and also wasting CPU resources.
>
> This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is
> running on a same host and in a same network namespace with OVS using
> its own custom routing table.
>
> Unfortunately, there seems to be no way to tell the kernel to send
> updates only for particular tables. So, we'll still receive and parse
> all of them. But they will not result in a full cache invalidation in
> most cases.
>
> Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps.
> So, we can make use of it and dump only standard tables when we get a
> relevant route update. NETLINK_GET_STRICT_CHK has to be enabled on
> the socket for filtering to work. There is no reason to not enable it
> by default, if supported. It is not used outside of NETLINK_ROUTE.
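For readers following along, here is a minimal sketch of the two ideas in the paragraphs above. It is not the code from the patch. The names is_standard_table(), struct route_dump_req, build_route_dump_req() and enable_strict_chk() are made up for illustration, and treating "standard tables" as local, main and default is an assumption.

```c
/* Illustrative sketch only; helper names are hypothetical, not from OVS. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Fallbacks for older userspace headers; values match the Linux UAPI. */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_GET_STRICT_CHK
#define NETLINK_GET_STRICT_CHK 12
#endif

/* Assumption: "standard" means the local, main and default tables.
 * An update for any other table would not invalidate the cache. */
bool
is_standard_table(uint32_t table_id)
{
    return table_id == RT_TABLE_DEFAULT
           || table_id == RT_TABLE_MAIN
           || table_id == RT_TABLE_LOCAL;
}

/* RTM_GETROUTE dump request carrying a single RTA_TABLE attribute.
 * All members are 4-byte aligned, so the layout has no padding. */
struct route_dump_req {
    struct nlmsghdr nlh;
    struct rtmsg rtm;
    struct rtattr rta;
    uint32_t table;
};

/* Fills 'req' so that a kernel with strict checking enabled dumps
 * only routes of 'family' that belong to 'table'. */
void
build_route_dump_req(struct route_dump_req *req, unsigned char family,
                     uint32_t table)
{
    memset(req, 0, sizeof *req);
    req->nlh.nlmsg_len = sizeof *req;
    req->nlh.nlmsg_type = RTM_GETROUTE;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->rtm.rtm_family = family;
    req->rta.rta_type = RTA_TABLE;
    req->rta.rta_len = RTA_LENGTH(sizeof req->table);
    req->table = table;
}

/* Asks the kernel to validate and honor dump filters on this netlink
 * socket.  Returns 0 on success, -1 with errno set otherwise (for
 * example on kernels older than v4.20). */
int
enable_strict_chk(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
                      &one, sizeof one);
}
```

In this sketch a caller would send one such request per standard table and address family. If enable_strict_chk() fails, the table filter cannot be relied on, so the dumped routes would still need to be checked with something like is_standard_table() in userspace.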
>
> Fixes: f0e167f0dbad ("route-table: Handle route updates more robustly.")
> Fixes: ea83a2fcd0d3 ("lib: Show tunnel egress interface in ovsdb")
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
> ---
>
> Version 2:
> * Changed log level for the NETLINK_GET_STRICT_CHK failure to
>   DBG if not supported and WARN if there is an actual failure.
> * While at it added rate limiting. It is not necessary, but
>   doesn't hurt.
>
> Note: GitHub actions will likely fail due to ongoing major outage
> of ppa.launchpad.net from where it installs 32bit toolchains.
> Will need to request a re-check once the incident is over.

Should be resolved now.

Recheck-request: github-robot