On Tue, Mar 03, 2026 at 10:41:22PM +0100, Ilya Maximets wrote:
> On 3/3/26 6:38 PM, Ilya Maximets wrote:
> > On 2/24/26 11:06 AM, Adrián Moreno wrote:
> >> On Thu, Feb 19, 2026 at 05:22:20PM +0100, Adrián Moreno wrote:
> >>> On Thu, Feb 19, 2026 at 02:51:18PM +0100, Ilya Maximets wrote:
> >>>> On 12/22/25 4:23 PM, Adrian Moreno via dev wrote:
> >>>>> @@ -206,12 +206,20 @@ rtnetlink_parse_cb(struct ofpbuf *buf, void
> >>>>> *change)
> >>>>> *
> >>>>> * xxx Joins more multicast groups when needed.
> >>>>> *
> >>>>> + * Callbacks might find that netdev-linux netdevs still hold outdated
> >>>>> cached
> >>>>> + * information. If the notification has to trigger some kind of
> >>>>> reconfiguration
> >>>>> + * that requires up-to-date netdev cache, it should do it
> >>>>> asynchronously, for
> >>>>> + * instance by setting a flag in the callback and acting on it during
> >>>>> the
> >>>>> + * normal "*_run()" operation.
> >>>>> + *
> >>>>> + * Notifications might come from any network namespace.
> >>>>> + *
> >>>>> * Returns an initialized nln_notifier if successful, NULL otherwise.
> >>>>> */
> >>>>> struct nln_notifier *
> >>>>> rtnetlink_notifier_create(rtnetlink_notify_func *cb, void *aux)
> >>>>> {
> >>>>> if (!nln) {
> >>>>> - nln = nln_create(NETLINK_ROUTE, false, rtnetlink_parse_cb,
> >>>>> + nln = nln_create(NETLINK_ROUTE, true, rtnetlink_parse_cb,
> >>>>> &rtn_change);
> >>>>
> > >>>> Hi, Adrian.  Thanks for all the work on the RTNL contention issues!
> > >>>>
> > >>>> One big thing I do not like about this set, though, is this change to
> > >>>> start monitoring all namespaces here and in netdev-linux.  I don't
> > >>>> think we should be doing that, as it can have a potentially significant
> > >>>> performance impact.  For instance, we can have a BGP daemon running in
> > >>>> a separate namespace that creates a ton of route updates, and OVS will
> > >>>> now receive all of them since it is subscribed.  Even parsing all those
> > >>>> updates and discarding them as irrelevant eats a noticeable amount of
> > >>>> CPU resources.  And people may run multiple BGP daemons per node (which
> > >>>> may sound unreasonable, but they do...), spamming OVS with all the
> > >>>> updates.  I'm afraid it may even increase the contention on the locks
> > >>>> inside the kernel in such cases, as notifications from many namespaces
> > >>>> start to be forwarded into a single socket.
> > >>>>
> > >>>> All in all, I think we need to find a more fine-grained solution here
> > >>>> instead of a blind subscription to all namespaces.
> >>>
> > >>> Thanks for the feedback.  It's truly something we should think about.
> > >>>
> > >>> IIUC, the current series only subscribes to all namespaces for
> > >>> RTNLGRP_LINK.  Routes are being monitored by the specific nln notifier
> > >>> that route-table.c creates in route_table_init(), which does not
> > >>> subscribe to route events in other namespaces.
>
> The part of the code above that I replied to here is turning on monitoring of
> all namespaces for NETLINK_ROUTE, unless I'm missing something. We monitor
> all namespaces for RTNLGRP_LINK today, so that's not a big problem.
NETLINK_ROUTE is the netlink family; it's the way to interact with both
the routing and the device subsystems.  The next line after the code block
you quoted (not present in the diff of this patch, unfortunately) is this:

    return nln_notifier_create(nln, RTNLGRP_LINK, (nln_notify_func *) cb, aux);

This makes the freshly created socket join the multicast group that only
listens to link changes, not route ones.
With this series, the infrastructure can be visualized as follows:
┌──────────────────────────┐  ┌──────────────────────────┐  ┌──────────────────────────┐
│ route_table.c            │  │ bridge.c                 │  │ netdev_linux             │
│                          │  │                          │  │                          │
│ for route change         │  │ if-notifier:             │  │ for address change       │
│ detection:               │  │ for link change          │  │ detection, to update     │
│  family: NETLINK_ROUTE   │  │ detection, to call       │  │ the internal netdev      │
│  mcast:  RTNLGRP_IPV{4,6}│  │ bridge_reconfigure()     │  │ (cached) state:          │
│   _{ROUTE,RULE}          │  │ if ifaces changed        │  │  family: NETLINK_ROUTE   │
│  all_ns: false           │  │                          │  │  mcast:  RTNLGRP_IPV4_   │
│                          │  │                          │  │   IFADDR, RTNLGRP_IPV6_  │
│                          │  │                          │  │   {IFADDR,IFINFO}        │
│                          │  │                          │  │  all_ns: true            │
└────────────┬─────────────┘  └────────────┬─────────────┘  └───────┬───────────┬──────┘
             │                             │ uses                   │ uses      │
             │                     ┌───────▼────────────────────────▼──────┐    │
             │                     │ rtnetlink_notifier.{c,h}              │    │
             │                     │  family: NETLINK_ROUTE                │    │
             │                     │  mcast:  RTNLGRP_LINK                 │    │
             │                     │  all_ns: true                         │    │
             │                     └───────────────────┬───────────────────┘    │
             │ uses                                    │ uses                   │
┌────────────▼─────────────────────────────────────────▼────────────────────────▼──────┐
│ nln (netlink_notifier.{h,c})                                                         │
└──────────────────────────────────────────────────────────────────────────────────────┘
In terms of netdev_linux's exposure to events happening in other network
namespaces, this series should not change anything.  Previously,
netdev_linux also created a socket with NETLINK_LISTEN_ALL_NSID and
subscribed to the same multicast groups (for link and address updates); it
just did so with its own socket.
The only difference is that it now uses the same common infrastructure (as
shown above).  This is needed because, the moment we cache the netdev state
(e.g., link up or down), we need other subsystems such as route-table or
if-notifier to see an up-to-date version of the netdev when they query it.
You could argue that, from the perspective of route-table and if-notifier,
their callbacks now get called for events in all namespaces, whereas before
they were only notified of NS_LOCAL events.  That is true, but they quickly
discard non-local events.  Besides, those events have already been parsed
and processed, since netdev-linux needs them anyway.  From a global
perspective we go from:
- netdev_linux: listening to all network namespaces
- rtnetlink_notifier: listening only in the local namespace

to:

- rtnetlink_notifier: listening to all network namespaces, with
  netdev-linux being just another client
Therefore this infrastructure shift, apart from being functionally
necessary to allow caching the link state inside the netdev, removes the
duplicated processing of NS_LOCAL events that currently takes place.
Thanks.
Adrián
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev