On Wed, Mar 04, 2026 at 05:06:02PM +0100, Ilya Maximets wrote:
> On 3/4/26 11:17 AM, Adrián Moreno wrote:
> > On Tue, Mar 03, 2026 at 10:41:22PM +0100, Ilya Maximets wrote:
> >> On 3/3/26 6:38 PM, Ilya Maximets wrote:
> >>> On 2/24/26 11:06 AM, Adrián Moreno wrote:
> >>>> On Thu, Feb 19, 2026 at 05:22:20PM +0100, Adrián Moreno wrote:
> >>>>> On Thu, Feb 19, 2026 at 02:51:18PM +0100, Ilya Maximets wrote:
> >>>>>> On 12/22/25 4:23 PM, Adrian Moreno via dev wrote:
> >>>>>>> @@ -206,12 +206,20 @@ rtnetlink_parse_cb(struct ofpbuf *buf, void
> >>>>>>> *change)
> >>>>>>> *
> >>>>>>> * xxx Joins more multicast groups when needed.
> >>>>>>> *
> >>>>>>> + * Callbacks might find that netdev-linux netdevs still hold
> >>>>>>> outdated cached
> >>>>>>> + * information. If the notification has to trigger some kind of
> >>>>>>> reconfiguration
> >>>>>>> + * that requires up-to-date netdev cache, it should do it
> >>>>>>> asynchronously, for
> >>>>>>> + * instance by setting a flag in the callback and acting on it
> >>>>>>> during the
> >>>>>>> + * normal "*_run()" operation.
> >>>>>>> + *
> >>>>>>> + * Notifications might come from any network namespace.
> >>>>>>> + *
> >>>>>>> * Returns an initialized nln_notifier if successful, NULL
> >>>>>>> otherwise. */
> >>>>>>> struct nln_notifier *
> >>>>>>> rtnetlink_notifier_create(rtnetlink_notify_func *cb, void *aux)
> >>>>>>> {
> >>>>>>> if (!nln) {
> >>>>>>> - nln = nln_create(NETLINK_ROUTE, false, rtnetlink_parse_cb,
> >>>>>>> + nln = nln_create(NETLINK_ROUTE, true, rtnetlink_parse_cb,
> >>>>>>> &rtn_change);
> >>>>>>
> >>>>>> Hi, Adrian. Thanks for all the work on the RTNL contention issues!
> >>>>>>
> >>>>>> One big thing I do not like about this set though is this change to
> >>>>>> start
> >>>>>> monitoring all namespaces here and in netdev-linux. I don't think we
> >>>>>> should
> >>>>>> be doing that as this can have a potentially significant performance
> >>>>>> impact.
> >>>>>> For instance, we can have a BGP daemon running in a separate namespace
> >>>>>> that
> >>>>>> will create a ton of route updates and OVS will receive all of them
> >>>>>> now that
> >>>>>> it is subscribed. Even parsing all those updates and discarding as
> >>>>>> irrelevant
> >>>>>> eats a noticeable amount of CPU resources. And people may run
> >>>>>> multiple BGP
> >>>>>> daemons per node (which may sound unreasonable, but they do...)
> >>>>>> spamming OVS
> >>>>>> with all the updates. I'm afraid that it may even increase the
> >>>>>> contention
> >>>>>> on the locks inside the kernel in such cases as notifications from many
> >>>>>> namespaces start to be forwarded into a single socket.
> >>>>>>
> >>>>>> All in all, I think, we need to find a more fine-grained solution here
> >>>>>> instead of a blind subscription to all namespaces.
> >>>>>
> >>>>> Thanks for the feedback. It's truely something we should think about.
> >>>>>
> >>>>> IIUC, the current series only subscribes to all namespaces for
> >>>>> RTNLGRP_LINK.
> >>>>> Routes are being monitored by the specific nln notifier that
> >>>>> route-table.c creates in route_table_init() which does not subscribe to
> >>>>> route events in other namespaces.
> >>
> >> The part of the code above that I replied to here is turning on monitoring
> >> of
> >> all namespaces for NETLINK_ROUTE, unless I'm missing something. We monitor
> >> all namespaces for RTNLGRP_LINK today, so that's not a big problem.
> >
> > NETLINK_ROUTE is the netlink family, it's the way to interact with both
> > routing and device subsystems. The next line after the code block you
> > quoted (not present in the diff of this patch unfortunately) is this:
> >
> > return nln_notifier_create(nln, RTNLGRP_LINK, (nln_notify_func *) cb,
> > aux);
> >
> > this makes the recently created socket join the mcast group that only
> > listens to link changes, not route ones.
>
> Oh, OK. I might have misread the changes. Maybe we don't need to
> deprecate the internal port movement between namespaces just yet.
> I think, I was mostly misled by the route change handlers having the
> nsid check, which I guess should not be needed? I suppose it's
> still good to have them just in case, but we may want to have some
> comments in the code (it may also be good to mark such changes as
> irrelevant instead of checking the nsid separately).
>
Good point.
We only need them for the interface monitor which relies on the common
rtnetlink.
For route change notifiers you are right, we don't need the check and
the reception of a non-local event is actualy a pretty ugly kernel bug.
Marking it as irrelevant would be a good step, but I think we can make
it clearer by removing the nsid from the notification API altogether. If
we pass the nsid to the parsing function each parser can decide whether
they mark non-local events as irrelevant or they want to store the nsid
in the change object and pass it to the notification callbacks.
I'll resping the series with this modification.
> >
> > With this series, the infrastructure could be visualized as this:
> >
> >
> > ┌────────────────────────────────────────────────────────────────────────────────┐
> > │
> > route_table.c │
> > │
> > │
> > │
> > ┌───────────────────────────────────────────┐ │
> >
> > │┌──────────────────────────────┐ │ for route change detection
> > │ │
> > ││
> > │ │ │ │
> > ┌────┼┼ for link change
> > detection │ │ family: NETLINK_ROUTE
> > ┼─┼─────────┐
> > │ ││
> > │ │ mcast: RTNLGRP_IPV{4,6}_{ROUTE,RULE} │ │ │
> > │
> > │└──────────────────────────────┘ │ all_ns: false
> > │ │ │
> > ┌─────────────────────────────────┐ │ │
> > └───────────────────────────────────────────┘ │ │
> > │ bridge.c │ │
> > └────────────────────────────────────────────────────────────────────────────────┘
> > │
> > │ │ │
> > │
> > │ to call bridge_reconfigure() │ │
> > ┌───────────────────────────────────────────────────────────────────────────────────┐
> > │
> > │ if ifaces changed │ │ │
> > netdev_linux │ │
> > │ │ │ │
> > ┌─────────────────────────────────────────┐ │ │
> > └────────────────────┬────────────┘ │ │
> > ┌───────────────────────────┐ │ for address change detection
> > │ │ │
> > │ │ │ │ to update
> > netdev │ │ │ │ │
> > ┌────────▼───────┐ │ ┌────┼───┼ internal
> > (cached) state │ │ family: NETLINK_ROUTE │ │ │
> > │ │ │ │ │
> > └───────────────────────────┘ │ mcast: RTNLGRP_IPV4_IFADDR,
> > │ │ │
> > │ if-notifier │ │ │ │
> > │ RTNLGRP_IPV6_{IFADDR,IFINFO} │ │ │
> > │ │ │ │ │
> > │ all_ns: true │ │ │
> > └────────┬───────┘ │ │ │
> > └─┬───────────────────────────────────────┘ │ │
> > │ │ │
> > └───────────────────────────────────────┼───────────────────────────────────────────┘
> > │
> > │ │ │
> > │ │
> > │ ┌────────▼────▼─────────────────┐
> > │ │
> > │ │ │
> > │ │
> > └─────────► rtnetlink_notifier.{c,h} │
> > │ │
> > │ │
> > ┌──────────────────────┘ │
> > │ family: NETLINK_ROUTE │ │
> > │
> > │ mcast: RTNLGRP_LINK │ │
> > │
> > │ all_ns: true │ │
> > │
> > │ │ │
> > │
> > └───────────────────────────────┘ │
> > │
> > │ uses │
> > │
> > │ │
> > │
> >
> > ┌──────▼───────────────────▼────────────────────────────────────────────────────────────────────▼────┐
> > │
> >
> > │
> > │
> > nln (netlink_notifier.{h,c})
> > │
> > │
> >
> > │
> >
> > └────────────────────────────────────────────────────────────────────────────────────────────────────┘
> >
>
> This diagram got completely scrambled in transit.
>
Ugh, sorry. I'll try to add a simplified version in the cover letter in
the next version.
Thanks.
Adrián
> > In terms of netdev_linux's exposure to stuff happening in other
> > namespaces, this series should not change anything. Previously
> > netdev_linux was also creating a socket with
> > NETLINK_LISTEN_ALL_NSID and registering to the same multicast
> > groups (for link and address updates). Only it was doing it using its
> > own socket.
> >
> > The only difference is that now it uses the same common infra (as shown
> > above). This is needed because, the moment we cache the netdev state
> > (e.g: link up or down), we need other subsystems such as route-table or
> > if-notifier to see an updated version of the netdev when they query it.
> >
> > You could argue that from the perspective of route-table and
> > if-notifier, they now get their callback called for events in all
> > namespaces where as before they only got notified for NS_LOCAL events.
> > That is true, but they quickly discard non-local events. Besides, those
> > events have already been parsed and processed as netdev-linux needs them
> > anyways. From a global perspective we go from:
> >
> > - netdev_linux: listening for all network namespaces
> > - rtnetlink_notifier: listening only in local namespace
> >
> > to:
> >
> > - rtnetlink_notifier: listening for al network namespaces and
> > netdev-linux being just another client
> >
> > Therefore this infrastructure shift, apart from being functionally
> > necessary to allow caching the link state inside the netdev, removes the
> > duplicated event processing of NS_LOCAL events that is currently taking
> > place.
> >
> > Thanks.
> > Adrián
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev