On 3/4/26 11:17 AM, Adrián Moreno wrote:
> On Tue, Mar 03, 2026 at 10:41:22PM +0100, Ilya Maximets wrote:
>> On 3/3/26 6:38 PM, Ilya Maximets wrote:
>>> On 2/24/26 11:06 AM, Adrián Moreno wrote:
>>>> On Thu, Feb 19, 2026 at 05:22:20PM +0100, Adrián Moreno wrote:
>>>>> On Thu, Feb 19, 2026 at 02:51:18PM +0100, Ilya Maximets wrote:
>>>>>> On 12/22/25 4:23 PM, Adrian Moreno via dev wrote:
>>>>>>> @@ -206,12 +206,20 @@ rtnetlink_parse_cb(struct ofpbuf *buf, void *change)
>>>>>>>   *
>>>>>>>   * xxx Joins more multicast groups when needed.
>>>>>>>   *
>>>>>>> + * Callbacks might find that netdev-linux netdevs still hold outdated cached
>>>>>>> + * information.  If the notification has to trigger some kind of reconfiguration
>>>>>>> + * that requires up-to-date netdev cache, it should do it asynchronously, for
>>>>>>> + * instance by setting a flag in the callback and acting on it during the
>>>>>>> + * normal "*_run()" operation.
>>>>>>> + *
>>>>>>> + * Notifications might come from any network namespace.
>>>>>>> + *
>>>>>>>   * Returns an initialized nln_notifier if successful, NULL otherwise.
>>>>>>>   */
>>>>>>>  struct nln_notifier *
>>>>>>>  rtnetlink_notifier_create(rtnetlink_notify_func *cb, void *aux)
>>>>>>>  {
>>>>>>>      if (!nln) {
>>>>>>> -        nln = nln_create(NETLINK_ROUTE, false, rtnetlink_parse_cb,
>>>>>>> +        nln = nln_create(NETLINK_ROUTE, true, rtnetlink_parse_cb,
>>>>>>>                           &rtn_change);
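The deferral pattern that the new comment describes could be sketched roughly as follows. All names here are illustrative stand-ins, not the actual OVS symbols: the notifier callback only records that something changed, and the reconfiguration happens later from the normal "*_run()" path, once the netdev cache is fresh.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of the deferral pattern: the netlink callback may observe a
 * stale netdev cache, so it must not reconfigure anything directly. */
static atomic_bool ifaces_changed = false;

/* Hypothetical rtnetlink notifier callback: just set the flag. */
static void
link_change_cb(void *change, void *aux)
{
    (void) change;
    (void) aux;
    atomic_store(&ifaces_changed, true);
}

/* Hypothetical *_run() hook, called from the main loop.  By this point
 * the netdev cache has been refreshed, so acting on the flag is safe.
 * Returns true if a reconfiguration was triggered. */
static bool
example_run(void)
{
    if (atomic_exchange(&ifaces_changed, false)) {
        /* bridge_reconfigure()-like work would go here. */
        return true;
    }
    return false;
}
```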
>>>>>>
>>>>>> Hi, Adrian. Thanks for all the work on the RTNL contention issues!
>>>>>>
>>>>>> One big thing I do not like about this set though is this change to
>>>>>> start monitoring all namespaces here and in netdev-linux.  I don't
>>>>>> think we should be doing that, as this can have a potentially
>>>>>> significant performance impact.  For instance, we can have a BGP
>>>>>> daemon running in a separate namespace that will create a ton of
>>>>>> route updates, and OVS will receive all of them now that it is
>>>>>> subscribed.  Even parsing all those updates and discarding them as
>>>>>> irrelevant eats a noticeable amount of CPU resources.  And people
>>>>>> may run multiple BGP daemons per node (which may sound unreasonable,
>>>>>> but they do...), spamming OVS with all the updates.  I'm afraid that
>>>>>> it may even increase the contention on the locks inside the kernel
>>>>>> in such cases, as notifications from many namespaces start to be
>>>>>> forwarded into a single socket.
>>>>>>
>>>>>> All in all, I think we need to find a more fine-grained solution
>>>>>> here instead of a blind subscription to all namespaces.
>>>>>
>>>>> Thanks for the feedback. It's truly something we should think about.
>>>>>
>>>>> IIUC, the current series only subscribes to all namespaces for
>>>>> RTNLGRP_LINK.
>>>>> Routes are being monitored by the specific nln notifier that
>>>>> route-table.c creates in route_table_init() which does not subscribe to
>>>>> route events in other namespaces.
>>
>> The part of the code above that I replied to here is turning on monitoring of
>> all namespaces for NETLINK_ROUTE, unless I'm missing something. We monitor
>> all namespaces for RTNLGRP_LINK today, so that's not a big problem.
>
> NETLINK_ROUTE is the netlink family, it's the way to interact with both
> routing and device subsystems. The next line after the code block you
> quoted (not present in the diff of this patch unfortunately) is this:
>
> return nln_notifier_create(nln, RTNLGRP_LINK, (nln_notify_func *) cb,
> aux);
>
> this makes the recently created socket join the mcast group that only
> listens to link changes, not route ones.
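The scoping Adrián describes here comes from multicast group membership rather than from the netlink family: a NETLINK_ROUTE socket delivers nothing until it joins specific groups. A minimal Linux-specific sketch of joining only RTNLGRP_LINK (the helper name is made up, this is not the nln API):

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* From <linux/socket.h>, for older libcs. */
#endif

/* Open a NETLINK_ROUTE socket subscribed to link events only.  Route or
 * address notifications in the same family are not delivered unless the
 * corresponding groups (e.g. RTNLGRP_IPV4_ROUTE) are also joined.
 * Returns the fd on success, -1 on error. */
static int
open_link_monitor(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) {
        return -1;
    }

    int group = RTNLGRP_LINK;
    if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                   &group, sizeof group) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Joining rtnetlink multicast groups like RTNLGRP_LINK does not require privileges, unlike the all-namespaces option discussed below.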
Oh, OK. I might have misread the changes. Maybe we don't need to
deprecate the internal port movement between namespaces just yet.
I think I was mostly misled by the route change handlers having the
nsid check, which I guess should not be needed?  I suppose it's
still good to have them just in case, but we may want to have some
comments in the code (it may also be good to mark such changes as
irrelevant instead of checking the nsid separately).
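For context, the nsid those handlers check is delivered as ancillary data: on a socket with NETLINK_LISTEN_ALL_NSID enabled, a message arriving from another namespace carries a control message of type NETLINK_LISTEN_ALL_NSID holding the sender's namespace id, while a message from the socket's own namespace carries none. A sketch of extracting it (`msg_nsid` is a made-up helper, not the actual nln code):

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* From <linux/socket.h>, for older libcs. */
#endif

/* Return the source network namespace id found in the ancillary data of
 * a received netlink message, or -1 if no NETLINK_LISTEN_ALL_NSID
 * control message is present, i.e. the message originated in the
 * socket's own namespace. */
static int
msg_nsid(struct msghdr *msg)
{
    for (struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg); cmsg;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_NETLINK
            && cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID) {
            int nsid;
            memcpy(&nsid, CMSG_DATA(cmsg), sizeof nsid);
            return nsid;
        }
    }
    return -1;
}
```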
>
> With this series, the infrastructure could be visualized as this:
>
> route_table.c:
>   - own nln notifier, for route change detection:
>       family: NETLINK_ROUTE
>       mcast:  RTNLGRP_IPV{4,6}_{ROUTE,RULE}
>       all_ns: false
>   - rtnetlink_notifier.{c,h}, for link change detection
>
> bridge.c:
>   - if-notifier (on top of rtnetlink_notifier.{c,h}), to call
>     bridge_reconfigure() if ifaces changed
>
> netdev_linux:
>   - own nln notifier, for address change detection, to update the
>     netdev internal (cached) state:
>       family: NETLINK_ROUTE
>       mcast:  RTNLGRP_IPV4_IFADDR, RTNLGRP_IPV6_{IFADDR,IFINFO}
>       all_ns: true
>   - rtnetlink_notifier.{c,h}, for link change detection
>
> rtnetlink_notifier.{c,h}:
>       family: NETLINK_ROUTE
>       mcast:  RTNLGRP_LINK
>       all_ns: true
>
> All of the above are built on top of nln (netlink_notifier.{h,c}).
>
>
This diagram got completely scrambled in transit.
> In terms of netdev_linux's exposure to stuff happening in other
> namespaces, this series should not change anything. Previously,
> netdev_linux was also creating a socket with NETLINK_LISTEN_ALL_NSID
> and subscribing to the same multicast groups (for link and address
> updates). It was just doing so through its own socket.
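For reference, the all_ns behavior boils down to a single socket option: with NETLINK_LISTEN_ALL_NSID set (which requires CAP_NET_ADMIN), one rtnetlink socket receives its joined groups' notifications from every network namespace the process can see. A sketch with a hypothetical helper name, not the nln API; unprivileged callers get EPERM:

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* From <linux/socket.h>, for older libcs. */
#endif

/* Try to make an rtnetlink socket listen across all network namespaces,
 * conceptually what nln_create(..., true, ...) asks for.  Requires
 * CAP_NET_ADMIN and kernel >= 4.2.  Returns 0 on success, -errno on
 * failure (e.g. -EPERM when unprivileged). */
static int
listen_all_nsid(int fd)
{
    int one = 1;
    if (setsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID,
                   &one, sizeof one) < 0) {
        return -errno;
    }
    return 0;
}
```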
>
> The only difference is that now it uses the same common infra (as shown
> above). This is needed because, the moment we cache the netdev state
> (e.g. link up or down), we need other subsystems such as route-table or
> if-notifier to see an updated version of the netdev when they query it.
>
> You could argue that from the perspective of route-table and
> if-notifier, they now get their callback called for events in all
> namespaces, whereas before they only got notified for NS_LOCAL events.
> That is true, but they quickly discard non-local events. Besides, those
> events have already been parsed and processed, as netdev-linux needs
> them anyway. From a global perspective we go from:
>
> - netdev_linux: listening for all network namespaces
> - rtnetlink_notifier: listening only in local namespace
>
> to:
>
> - rtnetlink_notifier: listening for all network namespaces, with
>   netdev-linux being just another client
>
> Therefore this infrastructure shift, apart from being functionally
> necessary to allow caching the link state inside the netdev, removes the
> duplicated event processing of NS_LOCAL events that is currently taking
> place.
>
> Thanks.
> Adrián
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev