Thank you, Ilya and Adrián, for your comments and suggestions! Sorry for the late reply, but I wanted to dig deeper into this issue.
I think I found the root cause, and it is actually not entirely the fault
of the NETLINK_LISTEN_ALL_NSID flag alone.

On Fri, 13 Mar 2026 at 09:33, Adrián Moreno <[email protected]> wrote:
> On Thu, Mar 05, 2026 at 06:27:00PM +0100, Matteo Perin via dev wrote:
> > For ports on non-system (i.e. userspace) datapaths, the dpif_netlink_vport_get()
> > call in netdev_linux_netnsid_update() is meaningless, these ports are not
> > kernel vports.
> >
> > Generalize the tap class check in netdev_linux_netnsid_update() with a
> > dpif_type check: when dpif_type is set and is not "system", assume the
> > device is local without attempting the vport lookup. This change will
> > cover all device types on userspace datapaths (e.g. veth pairs).
> >
> > Additionally, bypass the nsid equality check in netdev_linux_update()
> > for non-system datapaths. When NETLINK_LISTEN_ALL_NSID is enabled,
> > local RTM events carry the kernel-assigned namespace ID rather than
> > NETNSID_LOCAL, causing a mismatch with the locally-assumed nsid. For
> > non-system datapaths, process all RTM events unconditionally (the
> > interface name lookup already ensures only OVS-managed devices are
> > affected).
>
> IIUC, local events come without nsid in the socket's auxiliary data and
> nl_sock_recv__ should ensure in that case NETNSID_LOCAL is returned. Can
> you give more details of your usecase and how to see a local netdev
> event with something different to NETNSID_LOCAL?

Unfortunately, there is one case where local events can come with an
nsid: when there is a self-referential nsid mapping in the namespace
peer ID table (i.e. the root namespace has an entry that maps to
itself).

This can be a common occurrence, since container runtimes (this is true
for LXD, for example, afaik) maintain nsid mappings so they can
efficiently query network interface information across namespaces (e.g.
retrieving container interface stats from the host without entering
each container namespace).
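To check whether a host has such a self-referential mapping, something
like the following could be used. This is only a minimal sketch, not OVS
code: it sends an RTM_GETNSID request with a NETNSA_FD attribute that
points at /proc/self/ns/net and reads back NETNSA_NSID. The function
names are made up for illustration, and the numeric constants are copied
from the kernel UAPI headers as I understand them.

```python
import os
import socket
import struct

# Values as defined in <linux/netlink.h>, <linux/rtnetlink.h> and
# <linux/net_namespace.h> (assumed here, not taken from OVS).
NLMSG_ERROR = 2
NLM_F_REQUEST = 0x1
RTM_NEWNSID = 88
RTM_GETNSID = 90
NETNSA_NSID = 1
NETNSA_FD = 3
NETNSA_NSID_NOT_ASSIGNED = -1
NLA_TYPE_MASK = 0x3FFF

NLMSG_HDR = struct.Struct("=IHHII")   # len, type, flags, seq, pid
NLATTR_HDR = struct.Struct("=HH")     # len, type
RTGENMSG_LEN = 4                      # struct rtgenmsg padded to 4 bytes


def build_getnsid_request(ns_fd, seq=1):
    """Build an RTM_GETNSID request asking for the nsid of 'ns_fd'."""
    body = struct.pack("=Bxxx", int(socket.AF_UNSPEC))
    attr = NLATTR_HDR.pack(NLATTR_HDR.size + 4, NETNSA_FD)
    attr += struct.pack("=I", ns_fd)
    length = NLMSG_HDR.size + len(body) + len(attr)
    header = NLMSG_HDR.pack(length, RTM_GETNSID, NLM_F_REQUEST, seq, 0)
    return header + body + attr


def parse_nsid_reply(data):
    """Return the NETNSA_NSID value carried by an RTM_NEWNSID reply."""
    length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, 0)
    if msg_type == NLMSG_ERROR:
        (neg_errno,) = struct.unpack_from("=i", data, NLMSG_HDR.size)
        raise OSError(-neg_errno, os.strerror(-neg_errno))
    if msg_type != RTM_NEWNSID:
        raise ValueError("unexpected netlink message type %d" % msg_type)

    offset = NLMSG_HDR.size + RTGENMSG_LEN
    end = min(length, len(data))
    while offset + NLATTR_HDR.size <= end:
        attr_len, attr_type = NLATTR_HDR.unpack_from(data, offset)
        if attr_len < NLATTR_HDR.size:
            break                      # malformed attribute, stop parsing
        if attr_type & NLA_TYPE_MASK == NETNSA_NSID:
            return struct.unpack_from("=i", data, offset + NLATTR_HDR.size)[0]
        offset += (attr_len + 3) & ~3  # attributes are 4-byte aligned
    return NETNSA_NSID_NOT_ASSIGNED


def query_self_nsid():
    """Ask the kernel which nsid, if any, this namespace has for itself."""
    ns_fd = os.open("/proc/self/ns/net", os.O_RDONLY)
    try:
        with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                           socket.NETLINK_ROUTE) as sock:
            sock.bind((0, 0))
            sock.send(build_getnsid_request(ns_fd))
            return parse_nsid_reply(sock.recv(4096))
    finally:
        os.close(ns_fd)
```

As far as I understand the kernel behavior, query_self_nsid() run in the
root namespace should return -1 when no self-mapping exists and the
assigned id otherwise.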
As a side effect of these cross-namespace link queries, the kernel
allocates an nsid entry in the host namespace table that maps back to
itself. This mapping is harmless under normal operation: it is simply an
artifact of how the kernel tracks namespace relationships, and it
persists for the lifetime of the system.

When OVS enables NETLINK_LISTEN_ALL_NSID on its RTNL socket, the kernel
decides whether to attach an nsid cmsg to each broadcast by looking up
the sender's namespace in the receiver's nsid table. Normally the root
namespace has no nsid entry for itself, so local events carry no cmsg,
and OVS correctly records them as NETNSID_LOCAL (-1).

With the self-mapping described above in place, however, the kernel does
find an entry when delivering any local RTM broadcast: the lookup returns
the self nsid instead of "not assigned", so the kernel attaches a cmsg
with a numerical nsid to local events as well.

On Fri, 13 Mar 2026 at 14:28, Ilya Maximets <[email protected]> wrote:
> Not sure why the list was not included in my previous reply, adding it
> back.
>
> On 3/13/26 9:49 AM, Adrián Moreno wrote:
> > On Wed, Mar 11, 2026 at 01:54:13PM +0100, Ilya Maximets wrote:
> >> On 3/5/26 6:27 PM, Matteo Perin via dev wrote:
> >>> For ports on non-system (i.e. userspace) datapaths, the dpif_netlink_vport_get()
> >>> call in netdev_linux_netnsid_update() is meaningless, these ports are not
> >>> kernel vports.
> >>>
> >>> Generalize the tap class check in netdev_linux_netnsid_update() with a
> >>> dpif_type check: when dpif_type is set and is not "system", assume the
> >>> device is local without attempting the vport lookup. This change will
> >>> cover all device types on userspace datapaths (e.g. veth pairs).
> >>>
> >>> Additionally, bypass the nsid equality check in netdev_linux_update()
> >>> for non-system datapaths. When NETLINK_LISTEN_ALL_NSID is enabled,
> >>> local RTM events carry the kernel-assigned namespace ID rather than
> >>> NETNSID_LOCAL, causing a mismatch with the locally-assumed nsid. For
> >>> non-system datapaths, process all RTM events unconditionally (the
> >>> interface name lookup already ensures only OVS-managed devices are
> >>> affected).
> >>
> >> Hmm. I don't think this is right. The name is not unique across namespaces,
> >> it can be a completely different device in a different namespace. We can't
> >> rely on just a name.
> >
> > I think you're right. This can be problematic.
> >
> >>
> >> This all-nsids listening functionality is as annoying as it is useless... :)
> >>
> >
> > Should we go ahead with the attempts to deprecate it?
>
> Let's see what comes from your question that nsid should not be present in the
> local notifications. But if it is present, then I don't think there is an
> actual way for us to know what's local and what isn't, unless we check the actual
> ID of the datapath interface and compare to that. But it sounds like more and
> more hacks for questionably useful functionality. So, in this case it might be
> better to just deprecate it.

Maybe we could also try to fetch the static self-nsid (with something
like an RTM_GETNSID request with NETNSA_FD pointing to
/proc/self/ns/net), cache it, and treat it as NETNSID_LOCAL too?
I do not think it would be a very clean workaround, but it could be a
possibility.

Best Regards,
Matteo
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
