On Wed, 24 Jun 2026 at 16:30, Ilya Maximets <[email protected]> wrote:
> On 6/9/26 2:42 PM, Matteo Perin via dev wrote:
> > OVS enables NETLINK_LISTEN_ALL_NSID on its RTNL notification socket so
> > that it can receive link and address events that originate in other
> > network namespaces. A side effect of this option is that the kernel
> > tags every broadcast notification with the sender's nsid as looked up in
> > the receiver's nsid table.
> >
> > Normally, events that originate in the local namespace carry no nsid
> > ancillary data, which OVS interprets as NETNSID_LOCAL. However, if a
> > self-referential nsid mapping exists for the local namespace, the kernel
> > tags local events with that nsid instead. Such a mapping can be created
> > by other processes on the system (for example as a side effect of
> > cross-namespace link queries by some container runtimes), at which point
> > OVS starts treating its own local events as coming from a remote
> > namespace and silently ignores them.
> >
> > To handle this deterministically, create a self-referential nsid mapping
> > for the local namespace at startup, before the notification socket is set
> > up and record the assigned id. The namespace is identified by our own
> > pid (NETNSA_PID). A self-referential nsid mapping is permanent once
> > created: the kernel only removes nsid entries when the peer namespace
> > is destroyed, which can never happen for our own namespace. So, the
> > value never changes and no runtime monitoring is required.
> >
> > The recorded self nsid is stored in a global variable accessed through
> > netnsid_is_local(), so that any RTM event tagged with either the absence
> > of an nsid or our own self nsid is treated as local and processed
> > normally.
> >
> > This kernel behavior was fixed in commit 88b126b39f97 ("net: netlink:
> > don't set nsid on local notifications"), which stops tagging local
> > events with the self-referential nsid. This workaround is only needed
> > for older kernels that lack that fix and can be removed once such kernels
> > are no longer supported.
> >
> > Signed-off-by: Matteo Perin <[email protected]>
>
> Thanks, Matteo! The patch looks good in general. Just a few comments
> below. The rest of the set looks good as-is.
>
> One thought I had though: Do we also maybe need to report nsid of the
> peer? ifindex is technically not enough to identify the interface in
> the system. May be a follow up change though.
Thank you very much for the review Ilya,
I agree that the veth ifindex is not enough to discriminate a single
one in the
system and we would indeed need to add the netnsid it is in to fully
identify it.
This would probably entail adding a rtnetlink-based lookup to discover the
nsid
of the peer namespace (relative to OVS own netns).
The reason I did not go through with this is that the original intent of
this patch
series was to add support for a quality-of-life improvement (i.e. removing
the
need of the manual dynamic-routing-port-name / dynamic-routing-port-mapping
options) when setting routing-protocol-redirect on an LRP in OVN.
In this case, the veth peer that OVN cares about is the other end of a pair
that
is in a namespace it already knows about.
So, we are not trying to resolve an arbitrary ifindex from anywhere in the
system
but we are finding the peer of a port OVN already owns.
In practice, the local-OVS-port to peer_ifindex mapping is sufficient for
OVN
to do if_indextoname() (or netlink RTM_GETLINK by ifindex) in the relevant
namespace.
Since the addition of a new peer_netnsid status key (similar to
peer_ifindex)
is most likely going to be just an additive change, I decided to keep the
patch
minimal since it already requires moving around quite a lot of stuff
already.
All things considered, I would prefer this addition to be a follow-up in
the future,
if you don't mind.
> + * instead, causing OVS to silently reject them. Some container runtimes
> > + * create such a mapping as a side-effect of cross-namespace link
> queries.
> > + *
> > + * To make this deterministic, we create the mapping ourselves (or read
> back
> > + * the existing one) at startup, before the notification socket exists
> and
> > + * record the resulting nsid as local. A self-referential nsid mapping
> is
> > + * permanent once created, the kernel only removes nsid entries when the
> > + * peer namespace is destroyed, which can never happen for our own
> namespace.
> > + * Therefore, the value never changes and no runtime monitoring is
> required.
> > + *
> > + * The namespace is identified by our own pid (NETNSA_PID).
> > + *
> > + * This kernel behavior was fixed in commit 88b126b39f97 ("net:
> netlink: don't
> > + * set nsid on local notifications"), which stops tagging local events
> with the
> > + * self-referential nsid. This workaround (and the self-nsid check in
> > + * netnsid_is_local()) is only needed for older kernels that lack that
> fix and
> > + * can be removed once such kernels are no longer supported. */
> > +static void
> > +netdev_linux_init_self_nsid(void)
> > +{
> > +#ifdef HAVE_LINUX_NET_NAMESPACE_H
> > + const int rta_offset = NLMSG_ALIGN(sizeof(struct rtgenmsg));
> > + struct ofpbuf request;
> > + struct ofpbuf *reply = NULL;
> > + uint32_t pid = getpid();
> > + int error;
> > +
> > + /* Create a self-referential nsid mapping for our own namespace.
> Passing
> > + * NETNSA_NSID == -1 lets the kernel allocate a free id. If the
> mapping
> > + * already exists the request fails with EEXIST, which is harmless.
> */
> > + ofpbuf_init(&request, 0);
> > + nl_msg_put_nlmsghdr(&request,
> > + rta_offset + 2 * NL_ATTR_SIZE(sizeof(uint32_t)),
> > + RTM_NEWNSID, NLM_F_REQUEST | NLM_F_ACK);
> > + ofpbuf_put_zeros(&request, rta_offset);
> > + nl_msg_put_u32(&request, NETNSA_PID, pid);
> > + nl_msg_put_u32(&request, NETNSA_NSID, NETNSID_LOCAL);
> > + nl_transact(NETLINK_ROUTE, &request, NULL);
> > + ofpbuf_uninit(&request);
> > +
> > + /* Read back the nsid that the kernel assigned to our namespace. */
> > + ofpbuf_init(&request, 0);
> > + nl_msg_put_nlmsghdr(&request, rta_offset +
> NL_ATTR_SIZE(sizeof(uint32_t)),
> > + RTM_GETNSID, NLM_F_REQUEST);
> > + ofpbuf_put_zeros(&request, rta_offset);
> > + nl_msg_put_u32(&request, NETNSA_PID, pid);
> > +
> > + error = nl_transact(NETLINK_ROUTE, &request, &reply);
> > + if (!error && reply) {
> > + const struct nlattr *a;
> > +
> > + a = nl_attr_find(reply, NLMSG_HDRLEN + rta_offset, NETNSA_NSID);
> > + if (a) {
> > + int nsid = nl_attr_get_u32(a);
> > +
> > + if (nsid >= 0) {
> > + netnsid_set_self(nsid);
> > + VLOG_DBG("local network namespace has nsid %d", nsid);
> > + }
> > + }
> > + }
>
> Should we maybe log a warning here on the 'else' branch?
Good idea. I do not expect this to fail, except for some very strange edge
cases but
adding more diagnostic info is always helpful.
I will include a vlog_warn if the local network namespace nsid cannot be
queried
successfully.
> ---
> > lib/automake.mk | 1 +
> > lib/netdev-linux.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++
> > lib/netnsid.c | 17 +++++++++
> > lib/netnsid.h | 12 +++++-
> > 4 files changed, 121 insertions(+), 1 deletion(-)
> > create mode 100644 lib/netnsid.c
> >
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index 9b0f6fe0a..2dfca79da 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -168,6 +168,7 @@ lib_libopenvswitch_la_SOURCES = \
> > lib/netflow.h \
> > lib/netlink.c \
> > lib/netlink.h \
> > + lib/netnsid.c \
> > lib/netnsid.h \
> > lib/nx-match.c \
> > lib/nx-match.h \
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index c694dc1c5..d7447fa06 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -586,6 +586,84 @@ is_tap_netdev(const struct netdev *netdev)
> > return netdev_get_class(netdev) == &netdev_tap_class;
> > }
> >
> > +/* Ensures that our own network namespace has a self-referential nsid
> mapping
> > + * and records its value through netnsid_set_self() so that
> netnsid_is_local()
> > + * treats it as the local namespace.
> > + *
> > + * NETLINK_LISTEN_ALL_NSID workaround: OVS enables this option on its
> RTNL
> > + * notification socket so that it can receive events from remote
> namespaces.
> > + * A side-effect of this option is that the kernel tags every broadcast
> > + * (including locally-originated RTM events) with the sender nsid as
> looked up
> > + * in the receiver nsid table. Normally local events carry no nsid cmsg
> > + * (which OVS interprets as NETNSID_LOCAL), but if a self-referential
> nsid
> > + * mapping exists for the local namespace the kernel tags them with
> that nsid
>
> I'd say 'an older kernel may tag them'.
>
> > +
> > + ofpbuf_uninit(&request);
> > + ofpbuf_delete(reply);
> > +#endif
> > +}
> > +
> > static int
> > netdev_linux_netnsid_update__(struct netdev_linux *netdev)
> > {
> > @@ -665,6 +743,11 @@ netdev_linux_notify_sock(void)
> > if (ovsthread_once_start(&once)) {
> > int error;
> >
> > + /* Discover (creating it if necessary) the nsid that the kernel
> uses
> > + * for our own namespace, before the notification socket starts
> > + * receiving namespace-tagged events. */
> > + netdev_linux_init_self_nsid();
> > +
> > error = nl_sock_create(NETLINK_ROUTE, &sock);
> > if (!error) {
> > size_t i;
> > @@ -936,6 +1019,15 @@ netdev_linux_update(struct netdev_linux *dev, int
> nsid,
> > const struct rtnetlink_change *change)
> > OVS_REQUIRES(dev->mutex)
> > {
> > + /* NETLINK_LISTEN_ALL_NSID workaround: the kernel may tag locally
> > + * originated events with our own self-referential nsid instead of
> > + * omitting the cmsg. netnsid_is_local() knows about that
> self-nsid (set
> > + * once at startup, see netdev_linux_init_self_nsid()), so
> normalize such
> > + * events back to the local namespace id before matching. */
> > + if (netnsid_is_local(nsid)) {
> > + netnsid_set_local(&nsid);
> > + }
>
> I think, we should not do that. The netnsid module is aware that local
> is not necessarily -1, so the equality check function in that module
> should take that into account, i.e.:
>
> if (netnsid_is_local(nsid1) && netnsid_is_local(nsid2)) {
> return true;
> }
>
> > +
> > if (netdev_linux_netnsid_is_eq(dev, nsid)) {
> > netdev_linux_update__(dev, change);
> > }
> > diff --git a/lib/netnsid.c b/lib/netnsid.c
> > new file mode 100644
> > index 000000000..043637e12
> > --- /dev/null
> > +++ b/lib/netnsid.c
> > @@ -0,0 +1,17 @@
>
> This file needs a copyright/license header.
>
> > +#include <config.h>
> > +#include "netnsid.h"
> > +
> > +/* The nsid that the local kernel assigns to our own network namespace
> when a
> > + * self-referential mapping exists, or NETNSID_LOCAL when none does.
> > + *
> > + * It is set once at startup by the platform-specific netdev code (see
> > + * netdev-linux.c) before any namespace-tagged netlink event can be
> received,
> > + * and is read afterwards by netnsid_is_local(). A self-referential
> nsid
> > + * mapping is permanent once created, so this value never changes
> again. */
>
> This comment duplicates one in the header. We may just drop it, I think.
I agree, sorry for missing these. They will all be all addressed in the
next iteration.
I will also add a reference about the peer_ifindex addition to the NEWS
file.
> +int netnsid_self = NETNSID_LOCAL;
> > +
> > +void
> > +netnsid_set_self(int nsid)
> > +{
> > + netnsid_self = nsid;
> > +}
> > diff --git a/lib/netnsid.h b/lib/netnsid.h
> > index 1d5ab83c5..0f558c79c 100644
> > --- a/lib/netnsid.h
> > +++ b/lib/netnsid.h
> > @@ -19,6 +19,8 @@
> >
> > #include <stdbool.h>
> >
> > +#include "util.h"
> > +
>
> Is this needed?
util.h is included for ovs_assert, which is called in the netnsid_set()
inline function further
down the header:
Before this #include, the header only worked because every existing includer
(e.g. netdev-linux.c) happened to include util.h first.
When the new netnsid.c was added the ovs_assert came up as an implicit
declaration
warning. Adding #include "util.h" here makes the header self-contained.
I will send the revised patchset shortly, unless you have any objections to
these comments.
Best Regards,
Matteo
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev