On 6/25/26 5:18 PM, Matteo Perin wrote:
> On Wed, 24 Jun 2026 at 16:30, Ilya Maximets <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 6/9/26 2:42 PM, Matteo Perin via dev wrote:
>     > OVS enables NETLINK_LISTEN_ALL_NSID on its RTNL notification socket so
>     > that it can receive link and address events that originate in other
>     > network namespaces.  A side effect of this option is that the kernel
>     > tags every broadcast notification with the sender's nsid as looked up in
>     > the receiver's nsid table.
>     >
>     > Normally, events that originate in the local namespace carry no nsid
>     > ancillary data, which OVS interprets as NETNSID_LOCAL.  However, if a
>     > self-referential nsid mapping exists for the local namespace, the kernel
>     > tags local events with that nsid instead.  Such a mapping can be created
>     > by other processes on the system (for example as a side effect of
>     > cross-namespace link queries by some container runtimes), at which point
>     > OVS starts treating its own local events as coming from a remote
>     > namespace and silently ignores them.
>     >
>     > To handle this deterministically, create a self-referential nsid mapping
>     > for the local namespace at startup, before the notification socket is 
> set
>     > up and record the assigned id.  The namespace is identified by our own
>     > pid (NETNSA_PID).  A self-referential nsid mapping is permanent once
>     > created: the kernel only removes nsid entries when the peer namespace
>     > is destroyed, which can never happen for our own namespace.  So, the
>     > value never changes and no runtime monitoring is required.
>     >
>     > The recorded self nsid is stored in a global variable accessed through
>     > netnsid_is_local(), so that any RTM event tagged with either the absence
>     > of an nsid or our own self nsid is treated as local and processed
>     > normally.
>     >
>     > This kernel behavior was fixed in commit 88b126b39f97 ("net: netlink:
>     > don't set nsid on local notifications"), which stops tagging local
>     > events with the self-referential nsid.  This workaround is only needed
>     > for older kernels that lack that fix and can be removed once such 
> kernels
>     > are no longer supported.
>     >
>     > Signed-off-by: Matteo Perin <[email protected] 
> <mailto:[email protected]>>
> 
>     Thanks, Matteo!  The patch looks good in general.  Just a few comments
>     below.  The rest of the set looks good as-is.
> 
>     One thought I had though: Do we also maybe need to report nsid of the
>     peer?  ifindex is technically not enough to identify the interface in
>     the system.  May be a follow up change though.
> 
> 
> Thank you very much for the review Ilya,
> 
> I agree that the veth ifindex is not enough to discriminate a single one in 
> the
> system and we would indeed need to add the netnsid it is in to fully identify 
> it.
> This would probably entail adding a rtnetlink-based lookup to discover the 
> nsid
> of the peer namespace (relative to OVS own netns).
> 
> The reason I did not go through with this is that the original intent of this 
> patch
> series was to add support for a quality-of-life improvement (i.e. removing the
> need of the manual dynamic-routing-port-name / dynamic-routing-port-mapping
> options) when setting routing-protocol-redirect on an LRP in OVN.
> In this case, the veth peer that OVN cares about is the other end of a pair 
> that
> is in a namespace it already knows about.
> So, we are not trying to resolve an arbitrary ifindex from anywhere in the 
> system
> but we are finding the peer of a port OVN already owns.
> In practice, the local-OVS-port to peer_ifindex mapping is sufficient for OVN
> to do if_indextoname() (or netlink RTM_GETLINK by ifindex) in the relevant
> namespace.
> 
> Since the addition of a new peer_netnsid status key (similar to peer_ifindex)
> is most likely going to be just an additive change, I decided to keep the 
> patch
> minimal since it already requires moving around quite a lot of stuff already.
> 
> All things considered, I would prefer this addition to be a follow-up in the 
> future,
> if you don't mind.

Sounds good.  If the ifindex alone suffice for what OVN needs, then the nsid
can be added as a follow up later, if needed.

> 
>     > + * instead, causing OVS to silently reject them.  Some container 
> runtimes
>     > + * create such a mapping as a side-effect of cross-namespace link 
> queries.
>     > + *
>     > + * To make this deterministic, we create the mapping ourselves (or 
> read back
>     > + * the existing one) at startup, before the notification socket exists 
> and
>     > + * record the resulting nsid as local.  A self-referential nsid 
> mapping is
>     > + * permanent once created, the kernel only removes nsid entries when 
> the
>     > + * peer namespace is destroyed, which can never happen for our own 
> namespace.
>     > + * Therefore, the value never changes and no runtime monitoring is 
> required.
>     > + *
>     > + * The namespace is identified by our own pid (NETNSA_PID).
>     > + *
>     > + * This kernel behavior was fixed in commit 88b126b39f97 ("net: 
> netlink: don't
>     > + * set nsid on local notifications"), which stops tagging local events 
> with the
>     > + * self-referential nsid.  This workaround (and the self-nsid check in
>     > + * netnsid_is_local()) is only needed for older kernels that lack that 
> fix and
>     > + * can be removed once such kernels are no longer supported. */
>     > +static void
>     > +netdev_linux_init_self_nsid(void)
>     > +{
>     > +#ifdef HAVE_LINUX_NET_NAMESPACE_H
>     > +    const int rta_offset = NLMSG_ALIGN(sizeof(struct rtgenmsg));
>     > +    struct ofpbuf request;
>     > +    struct ofpbuf *reply = NULL;
>     > +    uint32_t pid = getpid();
>     > +    int error;
>     > +
>     > +    /* Create a self-referential nsid mapping for our own namespace.  
> Passing
>     > +     * NETNSA_NSID == -1 lets the kernel allocate a free id. If the 
> mapping
>     > +     * already exists the request fails with EEXIST, which is 
> harmless. */
>     > +    ofpbuf_init(&request, 0);
>     > +    nl_msg_put_nlmsghdr(&request,
>     > +                        rta_offset + 2 * 
> NL_ATTR_SIZE(sizeof(uint32_t)),
>     > +                        RTM_NEWNSID, NLM_F_REQUEST | NLM_F_ACK);
>     > +    ofpbuf_put_zeros(&request, rta_offset);
>     > +    nl_msg_put_u32(&request, NETNSA_PID, pid);
>     > +    nl_msg_put_u32(&request, NETNSA_NSID, NETNSID_LOCAL);
>     > +    nl_transact(NETLINK_ROUTE, &request, NULL);
>     > +    ofpbuf_uninit(&request);
>     > +
>     > +    /* Read back the nsid that the kernel assigned to our namespace. */
>     > +    ofpbuf_init(&request, 0);
>     > +    nl_msg_put_nlmsghdr(&request, rta_offset + 
> NL_ATTR_SIZE(sizeof(uint32_t)),
>     > +                        RTM_GETNSID, NLM_F_REQUEST);
>     > +    ofpbuf_put_zeros(&request, rta_offset);
>     > +    nl_msg_put_u32(&request, NETNSA_PID, pid);
>     > +
>     > +    error = nl_transact(NETLINK_ROUTE, &request, &reply);
>     > +    if (!error && reply) {
>     > +        const struct nlattr *a;
>     > +
>     > +        a = nl_attr_find(reply, NLMSG_HDRLEN + rta_offset, 
> NETNSA_NSID);
>     > +        if (a) {
>     > +            int nsid = nl_attr_get_u32(a);
>     > +
>     > +            if (nsid >= 0) {
>     > +                netnsid_set_self(nsid);
>     > +                VLOG_DBG("local network namespace has nsid %d", nsid);
>     > +            }
>     > +        }
>     > +    }
> 
>     Should we maybe log a warning here on the 'else' branch?
> 
> 
> Good idea. I do not expect this to fail, except for some very strange edge 
> cases but
> adding more diagnostic info is always helpful.
> I will include a vlog_warn if the local network namespace nsid cannot be 
> queried
> successfully.
> 
>     > ---
>     >  lib/automake.mk <http://automake.mk>    |  1 +
>     >  lib/netdev-linux.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++
>     >  lib/netnsid.c      | 17 +++++++++
>     >  lib/netnsid.h      | 12 +++++-
>     >  4 files changed, 121 insertions(+), 1 deletion(-)
>     >  create mode 100644 lib/netnsid.c
>     >
>     > diff --git a/lib/automake.mk <http://automake.mk> b/lib/automake.mk 
> <http://automake.mk>
>     > index 9b0f6fe0a..2dfca79da 100644
>     > --- a/lib/automake.mk <http://automake.mk>
>     > +++ b/lib/automake.mk <http://automake.mk>
>     > @@ -168,6 +168,7 @@ lib_libopenvswitch_la_SOURCES = \
>     >       lib/netflow.h \
>     >       lib/netlink.c \
>     >       lib/netlink.h \
>     > +     lib/netnsid.c \
>     >       lib/netnsid.h \
>     >       lib/nx-match.c \
>     >       lib/nx-match.h \
>     > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>     > index c694dc1c5..d7447fa06 100644
>     > --- a/lib/netdev-linux.c
>     > +++ b/lib/netdev-linux.c
>     > @@ -586,6 +586,84 @@ is_tap_netdev(const struct netdev *netdev)
>     >      return netdev_get_class(netdev) == &netdev_tap_class;
>     >  }
>     > 
>     > +/* Ensures that our own network namespace has a self-referential nsid 
> mapping
>     > + * and records its value through netnsid_set_self() so that 
> netnsid_is_local()
>     > + * treats it as the local namespace.
>     > + *
>     > + * NETLINK_LISTEN_ALL_NSID workaround: OVS enables this option on its 
> RTNL
>     > + * notification socket so that it can receive events from remote 
> namespaces.
>     > + * A side-effect of this option is that the kernel tags every broadcast
>     > + * (including locally-originated RTM events) with the sender nsid as 
> looked up
>     > + * in the receiver nsid table.  Normally local events carry no nsid 
> cmsg
>     > + * (which OVS interprets as NETNSID_LOCAL), but if a self-referential 
> nsid
>     > + * mapping exists for the local namespace the kernel tags them with 
> that nsid
> 
>     I'd say 'an older kernel may tag them'.
> 
>     > +
>     > +    ofpbuf_uninit(&request);
>     > +    ofpbuf_delete(reply);
>     > +#endif
>     > +}
>     > +
>     >  static int
>     >  netdev_linux_netnsid_update__(struct netdev_linux *netdev)
>     >  {
>     > @@ -665,6 +743,11 @@ netdev_linux_notify_sock(void)
>     >      if (ovsthread_once_start(&once)) {
>     >          int error;
>     > 
>     > +        /* Discover (creating it if necessary) the nsid that the 
> kernel uses
>     > +         * for our own namespace, before the notification socket starts
>     > +         * receiving namespace-tagged events. */
>     > +        netdev_linux_init_self_nsid();
>     > +
>     >          error = nl_sock_create(NETLINK_ROUTE, &sock);
>     >          if (!error) {
>     >              size_t i;
>     > @@ -936,6 +1019,15 @@ netdev_linux_update(struct netdev_linux *dev, int 
> nsid,
>     >                      const struct rtnetlink_change *change)
>     >      OVS_REQUIRES(dev->mutex)
>     >  {
>     > +    /* NETLINK_LISTEN_ALL_NSID workaround: the kernel may tag locally
>     > +     * originated events with our own self-referential nsid instead of
>     > +     * omitting the cmsg.  netnsid_is_local() knows about that 
> self-nsid (set
>     > +     * once at startup, see netdev_linux_init_self_nsid()), so 
> normalize such
>     > +     * events back to the local namespace id before matching. */
>     > +    if (netnsid_is_local(nsid)) {
>     > +        netnsid_set_local(&nsid);
>     > +    }
> 
>     I think, we should not do that.  The netnsid module is aware that local
>     is not necessarily -1, so the equality check function in that module
>     should take that into account, i.e.:
> 
>         if (netnsid_is_local(nsid1) && netnsid_is_local(nsid2)) {
>             return true;
>         }
> 
>     > +
>     >      if (netdev_linux_netnsid_is_eq(dev, nsid)) {
>     >          netdev_linux_update__(dev, change);
>     >      }
>     > diff --git a/lib/netnsid.c b/lib/netnsid.c
>     > new file mode 100644
>     > index 000000000..043637e12
>     > --- /dev/null
>     > +++ b/lib/netnsid.c
>     > @@ -0,0 +1,17 @@
> 
>     This file needs a copyright/license header.
> 
>     > +#include <config.h>
>     > +#include "netnsid.h"
>     > +
>     > +/* The nsid that the local kernel assigns to our own network namespace 
> when a
>     > + * self-referential mapping exists, or NETNSID_LOCAL when none does.
>     > + *
>     > + * It is set once at startup by the platform-specific netdev code (see
>     > + * netdev-linux.c) before any namespace-tagged netlink event can be 
> received,
>     > + * and is read afterwards by netnsid_is_local().  A self-referential 
> nsid
>     > + * mapping is permanent once created, so this value never changes 
> again. */
> 
>     This comment duplicates one in the header.  We may just drop it, I think.
> 
> 
> I agree, sorry for missing these.  They will all be all addressed in the next 
> iteration.
> I will also add a reference about the peer_ifindex addition to the NEWS file.
> 
>     > +int netnsid_self = NETNSID_LOCAL;
>     > +
>     > +void
>     > +netnsid_set_self(int nsid)
>     > +{
>     > +    netnsid_self = nsid;
>     > +}
>     > diff --git a/lib/netnsid.h b/lib/netnsid.h
>     > index 1d5ab83c5..0f558c79c 100644
>     > --- a/lib/netnsid.h
>     > +++ b/lib/netnsid.h
>     > @@ -19,6 +19,8 @@
>     > 
>     >  #include <stdbool.h>
>     > 
>     > +#include "util.h"
>     > +
> 
>     Is this needed?
> 
> 
> util.h is included for ovs_assert, which is called in the netnsid_set() 
> inline function further
> down the header:
> Before this #include, the header only worked because every existing includer
> (e.g. netdev-linux.c) happened to include util.h first.
> When the new netnsid.c was added the ovs_assert came up as an implicit 
> declaration
> warning.  Adding #include "util.h" here makes the header self-contained.

OK.  Makes sense.  Maybe note that in the commit message.

> 
> I will send the revised patchset shortly, unless you have any objections to 
> these comments.

Sounds good to me.  Thanks!

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to