On 6/9/26 2:42 PM, Matteo Perin via dev wrote:
> OVS enables NETLINK_LISTEN_ALL_NSID on its RTNL notification socket so
> that it can receive link and address events that originate in other
> network namespaces.  A side effect of this option is that the kernel
> tags every broadcast notification with the sender's nsid as looked up in
> the receiver's nsid table.
> 
> Normally, events that originate in the local namespace carry no nsid
> ancillary data, which OVS interprets as NETNSID_LOCAL.  However, if a
> self-referential nsid mapping exists for the local namespace, the kernel
> tags local events with that nsid instead.  Such a mapping can be created
> by other processes on the system (for example as a side effect of
> cross-namespace link queries by some container runtimes), at which point
> OVS starts treating its own local events as coming from a remote
> namespace and silently ignores them.
> 
> To handle this deterministically, create a self-referential nsid mapping
> for the local namespace at startup, before the notification socket is set
> up and record the assigned id.  The namespace is identified by our own
> pid (NETNSA_PID).  A self-referential nsid mapping is permanent once
> created: the kernel only removes nsid entries when the peer namespace
> is destroyed, which can never happen for our own namespace.  So, the
> value never changes and no runtime monitoring is required.
> 
> The recorded self nsid is stored in a global variable accessed through
> netnsid_is_local(), so that any RTM event tagged with either the absence
> of an nsid or our own self nsid is treated as local and processed
> normally.
> 
> This kernel behavior was fixed in commit 88b126b39f97 ("net: netlink:
> don't set nsid on local notifications"), which stops tagging local
> events with the self-referential nsid.  This workaround is only needed
> for older kernels that lack that fix and can be removed once such kernels
> are no longer supported.
> 
> Signed-off-by: Matteo Perin <[email protected]>

Thanks, Matteo!  The patch looks good in general.  Just a few comments
below.  The rest of the set looks good as-is.

One thought I had though: Do we also maybe need to report nsid of the
peer?  ifindex is technically not enough to identify the interface in
the system.  May be a follow up change though.

> ---
>  lib/automake.mk    |  1 +
>  lib/netdev-linux.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++
>  lib/netnsid.c      | 17 +++++++++
>  lib/netnsid.h      | 12 +++++-
>  4 files changed, 121 insertions(+), 1 deletion(-)
>  create mode 100644 lib/netnsid.c
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 9b0f6fe0a..2dfca79da 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -168,6 +168,7 @@ lib_libopenvswitch_la_SOURCES = \
>       lib/netflow.h \
>       lib/netlink.c \
>       lib/netlink.h \
> +     lib/netnsid.c \
>       lib/netnsid.h \
>       lib/nx-match.c \
>       lib/nx-match.h \
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index c694dc1c5..d7447fa06 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -586,6 +586,84 @@ is_tap_netdev(const struct netdev *netdev)
>      return netdev_get_class(netdev) == &netdev_tap_class;
>  }
>  
> +/* Ensures that our own network namespace has a self-referential nsid mapping
> + * and records its value through netnsid_set_self() so that 
> netnsid_is_local()
> + * treats it as the local namespace.
> + *
> + * NETLINK_LISTEN_ALL_NSID workaround: OVS enables this option on its RTNL
> + * notification socket so that it can receive events from remote namespaces.
> + * A side-effect of this option is that the kernel tags every broadcast
> + * (including locally-originated RTM events) with the sender nsid as looked 
> up
> + * in the receiver nsid table.  Normally local events carry no nsid cmsg
> + * (which OVS interprets as NETNSID_LOCAL), but if a self-referential nsid
> + * mapping exists for the local namespace the kernel tags them with that nsid

I'd say 'an older kernel may tag them'.

> + * instead, causing OVS to silently reject them.  Some container runtimes
> + * create such a mapping as a side-effect of cross-namespace link queries.
> + *
> + * To make this deterministic, we create the mapping ourselves (or read back
> + * the existing one) at startup, before the notification socket exists and
> + * record the resulting nsid as local.  A self-referential nsid mapping is
> + * permanent once created, the kernel only removes nsid entries when the
> + * peer namespace is destroyed, which can never happen for our own namespace.
> + * Therefore, the value never changes and no runtime monitoring is required.
> + *
> + * The namespace is identified by our own pid (NETNSA_PID).
> + *
> + * This kernel behavior was fixed in commit 88b126b39f97 ("net: netlink: 
> don't
> + * set nsid on local notifications"), which stops tagging local events with 
> the
> + * self-referential nsid.  This workaround (and the self-nsid check in
> + * netnsid_is_local()) is only needed for older kernels that lack that fix 
> and
> + * can be removed once such kernels are no longer supported. */
> +static void
> +netdev_linux_init_self_nsid(void)
> +{
> +#ifdef HAVE_LINUX_NET_NAMESPACE_H
> +    const int rta_offset = NLMSG_ALIGN(sizeof(struct rtgenmsg));
> +    struct ofpbuf request;
> +    struct ofpbuf *reply = NULL;
> +    uint32_t pid = getpid();
> +    int error;
> +
> +    /* Create a self-referential nsid mapping for our own namespace.  Passing
> +     * NETNSA_NSID == -1 lets the kernel allocate a free id. If the mapping
> +     * already exists the request fails with EEXIST, which is harmless. */
> +    ofpbuf_init(&request, 0);
> +    nl_msg_put_nlmsghdr(&request,
> +                        rta_offset + 2 * NL_ATTR_SIZE(sizeof(uint32_t)),
> +                        RTM_NEWNSID, NLM_F_REQUEST | NLM_F_ACK);
> +    ofpbuf_put_zeros(&request, rta_offset);
> +    nl_msg_put_u32(&request, NETNSA_PID, pid);
> +    nl_msg_put_u32(&request, NETNSA_NSID, NETNSID_LOCAL);
> +    nl_transact(NETLINK_ROUTE, &request, NULL);
> +    ofpbuf_uninit(&request);
> +
> +    /* Read back the nsid that the kernel assigned to our namespace. */
> +    ofpbuf_init(&request, 0);
> +    nl_msg_put_nlmsghdr(&request, rta_offset + 
> NL_ATTR_SIZE(sizeof(uint32_t)),
> +                        RTM_GETNSID, NLM_F_REQUEST);
> +    ofpbuf_put_zeros(&request, rta_offset);
> +    nl_msg_put_u32(&request, NETNSA_PID, pid);
> +
> +    error = nl_transact(NETLINK_ROUTE, &request, &reply);
> +    if (!error && reply) {
> +        const struct nlattr *a;
> +
> +        a = nl_attr_find(reply, NLMSG_HDRLEN + rta_offset, NETNSA_NSID);
> +        if (a) {
> +            int nsid = nl_attr_get_u32(a);
> +
> +            if (nsid >= 0) {
> +                netnsid_set_self(nsid);
> +                VLOG_DBG("local network namespace has nsid %d", nsid);
> +            }
> +        }
> +    }

Should we maybe log a warning here on the 'else' branch?

> +
> +    ofpbuf_uninit(&request);
> +    ofpbuf_delete(reply);
> +#endif
> +}
> +
>  static int
>  netdev_linux_netnsid_update__(struct netdev_linux *netdev)
>  {
> @@ -665,6 +743,11 @@ netdev_linux_notify_sock(void)
>      if (ovsthread_once_start(&once)) {
>          int error;
>  
> +        /* Discover (creating it if necessary) the nsid that the kernel uses
> +         * for our own namespace, before the notification socket starts
> +         * receiving namespace-tagged events. */
> +        netdev_linux_init_self_nsid();
> +
>          error = nl_sock_create(NETLINK_ROUTE, &sock);
>          if (!error) {
>              size_t i;
> @@ -936,6 +1019,15 @@ netdev_linux_update(struct netdev_linux *dev, int nsid,
>                      const struct rtnetlink_change *change)
>      OVS_REQUIRES(dev->mutex)
>  {
> +    /* NETLINK_LISTEN_ALL_NSID workaround: the kernel may tag locally
> +     * originated events with our own self-referential nsid instead of
> +     * omitting the cmsg.  netnsid_is_local() knows about that self-nsid (set
> +     * once at startup, see netdev_linux_init_self_nsid()), so normalize such
> +     * events back to the local namespace id before matching. */
> +    if (netnsid_is_local(nsid)) {
> +        netnsid_set_local(&nsid);
> +    }

I think, we should not do that.  The netnsid module is aware that local
is not necessarily -1, so the equality check function in that module
should take that into account, i.e.:

    if (netnsid_is_local(nsid1) && netnsid_is_local(nsid2)) {
        return true;
    }

> +
>      if (netdev_linux_netnsid_is_eq(dev, nsid)) {
>          netdev_linux_update__(dev, change);
>      }
> diff --git a/lib/netnsid.c b/lib/netnsid.c
> new file mode 100644
> index 000000000..043637e12
> --- /dev/null
> +++ b/lib/netnsid.c
> @@ -0,0 +1,17 @@

This file needs a copyright/license header.

> +#include <config.h>
> +#include "netnsid.h"
> +
> +/* The nsid that the local kernel assigns to our own network namespace when a
> + * self-referential mapping exists, or NETNSID_LOCAL when none does.
> + *
> + * It is set once at startup by the platform-specific netdev code (see
> + * netdev-linux.c) before any namespace-tagged netlink event can be received,
> + * and is read afterwards by netnsid_is_local().  A self-referential nsid
> + * mapping is permanent once created, so this value never changes again. */

This comment duplicates one in the header.  We may just drop it, I think.

> +int netnsid_self = NETNSID_LOCAL;
> +
> +void
> +netnsid_set_self(int nsid)
> +{
> +    netnsid_self = nsid;
> +}
> diff --git a/lib/netnsid.h b/lib/netnsid.h
> index 1d5ab83c5..0f558c79c 100644
> --- a/lib/netnsid.h
> +++ b/lib/netnsid.h
> @@ -19,6 +19,8 @@
>  
>  #include <stdbool.h>
>  
> +#include "util.h"
> +

Is this needed?

>  #ifdef HAVE_LINUX_NET_NAMESPACE_H
>  #include <linux/net_namespace.h>
>  #endif
> @@ -67,6 +69,14 @@
>  #endif
>  #define NETNSID_UNSET (NETNSID_LOCAL - 1)
>  
> +/* The nsid that the local kernel assigns to our own network namespace when a
> + * self-referential mapping exists, or NETNSID_LOCAL otherwise.  It is 
> defined
> + * in netnsid.c and set once at startup by the platform-specific netdev code
> + * (see netdev-linux.c).  netnsid_is_local() treats this value as equivalent
> + * to the local namespace. */
> +extern int netnsid_self;
> +void netnsid_set_self(int nsid);
> +
>  /* Prototypes */
>  static inline void netnsid_set_local(int *nsid);
>  static inline bool netnsid_is_local(int nsid);
> @@ -86,7 +96,7 @@ netnsid_set_local(int *nsid)
>  static inline bool
>  netnsid_is_local(int nsid)
>  {
> -    return nsid == NETNSID_LOCAL;
> +    return nsid == NETNSID_LOCAL || nsid == netnsid_self;
>  }
>  
>  static inline void

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to