On Thu, Nov 06, 2025 at 06:03:10PM -0800, Bobby Eshleman wrote:
On Thu, Nov 06, 2025 at 05:18:00PM +0100, Stefano Garzarella wrote:
On Thu, Oct 23, 2025 at 11:27:43AM -0700, Bobby Eshleman wrote:
> From: Bobby Eshleman <[email protected]>
>
> Add netns logic to vsock core. Additionally, modify transport hook
> prototypes to be used by later transport-specific patches (e.g.,
> *_seqpacket_allow()).
>
> Namespaces are supported primarily by changing socket lookup functions
> (e.g., vsock_find_connected_socket()) to take into account the socket
> namespace and the namespace mode before considering a candidate socket a
> "match".
>
> Introduce a dummy namespace struct, __vsock_global_dummy_net, to be
> used by transports that do not support namespacing. This dummy always
> has mode "global" to preserve previous CID behavior.
>
> This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
> accepts the "global" or "local" mode strings.
>
> The transports (besides vhost) are modified to use the global dummy,
> which makes them behave as if always in the global namespace. Vhost is
> an exception because it inherits its namespace from the process that
> opens the vhost device.
>
> Add netns functionality (initialization, passing to transports, procfs,
> etc...) to the af_vsock socket layer. Later patches that add netns
> support to transports depend on this patch.
>
> seqpacket_allow() callbacks are modified to take a vsk so that transport
> implementations can inspect sock_net(sk) and vsk->net_mode when performing
> lookups (e.g., vhost does this in its future netns patch). Because the
> API change affects all transports, it seemed more appropriate to make
> this internal API change in the "vsock core" patch than in the "vhost"
> patch.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> Changes in v7:
> - hv_sock: fix hyperv build error
> - explain why vhost does not use the dummy
> - explain usage of __vsock_global_dummy_net
> - explain why VSOCK_NET_MODE_STR_MAX is 8 characters
> - use switch-case in vsock_net_mode_string()
> - avoid changing transports as much as possible
> - add vsock_find_{bound,connected}_socket_net()
> - rename `vsock_hdr` to `sysctl_hdr`
> - add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
>  global mode for virtio-vsock, move skb->cb zero-ing into wrapper
> - explain seqpacket_allow() change
> - move net setting to __vsock_create() instead of vsock_create() so
>  that child sockets also have their net assigned upon accept()
>
> Changes in v6:
> - unregister sysctl ops in vsock_exit()
> - af_vsock: clarify description of CID behavior
> - af_vsock: fix buf vs buffer naming, and length checking
> - af_vsock: fix length checking w/ correct ctl_table->maxlen
>
> Changes in v5:
> - vsock_global_net() -> vsock_global_dummy_net()
> - update comments for new uAPI
> - use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
> - add prototype changes so patch remains compilable
> ---
> drivers/vhost/vsock.c            |   4 +-
> include/linux/virtio_vsock.h     |  21 ++++
> include/net/af_vsock.h           |  14 ++-
> net/vmw_vsock/af_vsock.c         | 264 ++++++++++++++++++++++++++++++++++++---
> net/vmw_vsock/virtio_transport.c |   7 +-
> net/vmw_vsock/vsock_loopback.c   |   4 +-
> 6 files changed, 288 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index ae01457ea2cd..34adf0cf9124 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
>    return true;
> }
>
> -static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
>
> static struct virtio_transport vhost_transport = {
>    .transport = {
> @@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = {
>    .send_pkt = vhost_transport_send_pkt,
> };
>
> -static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> {
>    struct vhost_vsock *vsock;
>    bool seqpacket_allow = false;
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 7f334a32133c..29290395054c 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -153,6 +153,27 @@ static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
>    VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
> }
>
> +static inline struct sk_buff *
> +virtio_vsock_alloc_rx_skb(unsigned int size, gfp_t mask)
> +{
> +  struct sk_buff *skb;
> +
> +  skb = virtio_vsock_alloc_linear_skb(size, mask);
> +  if (!skb)
> +          return NULL;
> +
> +  memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM);
> +
> +  /* virtio-vsock does not yet support namespaces, so on receive
> +   * we force legacy namespace behavior using the global dummy net
> +   * and global net mode.
> +   */
> +  virtio_vsock_skb_set_net(skb, vsock_global_dummy_net());
> +  virtio_vsock_skb_set_net_mode(skb, VSOCK_NET_MODE_GLOBAL);
> +
> +  return skb;
> +}

Why are we introducing this change in this patch?

Where is the net of the virtio skb read?


Oh good point, this is a weird place for this. I'll move this to where
it is actually used.

[...]

>
> +static int vsock_net_mode_string(const struct ctl_table *table, int write,
> +                           void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +  char data[VSOCK_NET_MODE_STR_MAX] = {0};
> +  enum vsock_net_mode mode;
> +  struct ctl_table tmp;
> +  struct net *net;
> +  int ret;
> +
> +  if (!table->data || !table->maxlen || !*lenp) {
> +          *lenp = 0;
> +          return 0;
> +  }
> +
> +  net = current->nsproxy->net_ns;
> +  tmp = *table;
> +  tmp.data = data;
> +
> +  if (!write) {
> +          const char *p;
> +
> +          mode = vsock_net_mode(net);
> +
> +          switch (mode) {
> +          case VSOCK_NET_MODE_GLOBAL:
> +                  p = VSOCK_NET_MODE_STR_GLOBAL;
> +                  break;
> +          case VSOCK_NET_MODE_LOCAL:
> +                  p = VSOCK_NET_MODE_STR_LOCAL;
> +                  break;
> +          default:
> +                  WARN_ONCE(true, "netns has invalid vsock mode");
> +                  *lenp = 0;
> +                  return 0;
> +          }
> +
> +          strscpy(data, p, sizeof(data));
> +          tmp.maxlen = strlen(p);
> +  }
> +
> +  ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
> +  if (ret)
> +          return ret;
> +
> +  if (write) {

Do we need to check some capability, e.g. CAP_NET_ADMIN ?


We get that for free via the sysctl_net registration, through this path
on open (CAP_NET_ADMIN is checked in net_ctl_permissions):

        net_ctl_permissions+1
        sysctl_perm+24
        proc_sys_permission+117
        inode_permission+217
        link_path_walk+162
        path_openat+152
        do_filp_open+171
        do_sys_openat2+98
        __x64_sys_openat+69
        do_syscall_64+93

Verified with:

cp /bin/echo /tmp/echo_netadmin
setcap cap_net_admin+ep /tmp/echo_netadmin

(non-root user fails with regular echo, succeeds with
/tmp/echo_netadmin)
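
For anyone who wants to repeat the check without setcap'ing a copy of
echo, here is a minimal userspace sketch of the same test. The helper
name try_write_sysctl() is hypothetical and not part of the patch. It
opens a sysctl file for writing and reports the errno, so the failure
point is visible. Given the trace above, an unprivileged caller would be
expected to fail at open() (most likely with -EACCES), before any write
is attempted.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to write `value` to the file at `path`.
 * Returns 0 on success or a negative errno from open()/write().
 */
static int try_write_sysctl(const char *path, const char *value)
{
	size_t len;
	ssize_t n;
	int fd, err = 0;

	assert(path && value);
	len = strlen(value);

	/* The sysctl permission check runs on open, so a caller
	 * lacking the needed capability should fail here.
	 */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, value, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;	/* short write, treated as an error here */

	close(fd);
	return err;
}
```

Calling try_write_sysctl("/proc/sys/net/vsock/ns_mode", "local") once as
a plain user and once with CAP_NET_ADMIN should show the difference,
assuming a kernel with this series applied.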

Thanks for checking!

Stefano

