On Tue, Feb 17, 2026 at 05:45:10PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman <[email protected]>

To improve the security posture of vsock namespacing, this patch locks
down the vsock child_ns_mode sysctl setting with a write-once policy.
The user may write to child_ns_mode only once in each namespace, making
changes to either local or global mode be irreversible.

This avoids security breaches where a process in a local namespace may
attempt to jailbreak into the global vsock ns space by setting
child_ns_mode to "global", creating a new namespace, and accessing the
global space through the new namespace.

Commit 6a997f38bdf8 ("vsock: prevent child netns mode switch from local to global") should avoid exactly that, so I don't get this. Can you elaborate more how this can happen without this patch?

I think here we should talk more about what we described in https://lore.kernel.org/netdev/aZNNBc390y6V09qO@sgarzare-redhat/ which is that two administrator processes could compete in setting `child_ns_mode` and end up creating a namespace with an `ns_mode` different from the one set in `child_ns_mode`. But I would also explain that this can still be detected by the process by looking at `ns_mode` and trying again. With this patch, we avoid this and allow the namespace manager to set it once and be sure that it cannot be changed again.


Additionally, fix the test functions that this change would otherwise
break by adding "global-parent" and "local-parent" namespaces and using
them as intermediaries to spawn namespaces in the given modes. This
avoids the need to change "child_ns_mode" in the init_ns. nsenter must
be used because ip netns unshares the mount namespace so nested "ip
netns add" breaks exec calls from the init ns.

I'm not sure what the policy is in netdev, but I would prefer to have selftest changes in another patch (I think earlier in the series so as not to break the bisection), in order to simplify backporting (e.g. in CentOS Stream, to keep the backport small, I didn't backport the dozens of patches for selftest that we did previously).

Obviously, if it's not possible and breaks the bisection, I can safely skip these changes during the backport.


Test run:

1..25
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 ns_host_vsock_ns_mode_ok
ok 5 ns_host_vsock_child_ns_mode_ok
ok 6 ns_global_same_cid_fails
ok 7 ns_local_same_cid_ok
ok 8 ns_global_local_same_cid_ok
ok 9 ns_local_global_same_cid_ok
ok 10 ns_diff_global_host_connect_to_global_vm_ok
ok 11 ns_diff_global_host_connect_to_local_vm_fails
ok 12 ns_diff_global_vm_connect_to_global_host_ok
ok 13 ns_diff_global_vm_connect_to_local_host_fails
ok 14 ns_diff_local_host_connect_to_local_vm_fails
ok 15 ns_diff_local_vm_connect_to_local_host_fails
ok 16 ns_diff_global_to_local_loopback_local_fails
ok 17 ns_diff_local_to_global_loopback_fails
ok 18 ns_diff_local_to_local_loopback_fails
ok 19 ns_diff_global_to_global_loopback_ok
ok 20 ns_same_local_loopback_ok
ok 21 ns_same_local_host_connect_to_local_vm_ok
ok 22 ns_same_local_vm_connect_to_local_host_ok
ok 23 ns_delete_vm_ok
ok 24 ns_delete_host_ok
ok 25 ns_delete_both_ok
SUMMARY: PASS=25 SKIP=0 FAIL=0

IMO this can be removed from the commit message, doesn't add much value other than say that all test passes.


Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
Signed-off-by: Bobby Eshleman <[email protected]>
Suggested-by: Daan De Meyer <[email protected]>
Suggested-by: Stefano Garzarella <[email protected]>
---
include/net/af_vsock.h                  |  6 +++++-
include/net/netns/vsock.h               |  1 +
net/vmw_vsock/af_vsock.c                | 10 ++++++----
tools/testing/selftests/vsock/vmtest.sh | 35 +++++++++++++++------------------
4 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d3ff48a2fbe0..c7de33039907 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -276,10 +276,14 @@ static inline bool vsock_net_mode_global(struct 
vsock_sock *vsk)
        return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
}

-static inline void vsock_net_set_child_mode(struct net *net,
+static inline bool vsock_net_set_child_mode(struct net *net,
                                            enum vsock_net_mode mode)
{
+       if (xchg(&net->vsock.child_ns_mode_locked, 1))
+               return false;
+
        WRITE_ONCE(net->vsock.child_ns_mode, mode);
+       return true;
}

static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
index b34d69a22fa8..8c855fff8039 100644
--- a/include/net/netns/vsock.h
+++ b/include/net/netns/vsock.h
@@ -17,5 +17,6 @@ struct netns_vsock {

        enum vsock_net_mode mode;
        enum vsock_net_mode child_ns_mode;
+       int child_ns_mode_locked;
};
#endif /* __NET_NET_NAMESPACE_VSOCK_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 9880756d9eff..35e097f4fde8 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -90,14 +90,15 @@
 *
 *   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
 *     mode, which is set at namespace creation and immutable thereafter.
- *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
+ *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
 *     child namespaces will inherit when created. The initial value matches
 *     the namespace's own ns_mode.
 *
 *   Changing child_ns_mode only affects newly created namespaces, not the
 *   current namespace or existing children. A "local" namespace cannot set
- *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
- *   from the parent's child_ns_mode.
+ *   child_ns_mode to "global". child_ns_mode is write-once, so that it may
+ *   be configured and locked down by a namespace manager. At namespace
+ *   creation, ns_mode is inherited from the parent's child_ns_mode.

We just merged commit a07c33c6f2fc ("vsock: document namespace mode sysctls") in the net tree, so we should update also Documentation/admin-guide/sysctl/net.rst

 *
 *   The init_net mode is "global" and cannot be modified.

Maybe we should also emphasise in the documentation and in the commit description that `child_ns_mode` in `init_net` also is write-once, so writing `local` to that one by the init process (e.g. systemd), essentially will make all the new namespaces in `local` mode. This could be useful for container/namespace managers.

 *
@@ -2853,7 +2854,8 @@ static int vsock_net_child_mode_string(const struct 
ctl_table *table, int write,
                    new_mode == VSOCK_NET_MODE_GLOBAL)
                        return -EPERM;

-               vsock_net_set_child_mode(net, new_mode);
+               if (!vsock_net_set_child_mode(net, new_mode))
+                       return -EPERM;

So, if `child_ns_mode` is set to `local` but locked, writing `local` again will return -EPERM, is this really what we want?

I'm not sure if we can relax it a bit, but then we may race between reader and writer, so maybe it's fine like it is in this patch, but we should document better that any writes (even same value) after the first one will return -EPERM.

About that, should we return something different, like -EBUSY ?
Not a strong opinion, just to differentiate with the other check before.

Thanks,
Stefano


Reply via email to