Re: [PATCH v2 28/33] nsfs: support file handles

Jan Kara Wed, 17 Sep 2025 22:23:26 -0700

On Fri 12-09-25 13:52:51, Christian Brauner wrote:
> A while ago we added support for file handles to pidfs so pidfds can be
> encoded and decoded as file handles. Userspace has adopted this quickly
> and it's proven very useful. Implement file handles for namespaces as
> well.
> 
> A process is not always able to open /proc/self/ns/. That requires
> procfs to be mounted and for /proc/self/ or /proc/self/ns/ to not be
> overmounted. However, userspace can always derive a namespace fd from
> a pidfd. And that always works for a task's own namespace.
> 
> There's no need to introduce unnecessary behavioral differences between
> /proc/self/ns/ fds, pidfd-derived namespace fds, and file-handle-derived
> namespace fds. So namespace file handles are always decodable if the
> caller is located in the namespace the file handle refers to.
> 
> This also allows a task to e.g., store a set of file handles to its
> namespaces in a file on-disk so it can verify when it gets re-exec'd that
> they're still valid and so on. This is akin to the pidfd use-case.
> 
> Or just plainly for namespace comparison reasons where a file handle to
> the task's own namespace can be easily compared against others.
> 
> Reviewed-by: Amir Goldstein <[email protected]>
> Signed-off-by: Christian Brauner <[email protected]>
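From userspace, the "store handles on disk and compare them later" use case described above could look roughly like the sketch below. This is not code from the series. `ns_fd_to_handle()` and `ns_handles_equal()` are hypothetical helper names. The sketch assumes a kernel with this patch applied; on older kernels name_to_handle_at() on an nsfs fd fails, typically with EOPNOTSUPP. It also assumes, as the comparison use case does, that encoding the same namespace twice yields identical handle bytes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: encode a file handle for an already-open namespace
 * fd (e.g. one opened from /proc/self/ns/net). Returns a malloc()ed handle,
 * or NULL with errno set by name_to_handle_at() (EOPNOTSUPP on kernels
 * without nsfs file handle support).
 */
static struct file_handle *ns_fd_to_handle(int ns_fd)
{
	struct file_handle *fh;
	int mnt_id, saved_errno;

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh)
		return NULL;
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* AT_EMPTY_PATH with "": encode the object ns_fd itself refers to. */
	if (name_to_handle_at(ns_fd, "", fh, &mnt_id, AT_EMPTY_PATH) < 0) {
		saved_errno = errno;
		free(fh);
		errno = saved_errno;
		return NULL;
	}
	return fh;
}

/*
 * Hypothetical helper: treat two handles as naming the same namespace if
 * the handle type, the length and the opaque handle bytes all match.
 */
static bool ns_handles_equal(const struct file_handle *a,
			     const struct file_handle *b)
{
	assert(a && b);
	return a->handle_type == b->handle_type &&
	       a->handle_bytes == b->handle_bytes &&
	       memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}
```

A task would encode its namespace fds once, write handle_type, handle_bytes and f_handle to a file, and after re-exec re-encode the same namespaces and check them with ns_handles_equal().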


...

> +     switch (ns->ops->type) {
> +#ifdef CONFIG_CGROUPS
> +     case CLONE_NEWCGROUP:
> +             if (!current_in_namespace(to_cg_ns(ns)))
> +                     owning_ns = to_cg_ns(ns)->user_ns;
> +             break;
> +#endif
> +#ifdef CONFIG_IPC_NS
> +     case CLONE_NEWIPC:
> +             if (!current_in_namespace(to_ipc_ns(ns)))
> +                     owning_ns = to_ipc_ns(ns)->user_ns;
> +             break;
> +#endif
> +     case CLONE_NEWNS:
> +             if (!current_in_namespace(to_mnt_ns(ns)))
> +                     owning_ns = to_mnt_ns(ns)->user_ns;
> +             break;
> +#ifdef CONFIG_NET_NS
> +     case CLONE_NEWNET:
> +             if (!current_in_namespace(to_net_ns(ns)))
> +                     owning_ns = to_net_ns(ns)->user_ns;
> +             break;
> +#endif
> +#ifdef CONFIG_PID_NS
> +     case CLONE_NEWPID:
> +             if (!current_in_namespace(to_pid_ns(ns))) {
> +                     owning_ns = to_pid_ns(ns)->user_ns;
> +             } else if (!READ_ONCE(to_pid_ns(ns)->child_reaper)) {
> +                     ns->ops->put(ns);
> +                     return ERR_PTR(-EPERM);
> +             }
> +             break;
> +#endif
> +#ifdef CONFIG_TIME_NS
> +     case CLONE_NEWTIME:
> +             if (!current_in_namespace(to_time_ns(ns)))
> +                     owning_ns = to_time_ns(ns)->user_ns;
> +             break;
> +#endif
> +#ifdef CONFIG_USER_NS
> +     case CLONE_NEWUSER:
> +             if (!current_in_namespace(to_user_ns(ns)))
> +                     owning_ns = to_user_ns(ns);
> +             break;
> +#endif
> +#ifdef CONFIG_UTS_NS
> +     case CLONE_NEWUTS:
> +             if (!current_in_namespace(to_uts_ns(ns)))
> +                     owning_ns = to_uts_ns(ns)->user_ns;
> +             break;
> +#endif

Frankly, switches like these are asking for more _Generic() usage ;) But OK
for now.
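One possible reading of the _Generic() remark is sketched below as a userspace mock, not kernel code. The three struct definitions are hypothetical stand-ins carrying only the field the sketch needs, and `ns_owning_user_ns()` plus the `owner_of_*()` helpers are made-up names. Note that _Generic() dispatches on the static type of its argument, so on its own it cannot replace a switch over ns->ops->type on a plain struct ns_common pointer; it only helps once the typed namespace pointer is available.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the field this sketch needs. */
struct user_namespace { struct user_namespace *parent; };
struct ipc_namespace  { struct user_namespace *user_ns; };
struct uts_namespace  { struct user_namespace *user_ns; };

/* A user namespace is its own owner, as in the CLONE_NEWUSER case. */
static struct user_namespace *owner_of_user(struct user_namespace *ns)
{
	return ns;
}

/* Every other namespace type records the user namespace that owns it. */
static struct user_namespace *owner_of_ipc(struct ipc_namespace *ns)
{
	assert(ns->user_ns);
	return ns->user_ns;
}

static struct user_namespace *owner_of_uts(struct uts_namespace *ns)
{
	assert(ns->user_ns);
	return ns->user_ns;
}

/* Pick the helper from the static pointer type, then call it. */
#define ns_owning_user_ns(ns)					\
	_Generic((ns),						\
		 struct user_namespace *: owner_of_user,	\
		 struct ipc_namespace *: owner_of_ipc,		\
		 struct uts_namespace *: owner_of_uts)(ns)
```

Adding a namespace type then means adding one association line rather than another case block with its own #ifdef pair.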

> +     default:
> +             return ERR_PTR(-EOPNOTSUPP);
> +     }
> +
> +     if (owning_ns && !ns_capable(owning_ns, CAP_SYS_ADMIN)) {
> +             ns->ops->put(ns);
> +             return ERR_PTR(-EPERM);
> +     }
> +
> +     /* path_from_stashed() unconditionally consumes the reference. */
> +     ret = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path);
> +     if (ret)
> +             return ERR_PTR(ret);
> +
> +     return no_free_ptr(path.dentry);

Ugh, IMO this is very subtle because we declare

        struct path path __free(path_put)

but then do no_free_ptr(path.dentry). I really had to look up the implementation
of no_free_ptr() to check whether we are leaking the mnt reference here or not
(we are not: no_free_ptr() NULLs out path.dentry, so the path_put() cleanup
then drops only the mnt reference). But that seems like an implementation
detail we'd better not rely on? Wouldn't

        return dget(path.dentry);

be much clearer (and slightly less efficient, I know, but who cares)?
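To make the reference accounting of the two variants concrete, here is a userspace model. It is not kernel code. `struct obj`, `get()`, `put()` and this `struct path` are hypothetical refcounted stand-ins, `path_put()` only mimics the kernel behaviour of dropping both references while tolerating a NULL dentry, and the `no_free_ptr()` macro is a simplified take on the read-then-NULL pattern used by the kernel's no_free_ptr() (without its __must_check wrapping). It relies on the GCC/Clang cleanup attribute, statement expressions and __auto_type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted stand-in for a dentry or a vfsmount. */
struct obj {
	int refs;
};

/* Hypothetical model of struct path: one mnt and one dentry reference. */
struct path {
	struct obj *mnt;
	struct obj *dentry;
};

static struct obj *get(struct obj *o)
{
	if (o)
		o->refs++;
	return o;
}

/* NULL-tolerant, like dput() and mntput(). */
static void put(struct obj *o)
{
	if (o) {
		assert(o->refs > 0);
		o->refs--;
	}
}

/* Model of path_put(): drop both references held by the path. */
static void path_put(struct path *p)
{
	put(p->dentry);
	put(p->mnt);
}

/* Simplified no_free_ptr(): return the value and NULL the slot it came from. */
#define no_free_ptr(p)				\
	({					\
		__auto_type __ptr = &(p);	\
		__auto_type __val = *__ptr;	\
		*__ptr = NULL;			\
		__val;				\
	})

/* Variant in the patch: steal the dentry ref; cleanup then drops only mnt. */
static struct obj *return_stolen(struct obj *mnt, struct obj *dentry)
{
	struct path path __attribute__((cleanup(path_put))) = {
		.mnt = get(mnt), .dentry = get(dentry)
	};

	return no_free_ptr(path.dentry);
}

/* Variant Jan suggests: take a fresh dentry ref; cleanup drops both. */
static struct obj *return_dget(struct obj *mnt, struct obj *dentry)
{
	struct path path __attribute__((cleanup(path_put))) = {
		.mnt = get(mnt), .dentry = get(dentry)
	};

	return get(path.dentry);
}
```

In this model both variants hand the caller exactly one dentry reference and leave the mnt count where it started; the dget() form just makes that visible at the return site, at the cost of one extra get/put pair.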

Otherwise looks good to me so feel free to add:

Reviewed-by: Jan Kara <[email protected]>

                                                                Honza
-- 
Jan Kara <[email protected]>
SUSE Labs, CR
