On 2025-09-12, Christian Brauner <[email protected]> wrote:
> On Thu, Sep 11, 2025 at 01:36:28PM +0200, Amir Goldstein wrote:
> > On Thu, Sep 11, 2025 at 11:31 AM Christian Brauner <[email protected]> wrote:
> > >
> > > On Wed, Sep 10, 2025 at 07:21:22PM +0200, Amir Goldstein wrote:
> > > > On Wed, Sep 10, 2025 at 4:39 PM Christian Brauner <[email protected]> wrote:
> > > > >
> > > > > A while ago we added support for file handles to pidfs so pidfds can be
> > > > > encoded and decoded as file handles. Userspace has adopted this quickly
> > > > > and it's proven very useful.
> > > >
> > > > > Pidfd file handles are exhaustive meaning
> > > > > they don't require a handle on another pidfd to pass to
> > > > > open_by_handle_at() so it can derive the filesystem to decode in.
> > > > >
> > > > > Implement the exhaustive file handles for namespaces as well.
> > > >
> > > > I think you decide to split the "exhaustive" part to another patch,
> > > > so better drop this paragraph?
> > >
> > > Yes, good point. I've dont that.
> > >
> > > > I am missing an explanation about the permissions for
> > > > opening these file handles.
> > > >
> > > > My understanding of the code is that the opener needs to meet one of
> > > > the conditions:
> > > > 1. user has CAP_SYS_ADMIN in the userns owning the opened namespace
> > > > 2. current task is in the opened namespace
> > >
> > > Yes.
> > >
> > > >
> > > > But I do not fully understand the rationale behind the 2nd condition,
> > > > that is, when is it useful?
> > >
> > > A caller is always able to open a file descriptor to it's own set of
> > > namespaces. File handles will behave the same way.
> > >
> >
> > I understand why it's safe, and I do not object to it at all,
> > I just feel that I do not fully understand the use case of how ns file handles
> > are expected to be used.
> > A process can always open /proc/self/ns/mnt
> > What's the use case where a process may need to open its own ns by handle?
> >
> > I will explain. For CAP_SYS_ADMIN I can see why keeping handles that
> > do not keep an elevated refcount of ns object could be useful in the same
> > way that an NFS client keeps file handles without keeping the file object
> > alive.
> >
> > But if you do not have CAP_SYS_ADMIN and can only open your own ns
> > by handle, what is the application that could make use of this?
> > and what's the benefit of such application keeping a file handle instead of
> > ns fd?
>
> A process is not always able to open /proc/self/ns/. That requires
> procfs to be mounted and for /proc/self/ or /proc/self/ns/ to not be
> overmounted. However, they can derive a namespace fd from their own
> pidfd. And that also always works if it's their own namespace.
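
For illustration only, the pidfd route could look roughly like the sketch
below. It is not taken from the patch series. It assumes Python 3.9+ (for
os.pidfd_open()) and a kernel that provides the PIDFD_GET_MNT_NAMESPACE
ioctl; the ioctl number is hard-coded on the assumption that it is
_IO(0xFF, 3) as in include/uapi/linux/pidfd.h, and own_mnt_ns_fd() and
same_namespace() are hypothetical helper names. No procfs access is needed.

```python
import fcntl
import os

# Assumption: PIDFD_GET_MNT_NAMESPACE is _IO(0xFF, 3) in the kernel's
# include/uapi/linux/pidfd.h; older kernels reject it.
PIDFD_GET_MNT_NAMESPACE = 0xFF03


def own_mnt_ns_fd():
    """Return an fd for the caller's mount namespace, or None if unsupported."""
    if not hasattr(os, "pidfd_open"):
        return None  # Python older than 3.9 or a non-Linux platform
    try:
        pidfd = os.pidfd_open(os.getpid())
    except OSError:
        return None  # pidfd_open(2) unavailable or blocked
    try:
        # On success the ioctl's return value is the new namespace fd.
        return fcntl.ioctl(pidfd, PIDFD_GET_MNT_NAMESPACE)
    except OSError:
        return None  # ioctl not supported by this kernel, or denied
    finally:
        os.close(pidfd)


def same_namespace(fd_a, fd_b):
    """Namespace fds name the same namespace if device and inode match."""
    a, b = os.fstat(fd_a), os.fstat(fd_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

A caller would check for None before using the result and close the
returned fd when done. same_namespace() uses the usual st_dev/st_ino
comparison, so it works for namespace fds however they were obtained.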

It's also important to note that if /proc/self and /proc/thread-self are
overmounted, you can get into scenarios where /proc/$pid will refer to
the wrong process (container runtimes run into this scenario a lot --
when configuring a container there is a point where we are in a new
pidns but still see the host /proc, which leads to lots of fun bugs).

> There's no need to introduce unnecessary behavioral differences between
> /proc/self/ns/, pidfd-derived namespace fs, and file-handle-derived
> namespace fds. That's just going to be confusing.
>
> The other thing is that there are legitimate use-case for encoding your
> own namespace. For example, you might store file handles to your set of
> namespaces in a file on-disk so you can verify when you get rexeced that
> they're still valid and so on. This is akin to the pidfd use-case.
>
> Or just plainly for namespace comparison reasons where you keep a file
> handle to your own namespaces and can then easily check against others.

I agree wholeheartedly.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
