The primary motivation for this flag is container runtimes, which have
to interact with malicious root filesystems in the host namespaces. One
of the first requirements for a container runtime to be secure against a
malicious rootfs is that it correctly scopes symlinks (that is, they
should be resolved as though the runtime had chroot(2)ed into the
container's rootfs) and ".."-style paths[*]. The already-existing
LOOKUP_XDEV and LOOKUP_NO_MAGICLINKS help defend against other potential
attacks in a malicious rootfs scenario.

Currently most container runtimes try to do this resolution in
userspace[1], which opens up many potential race conditions. In
addition, the "obvious" alternative (actually performing a
{ch,pivot_}root(2)) requires a fork+exec (for some runtimes), which is
*very* costly if it is needed for every filesystem operation involving
a container.

[*] At the moment, ".." and magic-link jumping are disallowed for the
    same reason they are disabled for LOOKUP_BENEATH -- currently it is
    not safe to allow them. Future patches may enable them
    unconditionally once we have resolved the possible races (for "..")
    and semantics (for magic-link jumping).

The most significant *at(2) semantic change with LOOKUP_IN_ROOT is that
absolute pathnames no longer cause dirfd to be ignored completely. The
rationale is that LOOKUP_IN_ROOT must necessarily chroot-scope symlinks
with absolute paths to dirfd, and so doing it for the base path seems to
be the most consistent behaviour (and also avoids foot-gunning users who
want to scope paths that are absolute).
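
As a rough illustration of the base-path rule described above (not code
from this patch), the following userspace sketch shows how an absolute
pathname would map onto a dirfd-relative one. The helper name
in_root_relative() is hypothetical, and it models only the handling of
the leading "/"; symlink scoping still has to happen in the kernel.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: return the part of 'path'
 * that would be walked relative to dirfd when absolute pathnames are
 * scoped to dirfd instead of the process root.
 */
static const char *in_root_relative(const char *path)
{
	assert(path != NULL);
	path += strspn(path, "/");	/* drop every leading '/' */
	return *path ? path : ".";	/* "/" alone names dirfd itself */
}
```

Under this model, "/etc/passwd" and "etc/passwd" name the same file
below dirfd, which is the consistency argument made above.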

[1]: https://github.com/cyphar/filepath-securejoin

Co-developed-by: Christian Brauner <christ...@brauner.io>
Signed-off-by: Aleksa Sarai <cyp...@cyphar.com>
---
 fs/namei.c            | 6 +++---
 include/linux/namei.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9c3ed597466b..ff016b9e9082 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1149,7 +1149,7 @@ const char *get_link(struct nameidata *nd, bool trailing)
                        if (unlikely(nd->flags & LOOKUP_NO_MAGICLINKS))
                                return ERR_PTR(-ELOOP);
                        /* Not currently safe. */
-                       if (unlikely(nd->flags & LOOKUP_BENEATH))
+                       if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT)))
                                return ERR_PTR(-EXDEV);
                        /*
                         * For trailing_symlink we check whether the symlink's
@@ -1833,7 +1833,7 @@ static inline int handle_dots(struct nameidata *nd, int type)
                 * cause our parent to have moved outside of the root and us to skip
                 * over it.
                 */
-               if (unlikely(nd->flags & LOOKUP_BENEATH))
+               if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT)))
                        return -EXDEV;
                if (!nd->root.mnt)
                        set_root(nd);
@@ -2384,7 +2384,7 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 
        nd->m_seq = read_seqbegin(&mount_lock);
 
-       if (unlikely(nd->flags & LOOKUP_BENEATH)) {
+       if (unlikely(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) {
                error = dirfd_path_init(nd);
                if (unlikely(error))
                        return ERR_PTR(error);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 7bc819ad0cd3..4b1ee717cb14 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -56,6 +56,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_NO_MAGICLINKS   0x040000 /* No /proc/$pid/fd/ "symlink" crossing. */
 #define LOOKUP_NO_SYMLINKS     0x080000 /* No symlink crossing *at all*.
                                            Implies LOOKUP_NO_MAGICLINKS. */
+#define LOOKUP_IN_ROOT         0x100000 /* Treat dirfd as %current->fs->root. */
 
 extern int path_pts(struct path *path);
 
-- 
2.22.0
