On 11.01.2016 20:59, Serge Hallyn wrote: > Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com): >> The safe_mount primitive will mount the fs in the new container >> environment by using file descriptors referred in /proc/self/fd. >> However, when the mounted filesystem is proc itself, it will have >> been previously unmounted, therefore resulting in an error when >> searching for these file descriptors. This only happens when there's >> no container rootfs prefix (commonly with lxc-execute). >> >> Implement the support for this use case as well, by doing the mount >> based on the full path. >> >> Refactor the whole function in order to remove duplicated code checks >> and improve readability. >> >> Changes since v1: >> - In order to address CVE-2015-1335, still check if the destination is >> not a symlink. Do the mount only if the destination file descriptor >> exists. >> >> Signed-off-by: Bogdan Purcareata <bogdan.purcare...@nxp.com> >> --- >> src/lxc/utils.c | 49 ++++++++++++++++++++++++++++--------------------- >> 1 file changed, 28 insertions(+), 21 deletions(-) >> >> diff --git a/src/lxc/utils.c b/src/lxc/utils.c >> index d9e769d..c53711a 100644 >> --- a/src/lxc/utils.c >> +++ b/src/lxc/utils.c >> @@ -1644,9 +1644,9 @@ out: >> int safe_mount(const char *src, const char *dest, const char *fstype, >> unsigned long flags, const void *data, const char *rootfs) >> { >> - int srcfd = -1, destfd, ret, saved_errno; >> + int srcfd = -1, destfd = -1, ret = 0; >> char srcbuf[50], destbuf[50]; // only needs enough for >> /proc/self/fd/<fd> >> - const char *mntsrc = src; >> + const char *mntsrc = src, *mntdest = dest; >> >> if (!rootfs) >> rootfs = ""; >> @@ -1655,45 +1655,52 @@ int safe_mount(const char *src, const char *dest, >> const char *fstype, >> if (flags & MS_BIND && src && src[0] != '/') { >> INFO("this is a relative bind mount"); >> srcfd = open_without_symlink(src, NULL); >> - if (srcfd < 0) >> - return srcfd; >> + if (srcfd < 0) { >> + ret = srcfd; >> + goto out; >> + } >> ret = snprintf(srcbuf, 50, "/proc/self/fd/%d", srcfd); >> if (ret < 0 || ret > 50) { >> - close(srcfd); >> ERROR("Out of memory"); >> - return -EINVAL; >> + ret = -EINVAL; >> + goto out_src; >> } >> mntsrc = srcbuf; >> } >> >> destfd = open_without_symlink(dest, rootfs); >> if (destfd < 0) { >> - if (srcfd != -1) >> - close(srcfd); >> - return destfd; >> + ret = destfd; >> + goto out_src; >> } >> >> ret = snprintf(destbuf, 50, "/proc/self/fd/%d", destfd); >> if (ret < 0 || ret > 50) { >> - if (srcfd != -1) >> - close(srcfd); >> - close(destfd); >> ERROR("Out of memory"); >> - return -EINVAL; >> + ret = -EINVAL; >> + goto out_dest; >> } >> >> - ret = mount(mntsrc, destbuf, fstype, flags, data); >> - saved_errno = errno; >> - if (srcfd != -1) >> - close(srcfd); >> - close(destfd); >> + /* make sure the destination descriptor exists */ >> + if (access(destbuf, F_OK) == 0) >> + mntdest = destbuf; > First, if we're going to shortcut I'd prefer to say "if /proc/self > does not exist then skip this check" fo rnow. > But can we think of any way to still do this check? > > What exactly are the cases? > > 1. lxc-execute, lxc-init tries to mount /proc. We should be able > to simply have lxc always mount /proc before the pivot_root, so > we can properly do this check. > > what use-cases will break if we demand /proc to exist in the > container? (We can add an option to umount /proc in lxc-init, > but the directory would have to exist) That's my use case - the failing function is tmp_proc_mount, thus happening before pivot_root, and the scenario is running an application container with lxc-execute. This calls mount_proc_if_needed which is bound to fail, since it will first unmount /proc (because after cloning the process in the new namespaces, its PID will be different than /proc/self), and then try to safe_mount it, depending on /proc/self/fd. However, tmp_proc_mount will not fail, and afterwards lxc-init will mount /proc following the behavior you mentioned. So functionally all is ok, but tmp_proc_mount -> mount_proc_if_needed -> safe_mount will still spit a nasty warning.
BTW this whole thing is with LXC 1.1.5. > > 2. lxc.rootfs unset. In this case we're trusting the *host* admin > to not have messed with /proc to make it a symlink, if we can't > trust that we're out of luck. Other paths are not the same (since > parts of the rootfs could be bind-mounted from container-owned > dirs) so we should check those. But so for this check we should > simply explicitly check for "/proc". Doing a more roundabout > check may leave us open to subtle different attacks. In particular > I imagine there are other ways to make /proc/self/fd/N not > access-able, and you are, in that case, re-introducing the TOCTTOU - > the attacker could try to quickliy insert the symlink after our > checks and before the real mount(). Leaving lxc.rootfs unset results in lxc-start blocking right after lxc_mount_auto_mounts. However, setting it to "/" leads to a successful scenario, since lxc_mount_auto_mounts will mount a temporary /proc in /usr/var/lib/lxc/rootfs, so mount_proc_if_needed will no longer lead to mounting /proc again. So my thought right now is: - leave the open_without_symlink call in order to check that the destination is not a symlink - use the full destination path in mount syscall only if the fstype is proc and the destination path is "/proc" - otherwise use destfd Does this make sense? >> + ret = mount(mntsrc, mntdest, fstype, flags, data); >> if (ret < 0) { >> - errno = saved_errno; >> SYSERROR("Failed to mount %s onto %s", src, dest); >> - return ret; >> + goto out_dest; >> } >> >> - return 0; >> + ret = 0; >> + >> +out_dest: >> + if (destfd > 0) > These should be >= 0 (here and below) right? > True, 0 _is_ a valid file descriptor. Will fix in v3. >> + close(destfd); >> +out_src: >> + if (srcfd > 0) >> + close(srcfd); >> +out: >> + return ret; >> } Thanks for the feedback! Bogdan P. _______________________________________________ lxc-devel mailing list lxc-devel@lists.linuxcontainers.org http://lists.linuxcontainers.org/listinfo/lxc-devel