On 11.01.2016 20:59, Serge Hallyn wrote:
> Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com):
>> The safe_mount primitive will mount the fs in the new container
>> environment by using file descriptors referred in /proc/self/fd.
>> However, when the mounted filesystem is proc itself, it will have
>> been previously unmounted, therefore resulting in an error when
>> searching for these file descriptors. This only happens when there's
>> no container rootfs prefix (commonly with lxc-execute).
>>
>> Implement the support for this use case as well, by doing the mount
>> based on the full path.
>>
>> Refactor the whole function in order to remove duplicated code checks
>> and improve readability.
>>
>> Changes since v1:
>> - In order to address CVE-2015-1335, still check if the destination is
>> not a symlink. Do the mount only if the destination file descriptor
>> exists.
>>
>> Signed-off-by: Bogdan Purcareata <bogdan.purcare...@nxp.com>
>> ---
>>   src/lxc/utils.c | 49 ++++++++++++++++++++++++++++---------------------
>>   1 file changed, 28 insertions(+), 21 deletions(-)
>>
>> diff --git a/src/lxc/utils.c b/src/lxc/utils.c
>> index d9e769d..c53711a 100644
>> --- a/src/lxc/utils.c
>> +++ b/src/lxc/utils.c
>> @@ -1644,9 +1644,9 @@ out:
>>   int safe_mount(const char *src, const char *dest, const char *fstype,
>>              unsigned long flags, const void *data, const char *rootfs)
>>   {
>> -    int srcfd = -1, destfd, ret, saved_errno;
>> +    int srcfd = -1, destfd = -1, ret = 0;
>>      char srcbuf[50], destbuf[50]; // only needs enough for 
>> /proc/self/fd/<fd>
>> -    const char *mntsrc = src;
>> +    const char *mntsrc = src, *mntdest = dest;
>>   
>>      if (!rootfs)
>>              rootfs = "";
>> @@ -1655,45 +1655,52 @@ int safe_mount(const char *src, const char *dest, 
>> const char *fstype,
>>      if (flags & MS_BIND && src && src[0] != '/') {
>>              INFO("this is a relative bind mount");
>>              srcfd = open_without_symlink(src, NULL);
>> -            if (srcfd < 0)
>> -                    return srcfd;
>> +            if (srcfd < 0) {
>> +                    ret = srcfd;
>> +                    goto out;
>> +            }
>>              ret = snprintf(srcbuf, 50, "/proc/self/fd/%d", srcfd);
>>              if (ret < 0 || ret > 50) {
>> -                    close(srcfd);
>>                      ERROR("Out of memory");
>> -                    return -EINVAL;
>> +                    ret = -EINVAL;
>> +                    goto out_src;
>>              }
>>              mntsrc = srcbuf;
>>      }
>>   
>>      destfd = open_without_symlink(dest, rootfs);
>>      if (destfd < 0) {
>> -            if (srcfd != -1)
>> -                    close(srcfd);
>> -            return destfd;
>> +            ret = destfd;
>> +            goto out_src;
>>      }
>>   
>>      ret = snprintf(destbuf, 50, "/proc/self/fd/%d", destfd);
>>      if (ret < 0 || ret > 50) {
>> -            if (srcfd != -1)
>> -                    close(srcfd);
>> -            close(destfd);
>>              ERROR("Out of memory");
>> -            return -EINVAL;
>> +            ret = -EINVAL;
>> +            goto out_dest;
>>      }
>>   
>> -    ret = mount(mntsrc, destbuf, fstype, flags, data);
>> -    saved_errno = errno;
>> -    if (srcfd != -1)
>> -            close(srcfd);
>> -    close(destfd);
>> +    /* make sure the destination descriptor exists */
>> +    if (access(destbuf, F_OK) == 0)
>> +            mntdest = destbuf;
> First, if we're going to shortcut I'd prefer to say "if /proc/self
> does not exist then skip this check" fo rnow.
> But can we think of any way to still do this check?
>
> What exactly are the cases?
>
> 1. lxc-execute, lxc-init tries to mount /proc.  We should be able
> to simply have lxc always mount /proc before the pivot_root, so
> we can properly do this check.
>
> what use-cases will break if we demand /proc to exist in the
> container?  (We can add an option to umount /proc in lxc-init,
> but the directory would have to exist)
That's my use case - the failing function is tmp_proc_mount, thus happening 
before pivot_root, and the scenario is running an application container with 
lxc-execute. This calls mount_proc_if_needed which is bound to fail, since it 
will first unmount /proc (because after cloning the process in the new 
namespaces, its PID will be different than /proc/self), and then try to 
safe_mount it, depending on /proc/self/fd. However, tmp_proc_mount will not 
fail, and afterwards lxc-init will mount /proc following the behavior you 
mentioned. So functionally all is ok, but tmp_proc_mount -> 
mount_proc_if_needed 
-> safe_mount will still spit a nasty warning.

BTW this whole thing is with LXC 1.1.5.
>
> 2. lxc.rootfs unset.  In this case we're trusting the *host* admin
> to not have messed with /proc to make it a symlink, if we can't
> trust that we're out of luck.  Other paths are not the same (since
> parts of the rootfs could be bind-mounted from container-owned
> dirs) so we should check those.  But so for this check we should
> simply explicitly check for "/proc".  Doing a more roundabout
> check may leave us open to subtle different attacks.  In particular
> I imagine there are other ways to make /proc/self/fd/N not
> access-able, and you are, in that case, re-introducing the TOCTTOU -
> the attacker could try to quickliy insert the symlink after our
> checks and before the real mount().
Leaving lxc.rootfs unset results in lxc-start blocking right after 
lxc_mount_auto_mounts. However, setting it to "/" leads to a successful 
scenario, since lxc_mount_auto_mounts will mount a temporary /proc in 
/usr/var/lib/lxc/rootfs, so mount_proc_if_needed will no longer lead to 
mounting 
/proc again.

So my thought right now is:
- leave the open_without_symlink call in order to check that the destination is 
not a symlink
- use the full destination path in mount syscall only if the fstype is proc and 
the destination path is "/proc" - otherwise use destfd

Does this make sense?
>> +    ret = mount(mntsrc, mntdest, fstype, flags, data);
>>      if (ret < 0) {
>> -            errno = saved_errno;
>>              SYSERROR("Failed to mount %s onto %s", src, dest);
>> -            return ret;
>> +            goto out_dest;
>>      }
>>   
>> -    return 0;
>> +    ret = 0;
>> +
>> +out_dest:
>> +    if (destfd > 0)
> These should be >= 0 (here and below) right?
>
True, 0 _is_ a valid file descriptor. Will fix in v3.
>> +            close(destfd);
>> +out_src:
>> +    if (srcfd > 0)
>> +            close(srcfd);
>> +out:
>> +    return ret;
>>   }
Thanks for the feedback!
Bogdan P.
_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel

Reply via email to