Re: [lxc-devel] [PATCH v2] safe_mount: Handle mounting proc and refactor

2016-01-14 Thread Bogdan Purcareata
On 14.01.2016 01:09, Serge Hallyn wrote:
> Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com):
>> On 11.01.2016 20:59, Serge Hallyn wrote:
>>> Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com):
 The safe_mount primitive will mount the fs in the new container
 environment by using file descriptors referred in /proc/self/fd.
 However, when the mounted filesystem is proc itself, it will have
 been previously unmounted, therefore resulting in an error when
 searching for these file descriptors. This only happens when there's
 no container rootfs prefix (commonly with lxc-execute).

 Implement the support for this use case as well, by doing the mount
 based on the full path.

 Refactor the whole function in order to remove duplicated code checks
 and improve readability.

 Changes since v1:
 - In order to address CVE-2015-1335, still check if the destination is
 not a symlink. Do the mount only if the destination file descriptor
 exists.

 Signed-off-by: Bogdan Purcareata 
 ---
src/lxc/utils.c | 49 -
1 file changed, 28 insertions(+), 21 deletions(-)

 diff --git a/src/lxc/utils.c b/src/lxc/utils.c
 index d9e769d..c53711a 100644
 --- a/src/lxc/utils.c
 +++ b/src/lxc/utils.c
 @@ -1644,9 +1644,9 @@ out:
int safe_mount(const char *src, const char *dest, const char *fstype,
unsigned long flags, const void *data, const char 
 *rootfs)
{
 -  int srcfd = -1, destfd, ret, saved_errno;
 +  int srcfd = -1, destfd = -1, ret = 0;
char srcbuf[50], destbuf[50]; // only needs enough for 
 /proc/self/fd/
 -  const char *mntsrc = src;
 +  const char *mntsrc = src, *mntdest = dest;

if (!rootfs)
rootfs = "";
 @@ -1655,45 +1655,52 @@ int safe_mount(const char *src, const char *dest, 
 const char *fstype,
if (flags & MS_BIND && src && src[0] != '/') {
INFO("this is a relative bind mount");
srcfd = open_without_symlink(src, NULL);
 -  if (srcfd < 0)
 -  return srcfd;
 +  if (srcfd < 0) {
 +  ret = srcfd;
 +  goto out;
 +  }
ret = snprintf(srcbuf, 50, "/proc/self/fd/%d", srcfd);
if (ret < 0 || ret > 50) {
 -  close(srcfd);
ERROR("Out of memory");
 -  return -EINVAL;
 +  ret = -EINVAL;
 +  goto out_src;
}
mntsrc = srcbuf;
}

destfd = open_without_symlink(dest, rootfs);
if (destfd < 0) {
 -  if (srcfd != -1)
 -  close(srcfd);
 -  return destfd;
 +  ret = destfd;
 +  goto out_src;
}

ret = snprintf(destbuf, 50, "/proc/self/fd/%d", destfd);
if (ret < 0 || ret > 50) {
 -  if (srcfd != -1)
 -  close(srcfd);
 -  close(destfd);
ERROR("Out of memory");
 -  return -EINVAL;
 +  ret = -EINVAL;
 +  goto out_dest;
}

 -  ret = mount(mntsrc, destbuf, fstype, flags, data);
 -  saved_errno = errno;
 -  if (srcfd != -1)
 -  close(srcfd);
 -  close(destfd);
 +  /* make sure the destination descriptor exists */
 +  if (access(destbuf, F_OK) == 0)
 +  mntdest = destbuf;
>>> First, if we're going to shortcut I'd prefer to say "if /proc/self
>>> does not exist then skip this check" fo rnow.
>>> But can we think of any way to still do this check?
>>>
>>> What exactly are the cases?
>>>
>>> 1. lxc-execute, lxc-init tries to mount /proc.  We should be able
>>> to simply have lxc always mount /proc before the pivot_root, so
>>> we can properly do this check.
>>>
>>> what use-cases will break if we demand /proc to exist in the
>>> container?  (We can add an option to umount /proc in lxc-init,
>>> but the directory would have to exist)
>> That's my use case - the failing function is tmp_proc_mount, thus happening
>
> Wait, is your rootfs NULL or not?

Yep, it's NULL - plain "lxc-execute -n foo -f empty.conf -- bash".

> We can handle that case specially.
>
> mount_proc_if_needed() is only called from tmp_proc_mount(), which is only
> called from lxc_setup().  If rootfs is NULL, then all bets are off anyway
> and we can just call mount().  If rootfs is not NULL, then we're mounting
> in a separate container rootfs and we should use safe_mount(), but we
> expect the real /proc to then exist.

In that case, would the following fix be more 

Re: [lxc-devel] [PATCH v2] safe_mount: Handle mounting proc and refactor

2016-01-14 Thread Serge Hallyn
Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com):
> On 14.01.2016 01:09, Serge Hallyn wrote:
> > Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com):
> >> On 11.01.2016 20:59, Serge Hallyn wrote:
> >>> Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com):
>  The safe_mount primitive will mount the fs in the new container
>  environment by using file descriptors referred in /proc/self/fd.
>  However, when the mounted filesystem is proc itself, it will have
>  been previously unmounted, therefore resulting in an error when
>  searching for these file descriptors. This only happens when there's
>  no container rootfs prefix (commonly with lxc-execute).
> 
>  Implement the support for this use case as well, by doing the mount
>  based on the full path.
> 
>  Refactor the whole function in order to remove duplicated code checks
>  and improve readability.
> 
>  Changes since v1:
>  - In order to address CVE-2015-1335, still check if the destination is
>  not a symlink. Do the mount only if the destination file descriptor
>  exists.
> 
>  Signed-off-by: Bogdan Purcareata 
>  ---
> src/lxc/utils.c | 49 -
> 1 file changed, 28 insertions(+), 21 deletions(-)
> 
>  diff --git a/src/lxc/utils.c b/src/lxc/utils.c
>  index d9e769d..c53711a 100644
>  --- a/src/lxc/utils.c
>  +++ b/src/lxc/utils.c
>  @@ -1644,9 +1644,9 @@ out:
> int safe_mount(const char *src, const char *dest, const char *fstype,
>   unsigned long flags, const void *data, const char 
>  *rootfs)
> {
>  -int srcfd = -1, destfd, ret, saved_errno;
>  +int srcfd = -1, destfd = -1, ret = 0;
>   char srcbuf[50], destbuf[50]; // only needs enough for 
>  /proc/self/fd/
>  -const char *mntsrc = src;
>  +const char *mntsrc = src, *mntdest = dest;
> 
>   if (!rootfs)
>   rootfs = "";
>  @@ -1655,45 +1655,52 @@ int safe_mount(const char *src, const char 
>  *dest, const char *fstype,
>   if (flags & MS_BIND && src && src[0] != '/') {
>   INFO("this is a relative bind mount");
>   srcfd = open_without_symlink(src, NULL);
>  -if (srcfd < 0)
>  -return srcfd;
>  +if (srcfd < 0) {
>  +ret = srcfd;
>  +goto out;
>  +}
>   ret = snprintf(srcbuf, 50, "/proc/self/fd/%d", srcfd);
>   if (ret < 0 || ret > 50) {
>  -close(srcfd);
>   ERROR("Out of memory");
>  -return -EINVAL;
>  +ret = -EINVAL;
>  +goto out_src;
>   }
>   mntsrc = srcbuf;
>   }
> 
>   destfd = open_without_symlink(dest, rootfs);
>   if (destfd < 0) {
>  -if (srcfd != -1)
>  -close(srcfd);
>  -return destfd;
>  +ret = destfd;
>  +goto out_src;
>   }
> 
>   ret = snprintf(destbuf, 50, "/proc/self/fd/%d", destfd);
>   if (ret < 0 || ret > 50) {
>  -if (srcfd != -1)
>  -close(srcfd);
>  -close(destfd);
>   ERROR("Out of memory");
>  -return -EINVAL;
>  +ret = -EINVAL;
>  +goto out_dest;
>   }
> 
>  -ret = mount(mntsrc, destbuf, fstype, flags, data);
>  -saved_errno = errno;
>  -if (srcfd != -1)
>  -close(srcfd);
>  -close(destfd);
>  +/* make sure the destination descriptor exists */
>  +if (access(destbuf, F_OK) == 0)
>  +mntdest = destbuf;
> >>> First, if we're going to shortcut I'd prefer to say "if /proc/self
> >>> does not exist then skip this check" fo rnow.
> >>> But can we think of any way to still do this check?
> >>>
> >>> What exactly are the cases?
> >>>
> >>> 1. lxc-execute, lxc-init tries to mount /proc.  We should be able
> >>> to simply have lxc always mount /proc before the pivot_root, so
> >>> we can properly do this check.
> >>>
> >>> what use-cases will break if we demand /proc to exist in the
> >>> container?  (We can add an option to umount /proc in lxc-init,
> >>> but the directory would have to exist)
> >> That's my use case - the failing function is tmp_proc_mount, thus happening
> >
> > Wait, is your rootfs NULL or not?
> 
> Yep, it's NULL - plain "lxc-execute -n foo -f 

Re: [lxc-devel] [PATCH v2] safe_mount: Handle mounting proc and refactor

2016-01-13 Thread Bogdan Purcareata
On 11.01.2016 20:59, Serge Hallyn wrote:
> Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com):
>> The safe_mount primitive will mount the fs in the new container
>> environment by using file descriptors referred in /proc/self/fd.
>> However, when the mounted filesystem is proc itself, it will have
>> been previously unmounted, therefore resulting in an error when
>> searching for these file descriptors. This only happens when there's
>> no container rootfs prefix (commonly with lxc-execute).
>>
>> Implement the support for this use case as well, by doing the mount
>> based on the full path.
>>
>> Refactor the whole function in order to remove duplicated code checks
>> and improve readability.
>>
>> Changes since v1:
>> - In order to address CVE-2015-1335, still check if the destination is
>> not a symlink. Do the mount only if the destination file descriptor
>> exists.
>>
>> Signed-off-by: Bogdan Purcareata 
>> ---
>>   src/lxc/utils.c | 49 -
>>   1 file changed, 28 insertions(+), 21 deletions(-)
>>
>> diff --git a/src/lxc/utils.c b/src/lxc/utils.c
>> index d9e769d..c53711a 100644
>> --- a/src/lxc/utils.c
>> +++ b/src/lxc/utils.c
>> @@ -1644,9 +1644,9 @@ out:
>>   int safe_mount(const char *src, const char *dest, const char *fstype,
>>  unsigned long flags, const void *data, const char *rootfs)
>>   {
>> -int srcfd = -1, destfd, ret, saved_errno;
>> +int srcfd = -1, destfd = -1, ret = 0;
>>  char srcbuf[50], destbuf[50]; // only needs enough for 
>> /proc/self/fd/
>> -const char *mntsrc = src;
>> +const char *mntsrc = src, *mntdest = dest;
>>   
>>  if (!rootfs)
>>  rootfs = "";
>> @@ -1655,45 +1655,52 @@ int safe_mount(const char *src, const char *dest, 
>> const char *fstype,
>>  if (flags & MS_BIND && src && src[0] != '/') {
>>  INFO("this is a relative bind mount");
>>  srcfd = open_without_symlink(src, NULL);
>> -if (srcfd < 0)
>> -return srcfd;
>> +if (srcfd < 0) {
>> +ret = srcfd;
>> +goto out;
>> +}
>>  ret = snprintf(srcbuf, 50, "/proc/self/fd/%d", srcfd);
>>  if (ret < 0 || ret > 50) {
>> -close(srcfd);
>>  ERROR("Out of memory");
>> -return -EINVAL;
>> +ret = -EINVAL;
>> +goto out_src;
>>  }
>>  mntsrc = srcbuf;
>>  }
>>   
>>  destfd = open_without_symlink(dest, rootfs);
>>  if (destfd < 0) {
>> -if (srcfd != -1)
>> -close(srcfd);
>> -return destfd;
>> +ret = destfd;
>> +goto out_src;
>>  }
>>   
>>  ret = snprintf(destbuf, 50, "/proc/self/fd/%d", destfd);
>>  if (ret < 0 || ret > 50) {
>> -if (srcfd != -1)
>> -close(srcfd);
>> -close(destfd);
>>  ERROR("Out of memory");
>> -return -EINVAL;
>> +ret = -EINVAL;
>> +goto out_dest;
>>  }
>>   
>> -ret = mount(mntsrc, destbuf, fstype, flags, data);
>> -saved_errno = errno;
>> -if (srcfd != -1)
>> -close(srcfd);
>> -close(destfd);
>> +/* make sure the destination descriptor exists */
>> +if (access(destbuf, F_OK) == 0)
>> +mntdest = destbuf;
> First, if we're going to shortcut I'd prefer to say "if /proc/self
> does not exist then skip this check" fo rnow.
> But can we think of any way to still do this check?
>
> What exactly are the cases?
>
> 1. lxc-execute, lxc-init tries to mount /proc.  We should be able
> to simply have lxc always mount /proc before the pivot_root, so
> we can properly do this check.
>
> what use-cases will break if we demand /proc to exist in the
> container?  (We can add an option to umount /proc in lxc-init,
> but the directory would have to exist)
That's my use case - the failing function is tmp_proc_mount, thus happening 
before pivot_root, and the scenario is running an application container with 
lxc-execute. This calls mount_proc_if_needed which is bound to fail, since it 
will first unmount /proc (because after cloning the process in the new 
namespaces, its PID will be different than /proc/self), and then try to 
safe_mount it, depending on /proc/self/fd. However, tmp_proc_mount will not 
fail, and afterwards lxc-init will mount /proc following the behavior you 
mentioned. So functionally all is ok, but tmp_proc_mount -> 
mount_proc_if_needed 
-> safe_mount will still spit a nasty warning.

BTW this whole thing is with LXC 1.1.5.
>
> 2. lxc.rootfs unset.  In this case we're trusting the *host* admin
> to not have messed with /proc to make it a symlink, if we can't
> trust that we're out of luck.  Other paths are not the same (since
> parts of the rootfs could be bind-mounted from container-owned

Re: [lxc-devel] [PATCH v2] safe_mount: Handle mounting proc and refactor

2016-01-13 Thread Serge Hallyn
Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com):
> On 11.01.2016 20:59, Serge Hallyn wrote:
> > Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com):
> >> The safe_mount primitive will mount the fs in the new container
> >> environment by using file descriptors referred in /proc/self/fd.
> >> However, when the mounted filesystem is proc itself, it will have
> >> been previously unmounted, therefore resulting in an error when
> >> searching for these file descriptors. This only happens when there's
> >> no container rootfs prefix (commonly with lxc-execute).
> >>
> >> Implement the support for this use case as well, by doing the mount
> >> based on the full path.
> >>
> >> Refactor the whole function in order to remove duplicated code checks
> >> and improve readability.
> >>
> >> Changes since v1:
> >> - In order to address CVE-2015-1335, still check if the destination is
> >> not a symlink. Do the mount only if the destination file descriptor
> >> exists.
> >>
> >> Signed-off-by: Bogdan Purcareata 
> >> ---
> >>   src/lxc/utils.c | 49 -
> >>   1 file changed, 28 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/src/lxc/utils.c b/src/lxc/utils.c
> >> index d9e769d..c53711a 100644
> >> --- a/src/lxc/utils.c
> >> +++ b/src/lxc/utils.c
> >> @@ -1644,9 +1644,9 @@ out:
> >>   int safe_mount(const char *src, const char *dest, const char *fstype,
> >>unsigned long flags, const void *data, const char *rootfs)
> >>   {
> >> -  int srcfd = -1, destfd, ret, saved_errno;
> >> +  int srcfd = -1, destfd = -1, ret = 0;
> >>char srcbuf[50], destbuf[50]; // only needs enough for 
> >> /proc/self/fd/
> >> -  const char *mntsrc = src;
> >> +  const char *mntsrc = src, *mntdest = dest;
> >>   
> >>if (!rootfs)
> >>rootfs = "";
> >> @@ -1655,45 +1655,52 @@ int safe_mount(const char *src, const char *dest, 
> >> const char *fstype,
> >>if (flags & MS_BIND && src && src[0] != '/') {
> >>INFO("this is a relative bind mount");
> >>srcfd = open_without_symlink(src, NULL);
> >> -  if (srcfd < 0)
> >> -  return srcfd;
> >> +  if (srcfd < 0) {
> >> +  ret = srcfd;
> >> +  goto out;
> >> +  }
> >>ret = snprintf(srcbuf, 50, "/proc/self/fd/%d", srcfd);
> >>if (ret < 0 || ret > 50) {
> >> -  close(srcfd);
> >>ERROR("Out of memory");
> >> -  return -EINVAL;
> >> +  ret = -EINVAL;
> >> +  goto out_src;
> >>}
> >>mntsrc = srcbuf;
> >>}
> >>   
> >>destfd = open_without_symlink(dest, rootfs);
> >>if (destfd < 0) {
> >> -  if (srcfd != -1)
> >> -  close(srcfd);
> >> -  return destfd;
> >> +  ret = destfd;
> >> +  goto out_src;
> >>}
> >>   
> >>ret = snprintf(destbuf, 50, "/proc/self/fd/%d", destfd);
> >>if (ret < 0 || ret > 50) {
> >> -  if (srcfd != -1)
> >> -  close(srcfd);
> >> -  close(destfd);
> >>ERROR("Out of memory");
> >> -  return -EINVAL;
> >> +  ret = -EINVAL;
> >> +  goto out_dest;
> >>}
> >>   
> >> -  ret = mount(mntsrc, destbuf, fstype, flags, data);
> >> -  saved_errno = errno;
> >> -  if (srcfd != -1)
> >> -  close(srcfd);
> >> -  close(destfd);
> >> +  /* make sure the destination descriptor exists */
> >> +  if (access(destbuf, F_OK) == 0)
> >> +  mntdest = destbuf;
> > First, if we're going to shortcut I'd prefer to say "if /proc/self
> > does not exist then skip this check" fo rnow.
> > But can we think of any way to still do this check?
> >
> > What exactly are the cases?
> >
> > 1. lxc-execute, lxc-init tries to mount /proc.  We should be able
> > to simply have lxc always mount /proc before the pivot_root, so
> > we can properly do this check.
> >
> > what use-cases will break if we demand /proc to exist in the
> > container?  (We can add an option to umount /proc in lxc-init,
> > but the directory would have to exist)
> That's my use case - the failing function is tmp_proc_mount, thus happening 

Wait, is your rootfs NULL or not?

We can handle that case specially.

mount_proc_if_needed() is only called from tmp_proc_mount(), which is only
called from lxc_setup().  If rootfs is NULL, then all bets are off anyway
and we can just call mount().  If rootfs is not NULL, then we're mounting
in a separate container rootfs and we should use safe_mount(), but we
expect the real /proc to then exist.
___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


[lxc-devel] [PATCH v2] safe_mount: Handle mounting proc and refactor

2016-01-11 Thread Bogdan Purcareata
The safe_mount primitive will mount the fs in the new container
environment by using file descriptors referred in /proc/self/fd.
However, when the mounted filesystem is proc itself, it will have
been previously unmounted, therefore resulting in an error when
searching for these file descriptors. This only happens when there's
no container rootfs prefix (commonly with lxc-execute).

Implement the support for this use case as well, by doing the mount
based on the full path.

Refactor the whole function in order to remove duplicated code checks
and improve readability.

Changes since v1:
- In order to address CVE-2015-1335, still check if the destination is
not a symlink. Do the mount only if the destination file descriptor
exists.

Signed-off-by: Bogdan Purcareata 
---
 src/lxc/utils.c | 49 -
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/src/lxc/utils.c b/src/lxc/utils.c
index d9e769d..c53711a 100644
--- a/src/lxc/utils.c
+++ b/src/lxc/utils.c
@@ -1644,9 +1644,9 @@ out:
 int safe_mount(const char *src, const char *dest, const char *fstype,
unsigned long flags, const void *data, const char *rootfs)
 {
-   int srcfd = -1, destfd, ret, saved_errno;
+   int srcfd = -1, destfd = -1, ret = 0;
char srcbuf[50], destbuf[50]; // only needs enough for 
/proc/self/fd/
-   const char *mntsrc = src;
+   const char *mntsrc = src, *mntdest = dest;
 
if (!rootfs)
rootfs = "";
@@ -1655,45 +1655,52 @@ int safe_mount(const char *src, const char *dest, const 
char *fstype,
if (flags & MS_BIND && src && src[0] != '/') {
INFO("this is a relative bind mount");
srcfd = open_without_symlink(src, NULL);
-   if (srcfd < 0)
-   return srcfd;
+   if (srcfd < 0) {
+   ret = srcfd;
+   goto out;
+   }
ret = snprintf(srcbuf, 50, "/proc/self/fd/%d", srcfd);
if (ret < 0 || ret > 50) {
-   close(srcfd);
ERROR("Out of memory");
-   return -EINVAL;
+   ret = -EINVAL;
+   goto out_src;
}
mntsrc = srcbuf;
}
 
destfd = open_without_symlink(dest, rootfs);
if (destfd < 0) {
-   if (srcfd != -1)
-   close(srcfd);
-   return destfd;
+   ret = destfd;
+   goto out_src;
}
 
ret = snprintf(destbuf, 50, "/proc/self/fd/%d", destfd);
if (ret < 0 || ret > 50) {
-   if (srcfd != -1)
-   close(srcfd);
-   close(destfd);
ERROR("Out of memory");
-   return -EINVAL;
+   ret = -EINVAL;
+   goto out_dest;
}
 
-   ret = mount(mntsrc, destbuf, fstype, flags, data);
-   saved_errno = errno;
-   if (srcfd != -1)
-   close(srcfd);
-   close(destfd);
+   /* make sure the destination descriptor exists */
+   if (access(destbuf, F_OK) == 0)
+   mntdest = destbuf;
+
+   ret = mount(mntsrc, mntdest, fstype, flags, data);
if (ret < 0) {
-   errno = saved_errno;
SYSERROR("Failed to mount %s onto %s", src, dest);
-   return ret;
+   goto out_dest;
}
 
-   return 0;
+   ret = 0;
+
+out_dest:
+   if (destfd > 0)
+   close(destfd);
+out_src:
+   if (srcfd > 0)
+   close(srcfd);
+out:
+   return ret;
 }
 
 /*
-- 
1.9.1

___
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel


Re: [lxc-devel] [PATCH v2] safe_mount: Handle mounting proc and refactor

2016-01-11 Thread Serge Hallyn
Quoting Bogdan Purcareata (bogdan.purcare...@nxp.com):
> The safe_mount primitive will mount the fs in the new container
> environment by using file descriptors referred in /proc/self/fd.
> However, when the mounted filesystem is proc itself, it will have
> been previously unmounted, therefore resulting in an error when
> searching for these file descriptors. This only happens when there's
> no container rootfs prefix (commonly with lxc-execute).
> 
> Implement the support for this use case as well, by doing the mount
> based on the full path.
> 
> Refactor the whole function in order to remove duplicated code checks
> and improve readability.
> 
> Changes since v1:
> - In order to address CVE-2015-1335, still check if the destination is
> not a symlink. Do the mount only if the destination file descriptor
> exists.
> 
> Signed-off-by: Bogdan Purcareata 
> ---
>  src/lxc/utils.c | 49 -
>  1 file changed, 28 insertions(+), 21 deletions(-)
> 
> diff --git a/src/lxc/utils.c b/src/lxc/utils.c
> index d9e769d..c53711a 100644
> --- a/src/lxc/utils.c
> +++ b/src/lxc/utils.c
> @@ -1644,9 +1644,9 @@ out:
>  int safe_mount(const char *src, const char *dest, const char *fstype,
>   unsigned long flags, const void *data, const char *rootfs)
>  {
> - int srcfd = -1, destfd, ret, saved_errno;
> + int srcfd = -1, destfd = -1, ret = 0;
>   char srcbuf[50], destbuf[50]; // only needs enough for 
> /proc/self/fd/
> - const char *mntsrc = src;
> + const char *mntsrc = src, *mntdest = dest;
>  
>   if (!rootfs)
>   rootfs = "";
> @@ -1655,45 +1655,52 @@ int safe_mount(const char *src, const char *dest, 
> const char *fstype,
>   if (flags & MS_BIND && src && src[0] != '/') {
>   INFO("this is a relative bind mount");
>   srcfd = open_without_symlink(src, NULL);
> - if (srcfd < 0)
> - return srcfd;
> + if (srcfd < 0) {
> + ret = srcfd;
> + goto out;
> + }
>   ret = snprintf(srcbuf, 50, "/proc/self/fd/%d", srcfd);
>   if (ret < 0 || ret > 50) {
> - close(srcfd);
>   ERROR("Out of memory");
> - return -EINVAL;
> + ret = -EINVAL;
> + goto out_src;
>   }
>   mntsrc = srcbuf;
>   }
>  
>   destfd = open_without_symlink(dest, rootfs);
>   if (destfd < 0) {
> - if (srcfd != -1)
> - close(srcfd);
> - return destfd;
> + ret = destfd;
> + goto out_src;
>   }
>  
>   ret = snprintf(destbuf, 50, "/proc/self/fd/%d", destfd);
>   if (ret < 0 || ret > 50) {
> - if (srcfd != -1)
> - close(srcfd);
> - close(destfd);
>   ERROR("Out of memory");
> - return -EINVAL;
> + ret = -EINVAL;
> + goto out_dest;
>   }
>  
> - ret = mount(mntsrc, destbuf, fstype, flags, data);
> - saved_errno = errno;
> - if (srcfd != -1)
> - close(srcfd);
> - close(destfd);
> + /* make sure the destination descriptor exists */
> + if (access(destbuf, F_OK) == 0)
> + mntdest = destbuf;

First, if we're going to shortcut I'd prefer to say "if /proc/self
does not exist then skip this check" fo rnow.

But can we think of any way to still do this check?

What exactly are the cases?

1. lxc-execute, lxc-init tries to mount /proc.  We should be able
to simply have lxc always mount /proc before the pivot_root, so
we can properly do this check.

what use-cases will break if we demand /proc to exist in the
container?  (We can add an option to umount /proc in lxc-init,
but the directory would have to exist)

2. lxc.rootfs unset.  In this case we're trusting the *host* admin
to not have messed with /proc to make it a symlink, if we can't
trust that we're out of luck.  Other paths are not the same (since
parts of the rootfs could be bind-mounted from container-owned
dirs) so we should check those.  But so for this check we should
simply explicitly check for "/proc".  Doing a more roundabout
check may leave us open to subtle different attacks.  In particular
I imagine there are other ways to make /proc/self/fd/N not
access-able, and you are, in that case, re-introducing the TOCTTOU -
the attacker could try to quickliy insert the symlink after our
checks and before the real mount().

> + ret = mount(mntsrc, mntdest, fstype, flags, data);
>   if (ret < 0) {
> - errno = saved_errno;
>   SYSERROR("Failed to mount %s onto %s", src, dest);
> - return ret;
> + goto out_dest;
>   }
>  
> - return 0;
> + ret = 0;
> +
> +out_dest:
> + if (destfd > 0)

These should be >= 0 (here and below) right?

> +