Re: [GIT PULL] namespace updates for v3.17-rc1

2014-11-29 Thread Richard Weinberger
Am 26.11.2014 um 00:15 schrieb Richard Weinberger:
> Eric,
> 
> On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
>  wrote:
>> Richard Weinberger  writes:
>>
>>> Am 21.08.2014 15:12, schrieb Christoph Hellwig:
 On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
> Richard Weinberger  writes:
>
>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman 
>>  wrote:
>>
>> This commit breaks libvirt-lxc.
>> libvirt does in lxcContainerMountBasicFS():
>
> The bugs fixed are security issues, so if we have to break a small
> number of userspace applications we will.  Anything that we can
> reasonably do to avoid regressions will be done.

 Can you explain the security issues in detail?  Breaking common
 userspace like libvirt-lxc with just a little bit of handwaiving is
 entirely unacceptable.
>>>
>>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>>> unbreaks libvirt-lxc.
>>> I hope it hits Linus tree and -stable before the offending commit hits 
>>> users.
>>
>> I plan to send the pull request to Linus as soon as I have caught my
>> breath (from all of the conferences this week) that I can be certain I
>> am thinking clearly and not rushing things.
> 
> Today I've upgraded my LXC testbed to the most recent kernel and found
> libvirt-lxc broken again (sic!).
> Remounting /proc/sys/ is failing.
> Investigating into the issue showed that commit "mnt: Implicitly add
> MNT_NODEV on remount as we do on mount"
> is not mainline.
> Why did you left out this patch? In my previous mails I explicitly
> stated that exactly this commit unbreaks libvirt-lxc.
> 
> Now the userspace breaking changes are mainline and hit users hard. :-(

*kind ping*
...to make sure that this issue doesn't get lost.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-11-29 Thread Richard Weinberger
Am 26.11.2014 um 00:15 schrieb Richard Weinberger:
 Eric,
 
 On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
 ebied...@xmission.com wrote:
 Richard Weinberger rich...@nod.at writes:

 Am 21.08.2014 15:12, schrieb Christoph Hellwig:
 On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
 Richard Weinberger richard.weinber...@gmail.com writes:

 On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman 
 ebied...@xmission.com wrote:

 This commit breaks libvirt-lxc.
 libvirt does in lxcContainerMountBasicFS():

 The bugs fixed are security issues, so if we have to break a small
 number of userspace applications we will.  Anything that we can
 reasonably do to avoid regressions will be done.

 Can you explain the security issues in detail?  Breaking common
 userspace like libvirt-lxc with just a little bit of handwaiving is
 entirely unacceptable.

 It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
 git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
 unbreaks libvirt-lxc.
 I hope it hits Linus tree and -stable before the offending commit hits 
 users.

 I plan to send the pull request to Linus as soon as I have caught my
 breath (from all of the conferences this week) that I can be certain I
 am thinking clearly and not rushing things.
 
 Today I've upgraded my LXC testbed to the most recent kernel and found
 libvirt-lxc broken again (sic!).
 Remounting /proc/sys/ is failing.
 Investigating into the issue showed that commit mnt: Implicitly add
 MNT_NODEV on remount as we do on mount
 is not mainline.
 Why did you left out this patch? In my previous mails I explicitly
 stated that exactly this commit unbreaks libvirt-lxc.
 
 Now the userspace breaking changes are mainline and hit users hard. :-(

*kind ping*
...to make sure that this issue doesn't get lost.

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-11-25 Thread Richard Weinberger
Eric,

On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
 wrote:
> Richard Weinberger  writes:
>
>> Am 21.08.2014 15:12, schrieb Christoph Hellwig:
>>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
 Richard Weinberger  writes:

> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman  
> wrote:
>
> This commit breaks libvirt-lxc.
> libvirt does in lxcContainerMountBasicFS():

 The bugs fixed are security issues, so if we have to break a small
 number of userspace applications we will.  Anything that we can
 reasonably do to avoid regressions will be done.
>>>
>>> Can you explain the security issues in detail?  Breaking common
>>> userspace like libvirt-lxc with just a little bit of handwaiving is
>>> entirely unacceptable.
>>
>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>> unbreaks libvirt-lxc.
>> I hope it hits Linus tree and -stable before the offending commit hits users.
>
> I plan to send the pull request to Linus as soon as I have caught my
> breath (from all of the conferences this week) that I can be certain I
> am thinking clearly and not rushing things.

Today I've upgraded my LXC testbed to the most recent kernel and found
libvirt-lxc broken again (sic!).
Remounting /proc/sys/ is failing.
Investigating into the issue showed that commit "mnt: Implicitly add
MNT_NODEV on remount as we do on mount"
is not mainline.
Why did you left out this patch? In my previous mails I explicitly
stated that exactly this commit unbreaks libvirt-lxc.

Now the userspace breaking changes are mainline and hit users hard. :-(

-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-11-25 Thread Richard Weinberger
Eric,

On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
ebied...@xmission.com wrote:
 Richard Weinberger rich...@nod.at writes:

 Am 21.08.2014 15:12, schrieb Christoph Hellwig:
 On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
 Richard Weinberger richard.weinber...@gmail.com writes:

 On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com 
 wrote:

 This commit breaks libvirt-lxc.
 libvirt does in lxcContainerMountBasicFS():

 The bugs fixed are security issues, so if we have to break a small
 number of userspace applications we will.  Anything that we can
 reasonably do to avoid regressions will be done.

 Can you explain the security issues in detail?  Breaking common
 userspace like libvirt-lxc with just a little bit of handwaiving is
 entirely unacceptable.

 It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
 git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
 unbreaks libvirt-lxc.
 I hope it hits Linus tree and -stable before the offending commit hits users.

 I plan to send the pull request to Linus as soon as I have caught my
 breath (from all of the conferences this week) that I can be certain I
 am thinking clearly and not rushing things.

Today I've upgraded my LXC testbed to the most recent kernel and found
libvirt-lxc broken again (sic!).
Remounting /proc/sys/ is failing.
Investigating into the issue showed that commit mnt: Implicitly add
MNT_NODEV on remount as we do on mount
is not mainline.
Why did you left out this patch? In my previous mails I explicitly
stated that exactly this commit unbreaks libvirt-lxc.

Now the userspace breaking changes are mainline and hit users hard. :-(

-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-09-03 Thread Richard Weinberger
On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
 wrote:
>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>> unbreaks libvirt-lxc.
>> I hope it hits Linus tree and -stable before the offending commit hits users.
>
> I plan to send the pull request to Linus as soon as I have caught my
> breath (from all of the conferences this week) that I can be certain I
> am thinking clearly and not rushing things.

*kind reminder* :-)

-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-09-03 Thread Richard Weinberger
On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
ebied...@xmission.com wrote:
 It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
 git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
 unbreaks libvirt-lxc.
 I hope it hits Linus tree and -stable before the offending commit hits users.

 I plan to send the pull request to Linus as soon as I have caught my
 breath (from all of the conferences this week) that I can be certain I
 am thinking clearly and not rushing things.

*kind reminder* :-)

-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Eric W. Biederman
Richard Weinberger  writes:

> Am 21.08.2014 15:12, schrieb Christoph Hellwig:
>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>>> Richard Weinberger  writes:
>>>
 On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman  
 wrote:

 This commit breaks libvirt-lxc.
 libvirt does in lxcContainerMountBasicFS():
>>>
>>> The bugs fixed are security issues, so if we have to break a small
>>> number of userspace applications we will.  Anything that we can
>>> reasonably do to avoid regressions will be done.
>> 
>> Can you explain the security issues in detail?  Breaking common
>> userspace like libvirt-lxc with just a little bit of handwaiving is
>> entirely unacceptable.
>
> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
> unbreaks libvirt-lxc.
> I hope it hits Linus tree and -stable before the offending commit hits users.

I plan to send the pull request to Linus as soon as I have caught my
breath (from all of the conferences this week) that I can be certain I
am thinking clearly and not rushing things.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Eric W. Biederman
Richard Weinberger  writes:

> Am 21.08.2014 08:29, schrieb Richard Weinberger:
>> Am 21.08.2014 06:53, schrieb Eric W. Biederman:
>>> The bugs fixed are security issues, so if we have to break a small
>>> number of userspace applications we will.  Anything that we can
>>> reasonably do to avoid regressions will be done.
>>>
>>> Could you please look at my user-namespace.git#for-next branch I have a
>>> fix for at least one regresion causing issue in there.  I think it may
>>> fix your issues but I am not fully certain more comments below.
>> 
>> I'll run this on my LXC testbed today.
>
> Looks good. With these patches applied libvirt works again. :)

Darn I read my email in the wrong order.  I am glad to hear that my
changes were enough to fix libvirt-lxc.

I will aim at pushing this to Linus after the conference is over and I
can trust myself to think clearly.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Eric W. Biederman
Christoph Hellwig  writes:

> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>> Richard Weinberger  writes:
>> 
>> > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman  
>> > wrote:
>> >
>> > This commit breaks libvirt-lxc.
>> > libvirt does in lxcContainerMountBasicFS():
>> 
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
>
> Can you explain the security issues in detail?  Breaking common
> userspace like libvirt-lxc with just a little bit of handwaiving is
> entirely unacceptable.

The biggies ability for an unprivileged user to clear nosuid, nodev,
read-only, noexec with remount.

Apparently part of what libvirt-lxc clearing all mount flags except for
read-only on some filesystem it remounts read-only.  Which if root
mounted the filesystem with nosuid, nodev, or noexec is not supportable.

So if it isn't the implicit setting of nodev on mount but not on remount
causing problems (which I have the trivial fix for) there is a strong
chance I need to break libvirt-lxc.

I am several bugs deep already where fixing one bug reveals yet another
bug.  It is possible there is something else that that I have not
discovered that I am missing.  I am more to happy to investigate to see
if it possible to avoid breaking libvirt-lxc.

Apologies for being a little vague being at the conference I only have
access to email about an hour a day.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Richard Weinberger
Am 21.08.2014 15:12, schrieb Christoph Hellwig:
> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>> Richard Weinberger  writes:
>>
>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman  
>>> wrote:
>>>
>>> This commit breaks libvirt-lxc.
>>> libvirt does in lxcContainerMountBasicFS():
>>
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
> 
> Can you explain the security issues in detail?  Breaking common
> userspace like libvirt-lxc with just a little bit of handwaiving is
> entirely unacceptable.

It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
unbreaks libvirt-lxc.
I hope it hits Linus tree and -stable before the offending commit hits users.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Christoph Hellwig
On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
> Richard Weinberger  writes:
> 
> > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman  
> > wrote:
> >
> > This commit breaks libvirt-lxc.
> > libvirt does in lxcContainerMountBasicFS():
> 
> The bugs fixed are security issues, so if we have to break a small
> number of userspace applications we will.  Anything that we can
> reasonably do to avoid regressions will be done.

Can you explain the security issues in detail?  Breaking common
userspace like libvirt-lxc with just a little bit of handwaiving is
entirely unacceptable.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Richard Weinberger
Am 21.08.2014 08:29, schrieb Richard Weinberger:
> Am 21.08.2014 06:53, schrieb Eric W. Biederman:
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
>>
>> Could you please look at my user-namespace.git#for-next branch I have a
>> fix for at least one regresion causing issue in there.  I think it may
>> fix your issues but I am not fully certain more comments below.
> 
> I'll run this on my LXC testbed today.

Looks good. With these patches applied libvirt works again. :)

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Richard Weinberger
Am 21.08.2014 06:53, schrieb Eric W. Biederman:
> The bugs fixed are security issues, so if we have to break a small
> number of userspace applications we will.  Anything that we can
> reasonably do to avoid regressions will be done.
> 
> Could you please look at my user-namespace.git#for-next branch I have a
> fix for at least one regresion causing issue in there.  I think it may
> fix your issues but I am not fully certain more comments below.

I'll run this on my LXC testbed today.

>> /*
>>  * We can't immediately set the MS_RDONLY flag when mounting 
>> filesystems
>>  * because (in at least some kernel versions) this will propagate 
>> back
>>  * to the original mount in the host OS, turning it readonly too. 
>> Thus
>>  * we mount the filesystem in read-write mode initially, and then do 
>> a
>>  * separate read-only bind mount on top of that.
>>  */
>> bindOverReadonly = !!(mnt_mflags & MS_RDONLY);
>>
>> VIR_DEBUG("Mount %s on %s type=%s flags=%x",
>>   mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
>> if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
>> ~MS_RDONLY, NULL) < 0) {
>>
>>  Here it fails for sysfs because with user namespaces we bind the
>> existing /sys into the container
>> and would have to read out all existing mount flags from the current /sys 
>> mount.
>> Otherwise mount() fails with EPERM.
>> On my test system /sys is mounted with
>> "rw,nosuid,nodev,noexec,relatime" and libvirt
>> misses the realtime...
> 
> Not specifying any atime flags to mount should be safe as that asks for
> the default atime flags which for remount I have made the default atime
> flags the existing atime flags.  So I am scratching my head a little on
> this one.

Okay, let me find out why exactly libvirt gets a EPERM here.
Maybe there are more odds hidden.

>>
>> virReportSystemError(errno,
>>  _("Failed to mount %s on %s type %s 
>> flags=%x"),
>>  mnt_src, mnt->dst, NULLSTR(mnt->type),
>>  mnt_mflags & ~MS_RDONLY);
>> goto cleanup;
>> }
>>
>> if (bindOverReadonly &&
>> mount(mnt_src, mnt->dst, NULL,
>>   MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
>>
>> ^^^ Here it fails because now we'd have to specify all flags as used
>> for the first
>> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
>> See lxcBasicMounts[].
>> In this case the fix is easy, add mnt_mflags to the mount flags.
> 
> That has always been a bug in general because remount has always
> required specifying the complete set of mount flags you want to have.
> 
> That fact that flags such as nosuid are now properly locked so you can
> not change them if you are not the global root user just makes this
> obvious.
> 
> Andy Lutermorski has observed that statvfs will return the mount flags
> making reading them simple.

Thanks for the clarification, I'll create a fix for libvirt.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Richard Weinberger
Am 21.08.2014 06:53, schrieb Eric W. Biederman:
 The bugs fixed are security issues, so if we have to break a small
 number of userspace applications we will.  Anything that we can
 reasonably do to avoid regressions will be done.
 
 Could you please look at my user-namespace.git#for-next branch I have a
 fix for at least one regresion causing issue in there.  I think it may
 fix your issues but I am not fully certain more comments below.

I'll run this on my LXC testbed today.

 /*
  * We can't immediately set the MS_RDONLY flag when mounting 
 filesystems
  * because (in at least some kernel versions) this will propagate 
 back
  * to the original mount in the host OS, turning it readonly too. 
 Thus
  * we mount the filesystem in read-write mode initially, and then do 
 a
  * separate read-only bind mount on top of that.
  */
 bindOverReadonly = !!(mnt_mflags  MS_RDONLY);

 VIR_DEBUG(Mount %s on %s type=%s flags=%x,
   mnt_src, mnt-dst, mnt-type, mnt_mflags  ~MS_RDONLY);
 if (mount(mnt_src, mnt-dst, mnt-type, mnt_mflags 
 ~MS_RDONLY, NULL)  0) {

  Here it fails for sysfs because with user namespaces we bind the
 existing /sys into the container
 and would have to read out all existing mount flags from the current /sys 
 mount.
 Otherwise mount() fails with EPERM.
 On my test system /sys is mounted with
 rw,nosuid,nodev,noexec,relatime and libvirt
 misses the realtime...
 
 Not specifying any atime flags to mount should be safe as that asks for
 the default atime flags which for remount I have made the default atime
 flags the existing atime flags.  So I am scratching my head a little on
 this one.

Okay, let me find out why exactly libvirt gets a EPERM here.
Maybe there are more odds hidden.


 virReportSystemError(errno,
  _(Failed to mount %s on %s type %s 
 flags=%x),
  mnt_src, mnt-dst, NULLSTR(mnt-type),
  mnt_mflags  ~MS_RDONLY);
 goto cleanup;
 }

 if (bindOverReadonly 
 mount(mnt_src, mnt-dst, NULL,
   MS_BIND|MS_REMOUNT|MS_RDONLY, NULL)  0) {

 ^^^ Here it fails because now we'd have to specify all flags as used
 for the first
 mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
 See lxcBasicMounts[].
 In this case the fix is easy, add mnt_mflags to the mount flags.
 
 That has always been a bug in general because remount has always
 required specifying the complete set of mount flags you want to have.
 
 That fact that flags such as nosuid are now properly locked so you can
 not change them if you are not the global root user just makes this
 obvious.
 
 Andy Lutermorski has observed that statvfs will return the mount flags
 making reading them simple.

Thanks for the clarification, I'll create a fix for libvirt.

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Richard Weinberger
Am 21.08.2014 08:29, schrieb Richard Weinberger:
 Am 21.08.2014 06:53, schrieb Eric W. Biederman:
 The bugs fixed are security issues, so if we have to break a small
 number of userspace applications we will.  Anything that we can
 reasonably do to avoid regressions will be done.

 Could you please look at my user-namespace.git#for-next branch I have a
 fix for at least one regresion causing issue in there.  I think it may
 fix your issues but I am not fully certain more comments below.
 
 I'll run this on my LXC testbed today.

Looks good. With these patches applied libvirt works again. :)

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Christoph Hellwig
On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
 Richard Weinberger richard.weinber...@gmail.com writes:
 
  On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com 
  wrote:
 
  This commit breaks libvirt-lxc.
  libvirt does in lxcContainerMountBasicFS():
 
 The bugs fixed are security issues, so if we have to break a small
 number of userspace applications we will.  Anything that we can
 reasonably do to avoid regressions will be done.

Can you explain the security issues in detail?  Breaking common
userspace like libvirt-lxc with just a little bit of handwaiving is
entirely unacceptable.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Richard Weinberger
Am 21.08.2014 15:12, schrieb Christoph Hellwig:
 On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
 Richard Weinberger richard.weinber...@gmail.com writes:

 On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com 
 wrote:

 This commit breaks libvirt-lxc.
 libvirt does in lxcContainerMountBasicFS():

 The bugs fixed are security issues, so if we have to break a small
 number of userspace applications we will.  Anything that we can
 reasonably do to avoid regressions will be done.
 
 Can you explain the security issues in detail?  Breaking common
 userspace like libvirt-lxc with just a little bit of handwaiving is
 entirely unacceptable.

It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
unbreaks libvirt-lxc.
I hope it hits Linus tree and -stable before the offending commit hits users.

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Eric W. Biederman
Christoph Hellwig h...@infradead.org writes:

 On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
 Richard Weinberger richard.weinber...@gmail.com writes:
 
  On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com 
  wrote:
 
  This commit breaks libvirt-lxc.
  libvirt does in lxcContainerMountBasicFS():
 
 The bugs fixed are security issues, so if we have to break a small
 number of userspace applications we will.  Anything that we can
 reasonably do to avoid regressions will be done.

 Can you explain the security issues in detail?  Breaking common
 userspace like libvirt-lxc with just a little bit of handwaiving is
 entirely unacceptable.

The biggies ability for an unprivileged user to clear nosuid, nodev,
read-only, noexec with remount.

Apparently part of what libvirt-lxc clearing all mount flags except for
read-only on some filesystem it remounts read-only.  Which if root
mounted the filesystem with nosuid, nodev, or noexec is not supportable.

So if it isn't the implicit setting of nodev on mount but not on remount
causing problems (which I have the trivial fix for) there is a strong
chance I need to break libvirt-lxc.

I am several bugs deep already where fixing one bug reveals yet another
bug.  It is possible there is something else that that I have not
discovered that I am missing.  I am more to happy to investigate to see
if it possible to avoid breaking libvirt-lxc.

Apologies for being a little vague being at the conference I only have
access to email about an hour a day.

Eric
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Eric W. Biederman
Richard Weinberger rich...@nod.at writes:

 Am 21.08.2014 08:29, schrieb Richard Weinberger:
 Am 21.08.2014 06:53, schrieb Eric W. Biederman:
 The bugs fixed are security issues, so if we have to break a small
 number of userspace applications we will.  Anything that we can
 reasonably do to avoid regressions will be done.

 Could you please look at my user-namespace.git#for-next branch I have a
 fix for at least one regresion causing issue in there.  I think it may
 fix your issues but I am not fully certain more comments below.
 
 I'll run this on my LXC testbed today.

 Looks good. With these patches applied libvirt works again. :)

Darn I read my email in the wrong order.  I am glad to hear that my
changes were enough to fix libvirt-lxc.

I will aim at pushing this to Linus after the conference is over and I
can trust myself to think clearly.

Eric
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-21 Thread Eric W. Biederman
Richard Weinberger rich...@nod.at writes:

 Am 21.08.2014 15:12, schrieb Christoph Hellwig:
 On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
 Richard Weinberger richard.weinber...@gmail.com writes:

 On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com 
 wrote:

 This commit breaks libvirt-lxc.
 libvirt does in lxcContainerMountBasicFS():

 The bugs fixed are security issues, so if we have to break a small
 number of userspace applications we will.  Anything that we can
 reasonably do to avoid regressions will be done.
 
 Can you explain the security issues in detail?  Breaking common
 userspace like libvirt-lxc with just a little bit of handwaiving is
 entirely unacceptable.

 It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
 git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
 unbreaks libvirt-lxc.
 I hope it hits Linus tree and -stable before the offending commit hits users.

I plan to send the pull request to Linus as soon as I have caught my
breath (from all of the conferences this week) that I can be certain I
am thinking clearly and not rushing things.

Eric
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-20 Thread Eric W. Biederman
Richard Weinberger  writes:

> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman  
> wrote:
>
> This commit breaks libvirt-lxc.
> libvirt does in lxcContainerMountBasicFS():

The bugs fixed are security issues, so if we have to break a small
number of userspace applications we will.  Anything that we can
reasonably do to avoid regressions will be done.

Could you please look at my user-namespace.git#for-next branch I have a
fix for at least one regresion causing issue in there.  I think it may
fix your issues but I am not fully certain more comments below.

> /*
>  * We can't immediately set the MS_RDONLY flag when mounting 
> filesystems
>  * because (in at least some kernel versions) this will propagate back
>  * to the original mount in the host OS, turning it readonly too. Thus
>  * we mount the filesystem in read-write mode initially, and then do a
>  * separate read-only bind mount on top of that.
>  */
> bindOverReadonly = !!(mnt_mflags & MS_RDONLY);
>
> VIR_DEBUG("Mount %s on %s type=%s flags=%x",
>   mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
> if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
> ~MS_RDONLY, NULL) < 0) {
>
>  Here it fails for sysfs because with user namespaces we bind the
> existing /sys into the container
> and would have to read out all existing mount flags from the current /sys 
> mount.
> Otherwise mount() fails with EPERM.
> On my test system /sys is mounted with
> "rw,nosuid,nodev,noexec,relatime" and libvirt
> misses the realtime...

Not specifying any atime flags to mount should be safe as that asks for
the default atime flags which for remount I have made the default atime
flags the existing atime flags.  So I am scratching my head a little on
this one.

>
> virReportSystemError(errno,
>  _("Failed to mount %s on %s type %s 
> flags=%x"),
>  mnt_src, mnt->dst, NULLSTR(mnt->type),
>  mnt_mflags & ~MS_RDONLY);
> goto cleanup;
> }
>
> if (bindOverReadonly &&
> mount(mnt_src, mnt->dst, NULL,
>   MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
>
> ^^^ Here it fails because now we'd have to specify all flags as used
> for the first
> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
> See lxcBasicMounts[].
> In this case the fix is easy, add mnt_mflags to the mount flags.

That has always been a bug in general because remount has always
required specifying the complete set of mount flags you want to have.

That fact that flags such as nosuid are now properly locked so you can
not change them if you are not the global root user just makes this
obvious.

Andy Lutermorski has observed that statvfs will return the mount flags
making reading them simple.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-20 Thread Richard Weinberger
On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman  wrote:
>
> Linus,
>
> Please pull the for-linus branch from the git tree:
>
>git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
> for-linus
>
>HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at 
> /proc/thread-self/mounts instead of /proc/self/mounts
>
> This is a bunch of small changes built against 3.16-rc6.  The most
> significant change for users is the first patch which makes setns
> drmatically faster by removing unneded rcu handling.
>
> The next chunk of changes are so that "mount -o remount,.." will not
> allow the user namespace root to drop flags on a mount set by the system
> wide root.  Aks this forces read-only mounts to stay read-only, no-dev
> mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
> stay no exec and it prevents unprivileged users from messing with a
> mounts atime settings.  I have included my test case as the last patch
> in this series so people performing backports can verify this change
> works correctly.
>
> The next change fixes a bug in NFS that was discovered while auditing
> nsproxy users for the first optimization.  Today you can oops the kernel
> by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
> namespaces.  I rebased and fixed the build of the !CONFIG_NFS_FS case
> yesterday when a build bot caught my typo.  Given that no one to my
> knowledge bases anything on my tree fixing the typo in place seems more
> responsible that requiring a typo-fix to be backported as well.
>
> The last change is a small semantic cleanup introducing
> /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
> prevents several kinds of problemantic corner cases.  It is a
> user-visible change so it has a minute chance of causing regressions so
> the change to /proc/mounts and /proc/net are individual one line commits
> that can be trivially reverted.  Unfortunately I lost and could not find
> the email of the original reporter so he is not credited.  From at least
> one perspective this change to /proc/net is a refgression fix to allow
> pthread /proc/net uses that were broken by the introduction of the network
> namespace.
>
> Eric
>
> Eric W. Biederman (11):
>   namespaces: Use task_lock and not rcu to protect nsproxy
>   mnt: Only change user settable mount flags in remount
>   mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into 
> do_remount
>   mnt: Correct permission checks in do_remount

This commit breaks libvirt-lxc.
libvirt does in lxcContainerMountBasicFS():

/*
 * We can't immediately set the MS_RDONLY flag when mounting filesystems
 * because (in at least some kernel versions) this will propagate back
 * to the original mount in the host OS, turning it readonly too. Thus
 * we mount the filesystem in read-write mode initially, and then do a
 * separate read-only bind mount on top of that.
 */
bindOverReadonly = !!(mnt_mflags & MS_RDONLY);

VIR_DEBUG("Mount %s on %s type=%s flags=%x",
  mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
~MS_RDONLY, NULL) < 0) {

 Here it fails for sysfs because with user namespaces we bind the
existing /sys into the container
and would have to read out all existing mount flags from the current /sys mount.
Otherwise mount() fails with EPERM.
On my test system /sys is mounted with
"rw,nosuid,nodev,noexec,relatime" and libvirt
misses the realtime...

virReportSystemError(errno,
 _("Failed to mount %s on %s type %s flags=%x"),
 mnt_src, mnt->dst, NULLSTR(mnt->type),
 mnt_mflags & ~MS_RDONLY);
goto cleanup;
}

if (bindOverReadonly &&
mount(mnt_src, mnt->dst, NULL,
  MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {

^^^ Here it fails because now we'd have to specify all flags as used
for the first
mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
See lxcBasicMounts[].
In this case the fix is easy, add mnt_mflags to the mount flags.

 virReportSystemError(errno,
 _("Failed to re-mount %s on %s flags=%x"),
 mnt_src, mnt->dst,
 MS_BIND|MS_REMOUNT|MS_RDONLY);
goto cleanup;
}


-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-20 Thread Richard Weinberger
On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com wrote:

 Linus,

 Please pull the for-linus branch from the git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
 for-linus

HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at 
 /proc/thread-self/mounts instead of /proc/self/mounts

 This is a bunch of small changes built against 3.16-rc6.  The most
 significant change for users is the first patch which makes setns
 drmatically faster by removing unneded rcu handling.

 The next chunk of changes are so that mount -o remount,.. will not
 allow the user namespace root to drop flags on a mount set by the system
 wide root.  Aks this forces read-only mounts to stay read-only, no-dev
 mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
 stay no exec and it prevents unprivileged users from messing with a
 mounts atime settings.  I have included my test case as the last patch
 in this series so people performing backports can verify this change
 works correctly.

 The next change fixes a bug in NFS that was discovered while auditing
 nsproxy users for the first optimization.  Today you can oops the kernel
 by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
 namespaces.  I rebased and fixed the build of the !CONFIG_NFS_FS case
 yesterday when a build bot caught my typo.  Given that no one to my
 knowledge bases anything on my tree fixing the typo in place seems more
 responsible that requiring a typo-fix to be backported as well.

 The last change is a small semantic cleanup introducing
 /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
 prevents several kinds of problemantic corner cases.  It is a
 user-visible change so it has a minute chance of causing regressions so
 the change to /proc/mounts and /proc/net are individual one line commits
 that can be trivially reverted.  Unfortunately I lost and could not find
 the email of the original reporter so he is not credited.  From at least
 one perspective this change to /proc/net is a refgression fix to allow
 pthread /proc/net uses that were broken by the introduction of the network
 namespace.

 Eric

 Eric W. Biederman (11):
   namespaces: Use task_lock and not rcu to protect nsproxy
   mnt: Only change user settable mount flags in remount
   mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into 
 do_remount
   mnt: Correct permission checks in do_remount

This commit breaks libvirt-lxc.
libvirt does in lxcContainerMountBasicFS():

/*
 * We can't immediately set the MS_RDONLY flag when mounting filesystems
 * because (in at least some kernel versions) this will propagate back
 * to the original mount in the host OS, turning it readonly too. Thus
 * we mount the filesystem in read-write mode initially, and then do a
 * separate read-only bind mount on top of that.
 */
bindOverReadonly = !!(mnt_mflags  MS_RDONLY);

VIR_DEBUG(Mount %s on %s type=%s flags=%x,
  mnt_src, mnt-dst, mnt-type, mnt_mflags  ~MS_RDONLY);
if (mount(mnt_src, mnt-dst, mnt-type, mnt_mflags 
~MS_RDONLY, NULL)  0) {

 Here it fails for sysfs because with user namespaces we bind the
existing /sys into the container
and would have to read out all existing mount flags from the current /sys mount.
Otherwise mount() fails with EPERM.
On my test system /sys is mounted with
rw,nosuid,nodev,noexec,relatime and libvirt
misses the realtime...

virReportSystemError(errno,
 _(Failed to mount %s on %s type %s flags=%x),
 mnt_src, mnt-dst, NULLSTR(mnt-type),
 mnt_mflags  ~MS_RDONLY);
goto cleanup;
}

if (bindOverReadonly 
mount(mnt_src, mnt-dst, NULL,
  MS_BIND|MS_REMOUNT|MS_RDONLY, NULL)  0) {

^^^ Here it fails because now we'd have to specify all flags as used
for the first
mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
See lxcBasicMounts[].
In this case the fix is easy, add mnt_mflags to the mount flags.

 virReportSystemError(errno,
 _(Failed to re-mount %s on %s flags=%x),
 mnt_src, mnt-dst,
 MS_BIND|MS_REMOUNT|MS_RDONLY);
goto cleanup;
}


-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-20 Thread Eric W. Biederman
Richard Weinberger richard.weinber...@gmail.com writes:

 On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com 
 wrote:

 This commit breaks libvirt-lxc.
 libvirt does in lxcContainerMountBasicFS():

The bugs fixed are security issues, so if we have to break a small
number of userspace applications we will.  Anything that we can
reasonably do to avoid regressions will be done.

Could you please look at my user-namespace.git#for-next branch I have a
fix for at least one regresion causing issue in there.  I think it may
fix your issues but I am not fully certain more comments below.

 /*
  * We can't immediately set the MS_RDONLY flag when mounting 
 filesystems
  * because (in at least some kernel versions) this will propagate back
  * to the original mount in the host OS, turning it readonly too. Thus
  * we mount the filesystem in read-write mode initially, and then do a
  * separate read-only bind mount on top of that.
  */
 bindOverReadonly = !!(mnt_mflags  MS_RDONLY);

 VIR_DEBUG(Mount %s on %s type=%s flags=%x,
   mnt_src, mnt-dst, mnt-type, mnt_mflags  ~MS_RDONLY);
 if (mount(mnt_src, mnt-dst, mnt-type, mnt_mflags 
 ~MS_RDONLY, NULL)  0) {

  Here it fails for sysfs because with user namespaces we bind the
 existing /sys into the container
 and would have to read out all existing mount flags from the current /sys 
 mount.
 Otherwise mount() fails with EPERM.
 On my test system /sys is mounted with
 rw,nosuid,nodev,noexec,relatime and libvirt
 misses the realtime...

Not specifying any atime flags to mount should be safe as that asks for
the default atime flags which for remount I have made the default atime
flags the existing atime flags.  So I am scratching my head a little on
this one.


 virReportSystemError(errno,
  _(Failed to mount %s on %s type %s 
 flags=%x),
  mnt_src, mnt-dst, NULLSTR(mnt-type),
  mnt_mflags  ~MS_RDONLY);
 goto cleanup;
 }

 if (bindOverReadonly 
 mount(mnt_src, mnt-dst, NULL,
   MS_BIND|MS_REMOUNT|MS_RDONLY, NULL)  0) {

 ^^^ Here it fails because now we'd have to specify all flags as used
 for the first
 mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
 See lxcBasicMounts[].
 In this case the fix is easy, add mnt_mflags to the mount flags.

That has always been a bug in general because remount has always
required specifying the complete set of mount flags you want to have.

That fact that flags such as nosuid are now properly locked so you can
not change them if you are not the global root user just makes this
obvious.

Andy Lutermorski has observed that statvfs will return the mount flags
making reading them simple.

Eric
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-15 Thread Andy Lutomirski
On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman
 wrote:
> Kenton Varda  writes:
>
>> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
>>  wrote:

> From: "Eric W. Biederman" 
> Date: Wed, 13 Aug 2014 01:33:38 -0700
> Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount
>
> Now that remount is properly enforcing the rule that you can't
> remove nodev at least sandstorm.io is breaking when performing
> a remount.
>
> It turns out that there is an easy intuitive solution implicitly
> add nodev on remount under the same conditions that we do on mount.
>
> Update the regression test for remounts to verify that this new
> addition does not regress either.

Tested-but-disliked-by: Andy Lutomirski 

Quoting Alan Cox [1]:

Lying to apps generally ends up like children lying to parents - the
lie gets more complicated to keep up each case you find until it
breaks.

The kernel was unnecessarily fudging mount flags, and it broke,
because the fudge didn't go far enough.  This extends the fudge.
Let's just delete it instead.

--Andy

[1] https://lwn.net/Articles/608435/

>
> Cc: sta...@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" 
> ---
>  fs/namespace.c |  9 
>  .../selftests/mount/unprivileged-remount-test.c| 51 
> --
>  2 files changed, 36 insertions(+), 24 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 7886176232c1..0f158300c866 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, 
> int mnt_flags,
> if (path->dentry != path->mnt->mnt_root)
> return -EINVAL;
>
> +   /* Only in special cases allow devices from mounts created
> +* outside the initial user namespace.
> +*/
> +   if ((mnt->mnt_ns->user_ns != _user_ns) &&
> +   !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> +   flags |= MS_NODEV;
> +   mnt_flags |= MNT_NODEV;
> +   }
> +
> /* Don't allow changing of locked mnt flags.
>  *
>  * No locks need to be held here while testing the various
> diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c 
> b/tools/testing/selftests/mount/unprivileged-remount-test.c
> index 1b3ff2fda4d0..6d408c155e0f 100644
> --- a/tools/testing/selftests/mount/unprivileged-remount-test.c
> +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
> @@ -118,7 +118,8 @@ static void create_and_enter_userns(void)
>  }
>
>  static
> -bool test_unpriv_remount(int mount_flags, int remount_flags, int 
> invalid_flags)
> +bool test_unpriv_remount(const char *fstype, const char *mount_options,
> +int mount_flags, int remount_flags, int 
> invalid_flags)
>  {
> pid_t child;
>
> @@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int 
> remount_flags, int invalid_flags)
> strerror(errno));
> }
>
> -   if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) {
> -   die("mount of /tmp failed: %s\n",
> -   strerror(errno));
> +   if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != 
> 0) {
> +   die("mount of %s with options '%s' on /tmp failed: %s\n",
> +   fstype,
> +   mount_options? mount_options : "",
> +   strerror(errno));
> }
>
> create_and_enter_userns();
> @@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int 
> remount_flags, int invalid_flags)
>
>  static bool test_unpriv_remount_simple(int mount_flags)
>  {
> -   return test_unpriv_remount(mount_flags, mount_flags, 0);
> +   return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, 
> 0);
>  }
>
>  static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags)
>  {
> -   return test_unpriv_remount(mount_flags, mount_flags, invalid_flags);
> +   return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags,
> +  invalid_flags);
>  }
>
>  int main(int argc, char **argv)
>  {
> -   if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) {
> +   if (!test_unpriv_remount_simple(MS_RDONLY)) {
> die("MS_RDONLY malfunctions\n");
> }
> -   if (!test_unpriv_remount_simple(MS_NODEV)) {
> +   if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, 
> 0)) {
> die("MS_NODEV malfunctions\n");
> }
> -   if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) {
> +   if (!test_unpriv_remount_simple(MS_NOSUID)) {
> die("MS_NOSUID malfunctions\n");
> }
> -   if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) {
> +   if (!test_unpriv_remount_simple(MS_NOEXEC)) {
> die("MS_NOEXEC malfunctions\n");
> }
> -   if 

Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-15 Thread Andy Lutomirski
On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman
ebied...@xmission.com wrote:
 Kenton Varda ken...@sandstorm.io writes:

 On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
 ebied...@xmission.com wrote:

 From: Eric W. Biederman ebied...@xmission.com
 Date: Wed, 13 Aug 2014 01:33:38 -0700
 Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount

 Now that remount is properly enforcing the rule that you can't
 remove nodev at least sandstorm.io is breaking when performing
 a remount.

 It turns out that there is an easy intuitive solution implicitly
 add nodev on remount under the same conditions that we do on mount.

 Update the regression test for remounts to verify that this new
 addition does not regress either.

Tested-but-disliked-by: Andy Lutomirski l...@amacapital.net

Quoting Alan Cox [1]:

Lying to apps generally ends up like children lying to parents - the
lie gets more complicated to keep up each case you find until it
breaks.

The kernel was unnecessarily fudging mount flags, and it broke,
because the fudge didn't go far enough.  This extends the fudge.
Let's just delete it instead.

--Andy

[1] https://lwn.net/Articles/608435/


 Cc: sta...@vger.kernel.org
 Signed-off-by: Eric W. Biederman ebied...@xmission.com
 ---
  fs/namespace.c |  9 
  .../selftests/mount/unprivileged-remount-test.c| 51 
 --
  2 files changed, 36 insertions(+), 24 deletions(-)

 diff --git a/fs/namespace.c b/fs/namespace.c
 index 7886176232c1..0f158300c866 100644
 --- a/fs/namespace.c
 +++ b/fs/namespace.c
 @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, 
 int mnt_flags,
 if (path-dentry != path-mnt-mnt_root)
 return -EINVAL;

 +   /* Only in special cases allow devices from mounts created
 +* outside the initial user namespace.
 +*/
 +   if ((mnt-mnt_ns-user_ns != init_user_ns) 
 +   !(sb-s_type-fs_flags  FS_USERNS_DEV_MOUNT)) {
 +   flags |= MS_NODEV;
 +   mnt_flags |= MNT_NODEV;
 +   }
 +
 /* Don't allow changing of locked mnt flags.
  *
  * No locks need to be held here while testing the various
 diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c 
 b/tools/testing/selftests/mount/unprivileged-remount-test.c
 index 1b3ff2fda4d0..6d408c155e0f 100644
 --- a/tools/testing/selftests/mount/unprivileged-remount-test.c
 +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
 @@ -118,7 +118,8 @@ static void create_and_enter_userns(void)
  }

  static
 -bool test_unpriv_remount(int mount_flags, int remount_flags, int 
 invalid_flags)
 +bool test_unpriv_remount(const char *fstype, const char *mount_options,
 +int mount_flags, int remount_flags, int 
 invalid_flags)
  {
 pid_t child;

 @@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int 
 remount_flags, int invalid_flags)
 strerror(errno));
 }

 -   if (mount(testing, /tmp, ramfs, mount_flags, NULL) != 0) {
 -   die(mount of /tmp failed: %s\n,
 -   strerror(errno));
 +   if (mount(testing, /tmp, fstype, mount_flags, mount_options) != 
 0) {
 +   die(mount of %s with options '%s' on /tmp failed: %s\n,
 +   fstype,
 +   mount_options? mount_options : ,
 +   strerror(errno));
 }

 create_and_enter_userns();
 @@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int 
 remount_flags, int invalid_flags)

  static bool test_unpriv_remount_simple(int mount_flags)
  {
 -   return test_unpriv_remount(mount_flags, mount_flags, 0);
 +   return test_unpriv_remount(ramfs, NULL, mount_flags, mount_flags, 
 0);
  }

  static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags)
  {
 -   return test_unpriv_remount(mount_flags, mount_flags, invalid_flags);
 +   return test_unpriv_remount(ramfs, NULL, mount_flags, mount_flags,
 +  invalid_flags);
  }

  int main(int argc, char **argv)
  {
 -   if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) {
 +   if (!test_unpriv_remount_simple(MS_RDONLY)) {
 die(MS_RDONLY malfunctions\n);
 }
 -   if (!test_unpriv_remount_simple(MS_NODEV)) {
 +   if (!test_unpriv_remount(devpts, newinstance, MS_NODEV, MS_NODEV, 
 0)) {
 die(MS_NODEV malfunctions\n);
 }
 -   if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) {
 +   if (!test_unpriv_remount_simple(MS_NOSUID)) {
 die(MS_NOSUID malfunctions\n);
 }
 -   if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) {
 +   if (!test_unpriv_remount_simple(MS_NOEXEC)) {
 die(MS_NOEXEC malfunctions\n);
 }
 -   if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV,
 -  

Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-13 Thread Andy Lutomirski
On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman
 wrote:
> Kenton Varda  writes:
>
>> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
> I have a proposed fix as the end of this email.  Could you please
> test it?
>
> If I can get a Tested-by and possibly a Reviewed-by responses I would
> appreciate it.

Will test later today.

>
>> The problem is that users like us had no idea that nodev was being
>> silently added in the first place, and thus didn't know that we needed
>> to specify it in remounts.
>
> Agreed.  And frankly that is extremely reasonable and it would in fact
> break anything doing a traditional tracking of mounts with /etc/mtab.
> So it is a pretty siginificant discontinuity.
>
> I am not prepared to remove the the implicit adding of nodev from the
> implementation as that would just break your code and probably others
> code in a different way.

I'm having trouble figuring out what would break.  The only way to end
up with a device node on one of these filesystems would be to pass an
fd to the file system out of the userns and have some globally
privileged code put the device node there.

>
> What we can do and that will work cleanly long term is make remount
> match mount and implicitly add nodev.
>
>> We create the tmpfs, put some things in it,
>> and then want to remount it read-only for the sandbox. It seems
>> reasonable to expect that a newly-created tmpfs would have exactly the
>> flags I gave it when I created it, not silently get an additional flag
>> that I then need to pass on remount.
>
> A reasonable expectation yes.  At the same time it is also a very
> reasonable security requirement to not allow devices on filesystems
> that the global root does not control.

But the magic nodev isn't needed for that; the checks in mknod are the
right place for this -- they're sufficient, and they're required
anyway.

>
> Compared to Andy's suggestion of relaxing the nodev restriction this
> will also work for filesystems like fuse and nfs when we allow begin to
> allow unprivileged mounts of those filesystems, and it allows to sleep
> better knowing the code is extremely paranoid.

If we allow fuse and nfs, we need to deal with nosuid, too.  I want to
avoid using nosuid for this; I want to change the suid logic to check
the namespace, since I think that suid *should* work in a namespace.

Also, that implicit nodev may be a problem down the road if we ever
add device namespaces.

...

>
> From: "Eric W. Biederman" 
> Date: Wed, 13 Aug 2014 01:33:38 -0700
> Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount
>
> Now that remount is properly enforcing the rule that you can't
> remove nodev at least sandstorm.io is breaking when performing
> a remount.
>
> It turns out that there is an easy intuitive solution implicitly
> add nodev on remount under the same conditions that we do on mount.
>
> Update the regression test for remounts to verify that this new
> addition does not regress either.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" 
> ---
>  fs/namespace.c |  9 
>  .../selftests/mount/unprivileged-remount-test.c| 51 
> --
>  2 files changed, 36 insertions(+), 24 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 7886176232c1..0f158300c866 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, 
> int mnt_flags,
> if (path->dentry != path->mnt->mnt_root)
> return -EINVAL;
>
> +   /* Only in special cases allow devices from mounts created
> +* outside the initial user namespace.

I don't like this comment at all.  If we stick with this approach, I
think that this should say:

Certain mounts are forced to have MNT_NODEV set.  For these mounts,
user code might not realize that nodev is set, so force MS_NODEV on
remount attempts to prevent confusing remount failures.

> +*/
> +   if ((mnt->mnt_ns->user_ns != _user_ns) &&
> +   !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> +   flags |= MS_NODEV;
> +   mnt_flags |= MNT_NODEV;
> +   }
> +
> /* Don't allow changing of locked mnt flags.
>  *
>  * No locks need to be held here while testing the various

I will still advocate a different and much less magical approach.
I'll send a patch later today.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-13 Thread Eric W. Biederman
Kenton Varda  writes:

> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
>  wrote:
>
> If you can find a userspace application that matters I might care
> 
> that a security fix breaks it.
>
> FWIW, it broke Sandstorm.io, but we already pushed a fix, and I'm not
> sure if you'd say that we "matter".

I should meant to have said:  "If you can find a userspace application
that breaks I might care that a security fix breaks it"

Hearing this actually break something changes this from a theoretical
discussion to something real.

I have a proposed fix as the end of this email.  Could you please
test it?

If I can get a Tested-by and possibly a Reviewed-by responses I would
appreciate it.

> The problem is that users like us had no idea that nodev was being
> silently added in the first place, and thus didn't know that we needed
> to specify it in remounts. 

Agreed.  And frankly that is extremely reasonable and it would in fact
break anything doing a traditional tracking of mounts with /etc/mtab.
So it is a pretty siginificant discontinuity.

I am not prepared to remove the the implicit adding of nodev from the
implementation as that would just break your code and probably others
code in a different way.

What we can do and that will work cleanly long term is make remount
match mount and implicitly add nodev.

> We create the tmpfs, put some things in it,
> and then want to remount it read-only for the sandbox. It seems
> reasonable to expect that a newly-created tmpfs would have exactly the
> flags I gave it when I created it, not silently get an additional flag
> that I then need to pass on remount.

A reasonable expectation yes.  At the same time it is also a very
reasonable security requirement to not allow devices on filesystems
that the global root does not control.

It is also reasonable that code that does not care should not have
to worry about this issue and work the same as the global root or
as a container root.  Which I think is the reasoning that I went
with when adding the implicit nodev.

> Note further that it may be very hard for normal developers to figure
> out why their remount is failing in this case. Andy only discovered
> the silent nodev by reading the kernel code.

Agreed.

So the simple solution I see with minimal danger and minimal security
risk is to remove the inconsitency between mount and remount and add
implicitly add MNT_NODEV during remount as well.

That removes the minor regression and keeps the code in the paranoid
state where device nodes are not honored and not allowed on filesystems
mounted by ordinary users.

Compared to Andy's suggestion of relaxing the nodev restriction this
will also work for filesystems like fuse and nfs when we allow begin to
allow unprivileged mounts of those filesystems, and it allows to sleep
better knowing the code is extremely paranoid.

This reminds me that the dust is settling and the man pages need to be
updated for all of these small tweaks to the API that have happened, so
there is a good source of documentation for how all of this actually
works and why.

I am going to be travelling pretty much the rest of the week, so won't
be particularly responsive.  But if you can get this tested I think
this counts as a minor regression fix that solves this issue.

Eric

From: "Eric W. Biederman" 
Date: Wed, 13 Aug 2014 01:33:38 -0700
Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount

Now that remount is properly enforcing the rule that you can't
remove nodev at least sandstorm.io is breaking when performing
a remount.

It turns out that there is an easy intuitive solution implicitly
add nodev on remount under the same conditions that we do on mount.

Update the regression test for remounts to verify that this new
addition does not regress either.

Cc: sta...@vger.kernel.org
Signed-off-by: "Eric W. Biederman" 
---
 fs/namespace.c |  9 
 .../selftests/mount/unprivileged-remount-test.c| 51 --
 2 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 7886176232c1..0f158300c866 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int 
mnt_flags,
if (path->dentry != path->mnt->mnt_root)
return -EINVAL;
 
+   /* Only in special cases allow devices from mounts created
+* outside the initial user namespace.
+*/
+   if ((mnt->mnt_ns->user_ns != _user_ns) &&
+   !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
+   flags |= MS_NODEV;
+   mnt_flags |= MNT_NODEV;
+   }
+
/* Don't allow changing of locked mnt flags.
 *
 * No locks need to be held here while testing the various
diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c 
b/tools/testing/selftests/mount/unprivileged-remount-test.c
index 1b3ff2fda4d0..6d408c155e0f 

Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-13 Thread Eric W. Biederman
Kenton Varda ken...@sandstorm.io writes:

 On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
 ebied...@xmission.com wrote:

 If you can find a userspace application that matters I might care
 
 that a security fix breaks it.

 FWIW, it broke Sandstorm.io, but we already pushed a fix, and I'm not
 sure if you'd say that we matter.

I should meant to have said:  If you can find a userspace application
that breaks I might care that a security fix breaks it

Hearing this actually break something changes this from a theoretical
discussion to something real.

I have a proposed fix as the end of this email.  Could you please
test it?

If I can get a Tested-by and possibly a Reviewed-by responses I would
appreciate it.

 The problem is that users like us had no idea that nodev was being
 silently added in the first place, and thus didn't know that we needed
 to specify it in remounts. 

Agreed.  And frankly that is extremely reasonable and it would in fact
break anything doing a traditional tracking of mounts with /etc/mtab.
So it is a pretty siginificant discontinuity.

I am not prepared to remove the the implicit adding of nodev from the
implementation as that would just break your code and probably others
code in a different way.

What we can do and that will work cleanly long term is make remount
match mount and implicitly add nodev.

 We create the tmpfs, put some things in it,
 and then want to remount it read-only for the sandbox. It seems
 reasonable to expect that a newly-created tmpfs would have exactly the
 flags I gave it when I created it, not silently get an additional flag
 that I then need to pass on remount.

A reasonable expectation yes.  At the same time it is also a very
reasonable security requirement to not allow devices on filesystems
that the global root does not control.

It is also reasonable that code that does not care should not have
to worry about this issue and work the same as the global root or
as a container root.  Which I think is the reasoning that I went
with when adding the implicit nodev.

 Note further that it may be very hard for normal developers to figure
 out why their remount is failing in this case. Andy only discovered
 the silent nodev by reading the kernel code.

Agreed.

So the simple solution I see with minimal danger and minimal security
risk is to remove the inconsitency between mount and remount and add
implicitly add MNT_NODEV during remount as well.

That removes the minor regression and keeps the code in the paranoid
state where device nodes are not honored and not allowed on filesystems
mounted by ordinary users.

Compared to Andy's suggestion of relaxing the nodev restriction this
will also work for filesystems like fuse and nfs when we allow begin to
allow unprivileged mounts of those filesystems, and it allows to sleep
better knowing the code is extremely paranoid.

This reminds me that the dust is settling and the man pages need to be
updated for all of these small tweaks to the API that have happened, so
there is a good source of documentation for how all of this actually
works and why.

I am going to be travelling pretty much the rest of the week, so won't
be particularly responsive.  But if you can get this tested I think
this counts as a minor regression fix that solves this issue.

Eric

From: Eric W. Biederman ebied...@xmission.com
Date: Wed, 13 Aug 2014 01:33:38 -0700
Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount

Now that remount is properly enforcing the rule that you can't
remove nodev at least sandstorm.io is breaking when performing
a remount.

It turns out that there is an easy intuitive solution implicitly
add nodev on remount under the same conditions that we do on mount.

Update the regression test for remounts to verify that this new
addition does not regress either.

Cc: sta...@vger.kernel.org
Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 fs/namespace.c |  9 
 .../selftests/mount/unprivileged-remount-test.c| 51 --
 2 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 7886176232c1..0f158300c866 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int 
mnt_flags,
if (path-dentry != path-mnt-mnt_root)
return -EINVAL;
 
+   /* Only in special cases allow devices from mounts created
+* outside the initial user namespace.
+*/
+   if ((mnt-mnt_ns-user_ns != init_user_ns) 
+   !(sb-s_type-fs_flags  FS_USERNS_DEV_MOUNT)) {
+   flags |= MS_NODEV;
+   mnt_flags |= MNT_NODEV;
+   }
+
/* Don't allow changing of locked mnt flags.
 *
 * No locks need to be held here while testing the various
diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c 

Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-13 Thread Andy Lutomirski
On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman
ebied...@xmission.com wrote:
 Kenton Varda ken...@sandstorm.io writes:

 On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
 I have a proposed fix as the end of this email.  Could you please
 test it?

 If I can get a Tested-by and possibly a Reviewed-by responses I would
 appreciate it.

Will test later today.


 The problem is that users like us had no idea that nodev was being
 silently added in the first place, and thus didn't know that we needed
 to specify it in remounts.

 Agreed.  And frankly that is extremely reasonable and it would in fact
 break anything doing a traditional tracking of mounts with /etc/mtab.
 So it is a pretty siginificant discontinuity.

 I am not prepared to remove the the implicit adding of nodev from the
 implementation as that would just break your code and probably others
 code in a different way.

I'm having trouble figuring out what would break.  The only way to end
up with a device node on one of these filesystems would be to pass an
fd to the file system out of the userns and have some globally
privileged code put the device node there.


 What we can do and that will work cleanly long term is make remount
 match mount and implicitly add nodev.

 We create the tmpfs, put some things in it,
 and then want to remount it read-only for the sandbox. It seems
 reasonable to expect that a newly-created tmpfs would have exactly the
 flags I gave it when I created it, not silently get an additional flag
 that I then need to pass on remount.

 A reasonable expectation yes.  At the same time it is also a very
 reasonable security requirement to not allow devices on filesystems
 that the global root does not control.

But the magic nodev isn't needed for that; the checks in mknod are the
right place for this -- they're sufficient, and they're required
anyway.


 Compared to Andy's suggestion of relaxing the nodev restriction this
 will also work for filesystems like fuse and nfs when we allow begin to
 allow unprivileged mounts of those filesystems, and it allows to sleep
 better knowing the code is extremely paranoid.

If we allow fuse and nfs, we need to deal with nosuid, too.  I want to
avoid using nosuid for this; I want to change the suid logic to check
the namespace, since I think that suid *should* work in a namespace.

Also, that implicit nodev may be a problem down the road if we ever
add device namespaces.

...


 From: Eric W. Biederman ebied...@xmission.com
 Date: Wed, 13 Aug 2014 01:33:38 -0700
 Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount

 Now that remount is properly enforcing the rule that you can't
 remove nodev at least sandstorm.io is breaking when performing
 a remount.

 It turns out that there is an easy intuitive solution implicitly
 add nodev on remount under the same conditions that we do on mount.

 Update the regression test for remounts to verify that this new
 addition does not regress either.

 Cc: sta...@vger.kernel.org
 Signed-off-by: Eric W. Biederman ebied...@xmission.com
 ---
  fs/namespace.c |  9 
  .../selftests/mount/unprivileged-remount-test.c| 51 
 --
  2 files changed, 36 insertions(+), 24 deletions(-)

 diff --git a/fs/namespace.c b/fs/namespace.c
 index 7886176232c1..0f158300c866 100644
 --- a/fs/namespace.c
 +++ b/fs/namespace.c
 @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, 
 int mnt_flags,
 if (path-dentry != path-mnt-mnt_root)
 return -EINVAL;

 +   /* Only in special cases allow devices from mounts created
 +* outside the initial user namespace.

I don't like this comment at all.  If we stick with this approach, I
think that this should say:

Certain mounts are forced to have MNT_NODEV set.  For these mounts,
user code might not realize that nodev is set, so force MS_NODEV on
remount attempts to prevent confusing remount failures.

 +*/
 +   if ((mnt-mnt_ns-user_ns != init_user_ns) 
 +   !(sb-s_type-fs_flags  FS_USERNS_DEV_MOUNT)) {
 +   flags |= MS_NODEV;
 +   mnt_flags |= MNT_NODEV;
 +   }
 +
 /* Don't allow changing of locked mnt flags.
  *
  * No locks need to be held here while testing the various

I will still advocate a different and much less magical approach.
I'll send a patch later today.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-12 Thread Andy Lutomirski
On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
 wrote:
> Andy Lutomirski  writes:
>
>> On 08/05/2014 05:57 PM, Eric W. Biederman wrote:
>>
>> Sorry for catching this late.  I think this fix is likely to
>> unnecessarily break valid userspace due to an odd interaction.
>
> The code is correct and safe (no security issues), but yes a blind
> remount might hit a snag.
>
> If you can find a userspace application that matters I might care
> that a security fix breaks it.
>
> I think you have made a point that several more filesystems might
> be ok to not set nodev on (because we can not do anything to create
> device nodes on those filesystems).  I personally would prefer the much
> more paranoid approach of only allowing device nodes on a unprivileged
> mount if we have audited all of the code paths and know it is safe
> for device nodes to appear there.
>
> I don't actually think anyone cares ad remounts of filesystems like
> tmpfs, mqueue, sysfs, proc, ramfs are all quite rare.  Blind remounts
> are even rare.  The normal userspace utilities look at the appropriate
> version of /proc/mounts on remount.

Bind remounts are the only kind of remounts, because we've never
supported do_remount_sb in a user namespace.  So, if you want to
create some static content in your user namespace, the way to do it
is:

unshare(CLONE_NEWUSER | CLONE_NEWNS);
mount tmpfs somewhere;
write to the tmpfs;
mount("path to tmpfs", "path to tmpfs", nullptr, MS_REMOUNT | MS_BIND
| MS_RDONLY);

>
> These are not filesystems that a blind remount will likely be applied
> to.
>
> Furthermore there is work underway to prepare patches to allow
> "mount --bind -ro" to work as expected.  That will further reduce
> the pressure from blind remounts.

Not for example above.  It really does need the remount.

>
> If there is an actual regression of actual code I am happy to deal
> with it.  But having the MNT_NODEV on those mounts has been the case
> for a long time now and is not new (no regression).  This change just
> closed the security hole that allowed nodev to be removed.  And that
> security hole we need to have fixed.

Sandstorm does this.  (Well, it did until today.)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-12 Thread Eric W. Biederman
Andy Lutomirski  writes:

> On 08/05/2014 05:57 PM, Eric W. Biederman wrote:
>
> Sorry for catching this late.  I think this fix is likely to
> unnecessarily break valid userspace due to an odd interaction.

The code is correct and safe (no security issues), but yes a blind
remount might hit a snag.

If you can find a userspace application that matters I might care
that a security fix breaks it.

I think you have made a point that several more filesystems might
be ok to not set nodev on (because we can not do anything to create
device nodes on those filesystems).  I personally would prefer the much
more paranoid approach of only allowing device nodes on a unprivileged
mount if we have audited all of the code paths and know it is safe
for device nodes to appear there.

I don't actually think anyone cares ad remounts of filesystems like
tmpfs, mqueue, sysfs, proc, ramfs are all quite rare.  Blind remounts
are even rare.  The normal userspace utilities look at the appropriate
version of /proc/mounts on remount.

These are not filesystems that a blind remount will likely be applied
to.

Furthermore there is work underway to prepare patches to allow
"mount --bind -ro" to work as expected.  That will further reduce
the pressure from blind remounts.

If there is an actual regression of actual code I am happy to deal
with it.  But having the MNT_NODEV on those mounts has been the case
for a long time now and is not new (no regression).  This change just
closed the security hole that allowed nodev to be removed.  And that
security hole we need to have fixed.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-12 Thread Andy Lutomirski
On 08/05/2014 05:57 PM, Eric W. Biederman wrote:
> 
> Linus,
> 
> Please pull the for-linus branch from the git tree:
> 
>git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
> for-linus
> 
>HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at 
> /proc/thread-self/mounts instead of /proc/self/mounts
> 
> This is a bunch of small changes built against 3.16-rc6.  The most
> significant change for users is the first patch which makes setns
> drmatically faster by removing unneded rcu handling.
> 
> The next chunk of changes are so that "mount -o remount,.." will not
> allow the user namespace root to drop flags on a mount set by the system
> wide root.  Aks this forces read-only mounts to stay read-only, no-dev
> mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
> stay no exec and it prevents unprivileged users from messing with a
> mounts atime settings.  I have included my test case as the last patch
> in this series so people performing backports can verify this change
> works correctly.
> 

Sorry for catching this late.  I think this fix is likely to
unnecessarily break valid userspace due to an odd interaction.

do_new_mount contains this:

if (user_ns != _user_ns) {
if (!(type->fs_flags & FS_USERNS_MOUNT)) {
put_filesystem(type);
return -EPERM;
}
/* Only in special cases allow devices from mounts
 * created outside the initial user namespace.
 */
if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
flags |= MS_NODEV;
mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
}
}

This means that private tmpfs mounts end up with MNT_NODEV |
MNT_LOCK_NODEV.  Certainly the change from MNT_NODEV to MNT_LOCK_NODEV
is sensible *if MNT_NODEV made sense in the first place*.  But we didn't
have MNT_LOCK_NODEV in the past, so this well-intentioned code never
really worked.

This has a very strange effect: mounting a private tmpfs with all
default flags and then remounting with MS_REMOUNT | MS_BIND *without
MS_NODEV* will now fail.  I suspect that this practice is rather common,
since most likely no one ever paid attention to that implicit MNT_NODEV
before.

I would argue that the correct fix is to either remove the implicit
MNT_NODEV entirely or to set FS_USERNS_DEV_MOUNT on most filesystems.

The relevant filesystems are tmpfs, mqueue, sysfs, proc, ramfs, and
devpts.  devpts already sets FS_USERNS_DEV_MOUNT.  Setting
FS_USERNS_DEV_MOUNT should be safe on all of the others in this list --
I think that the only one that initially contains device nodes is sysfs
on selinux systems, which contains a null node.

I think the point of this is to prevent mounts of filesystems with
user-controlled initial contents from being dangerous.  But we don't
have any of those yet.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-12 Thread Andy Lutomirski
On 08/05/2014 05:57 PM, Eric W. Biederman wrote:
 
 Linus,
 
 Please pull the for-linus branch from the git tree:
 
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
 for-linus
 
HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at 
 /proc/thread-self/mounts instead of /proc/self/mounts
 
 This is a bunch of small changes built against 3.16-rc6.  The most
 significant change for users is the first patch which makes setns
 drmatically faster by removing unneded rcu handling.
 
 The next chunk of changes are so that mount -o remount,.. will not
 allow the user namespace root to drop flags on a mount set by the system
 wide root.  Aks this forces read-only mounts to stay read-only, no-dev
 mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
 stay no exec and it prevents unprivileged users from messing with a
 mounts atime settings.  I have included my test case as the last patch
 in this series so people performing backports can verify this change
 works correctly.
 

Sorry for catching this late.  I think this fix is likely to
unnecessarily break valid userspace due to an odd interaction.

do_new_mount contains this:

if (user_ns != init_user_ns) {
if (!(type-fs_flags  FS_USERNS_MOUNT)) {
put_filesystem(type);
return -EPERM;
}
/* Only in special cases allow devices from mounts
 * created outside the initial user namespace.
 */
if (!(type-fs_flags  FS_USERNS_DEV_MOUNT)) {
flags |= MS_NODEV;
mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
}
}

This means that private tmpfs mounts end up with MNT_NODEV |
MNT_LOCK_NODEV.  Certainly the change from MNT_NODEV to MNT_LOCK_NODEV
is sensible *if MNT_NODEV made sense in the first place*.  But we didn't
have MNT_LOCK_NODEV in the past, so this well-intentioned code never
really worked.

This has a very strange effect: mounting a private tmpfs with all
default flags and then remounting with MS_REMOUNT | MS_BIND *without
MS_NODEV* will now fail.  I suspect that this practice is rather common,
since most likely no one ever paid attention to that implicit MNT_NODEV
before.

I would argue that the correct fix is to either remove the implicit
MNT_NODEV entirely or to set FS_USERNS_DEV_MOUNT on most filesystems.

The relevant filesystems are tmpfs, mqueue, sysfs, proc, ramfs, and
devpts.  devpts already sets FS_USERNS_DEV_MOUNT.  Setting
FS_USERNS_DEV_MOUNT should be safe on all of the others in this list --
I think that the only one that initially contains device nodes is sysfs
on selinux systems, which contains a null node.

I think the point of this is to prevent mounts of filesystems with
user-controlled initial contents from being dangerous.  But we don't
have any of those yet.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-12 Thread Eric W. Biederman
Andy Lutomirski l...@amacapital.net writes:

 On 08/05/2014 05:57 PM, Eric W. Biederman wrote:

 Sorry for catching this late.  I think this fix is likely to
 unnecessarily break valid userspace due to an odd interaction.

The code is correct and safe (no security issues), but yes a blind
remount might hit a snag.

If you can find a userspace application that matters I might care
that a security fix breaks it.

I think you have made a point that several more filesystems might
be ok to not set nodev on (because we can not do anything to create
device nodes on those filesystems).  I personally would prefer the much
more paranoid approach of only allowing device nodes on a unprivileged
mount if we have audited all of the code paths and know it is safe
for device nodes to appear there.

I don't actually think anyone cares ad remounts of filesystems like
tmpfs, mqueue, sysfs, proc, ramfs are all quite rare.  Blind remounts
are even rare.  The normal userspace utilities look at the appropriate
version of /proc/mounts on remount.

These are not filesystems that a blind remount will likely be applied
to.

Furthermore there is work underway to prepare patches to allow
mount --bind -ro to work as expected.  That will further reduce
the pressure from blind remounts.

If there is an actual regression of actual code I am happy to deal
with it.  But having the MNT_NODEV on those mounts has been the case
for a long time now and is not new (no regression).  This change just
closed the security hole that allowed nodev to be removed.  And that
security hole we need to have fixed.

Eric

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-12 Thread Andy Lutomirski
On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
ebied...@xmission.com wrote:
 Andy Lutomirski l...@amacapital.net writes:

 On 08/05/2014 05:57 PM, Eric W. Biederman wrote:

 Sorry for catching this late.  I think this fix is likely to
 unnecessarily break valid userspace due to an odd interaction.

 The code is correct and safe (no security issues), but yes a blind
 remount might hit a snag.

 If you can find a userspace application that matters I might care
 that a security fix breaks it.

 I think you have made a point that several more filesystems might
 be ok to not set nodev on (because we can not do anything to create
 device nodes on those filesystems).  I personally would prefer the much
 more paranoid approach of only allowing device nodes on a unprivileged
 mount if we have audited all of the code paths and know it is safe
 for device nodes to appear there.

 I don't actually think anyone cares ad remounts of filesystems like
 tmpfs, mqueue, sysfs, proc, ramfs are all quite rare.  Blind remounts
 are even rare.  The normal userspace utilities look at the appropriate
 version of /proc/mounts on remount.

Bind remounts are the only kind of remounts, because we've never
supported do_remount_sb in a user namespace.  So, if you want to
create some static content in your user namespace, the way to do it
is:

unshare(CLONE_NEWUSER | CLONE_NEWNS);
mount tmpfs somewhere;
write to the tmpfs;
mount(path to tmpfs, path to tmpfs, nullptr, MS_REMOUNT | MS_BIND
| MS_RDONLY);


 These are not filesystems that a blind remount will likely be applied
 to.

 Furthermore there is work underway to prepare patches to allow
 mount --bind -ro to work as expected.  That will further reduce
 the pressure from blind remounts.

Not for example above.  It really does need the remount.


 If there is an actual regression of actual code I am happy to deal
 with it.  But having the MNT_NODEV on those mounts has been the case
 for a long time now and is not new (no regression).  This change just
 closed the security hole that allowed nodev to be removed.  And that
 security hole we need to have fixed.

Sandstorm does this.  (Well, it did until today.)

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-07 Thread Theodore Ts'o
On Wed, Aug 06, 2014 at 04:06:08PM +1000, Stephen Rothwell wrote:
> 
> Yes, that is what confused my, sorry, since that commit and the ones
> following are new commits but unchanged patches.

As a suggestion, in addition to trying to count the number of commits
that are new, or with the same patch id's, etc., is to provide the
diff stats between the last version of the branch that landed in
linux-next before the merge window opened, and the branch head that
was requested for Linus to pull?

This will get tripped up by those poeple who send multiple pull
requests which are subsets of the branch that is tracked by linux-next
(for example, to arrange the order of merging to minimize conflicts),
so like all things, it's not perfect, but it might be useful
additional indicator, at least in the early days of the merge window.

Also, if you could report the commit id of the branch head before the
merge window opened, then other people (including Linus) can do their
own investigation and draw their own conclusions.

Cheers,

- Ted

P.S.  If there was some way this could be automated so that a 'bot
sends out a report for every single pull request, then it minimizes
the "J'accuse!" aspect, since the report is being done for all pull
requests, and everyone knows that robots makes mistakes.  :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-07 Thread Theodore Ts'o
On Wed, Aug 06, 2014 at 04:06:08PM +1000, Stephen Rothwell wrote:
 
 Yes, that is what confused my, sorry, since that commit and the ones
 following are new commits but unchanged patches.

As a suggestion, in addition to trying to count the number of commits
that are new, or with the same patch id's, etc., is to provide the
diff stats between the last version of the branch that landed in
linux-next before the merge window opened, and the branch head that
was requested for Linus to pull?

This will get tripped up by those poeple who send multiple pull
requests which are subsets of the branch that is tracked by linux-next
(for example, to arrange the order of merging to minimize conflicts),
so like all things, it's not perfect, but it might be useful
additional indicator, at least in the early days of the merge window.

Also, if you could report the commit id of the branch head before the
merge window opened, then other people (including Linus) can do their
own investigation and draw their own conclusions.

Cheers,

- Ted

P.S.  If there was some way this could be automated so that a 'bot
sends out a report for every single pull request, then it minimizes
the J'accuse! aspect, since the report is being done for all pull
requests, and everyone knows that robots makes mistakes.  :-)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-06 Thread Eric W. Biederman
Stephen Rothwell  writes:

> Hi Eric,
>
>> As for missing cool tags shrug.  The people looking at my code didn't
>> feel like saying the magic words so I didn't include cool tags.
>
> Maybe you should push them ... these tags are not just "cool", they
> give less involved people some indications of what has happened in the
> life of a patch.

Etiquette is a good idea where it helps things flow smoothly.  Other
times like this standing on etiquette would have resulted in trivial
fixes for long standing issues not going anywhere.

Althogh I will keep it in mind in the future.

>> Which is a my long winded way of say it sounds like you are accusing me
>> of being irresponsible, and my way of saying that I have tested and
>
> I did not mean to accuse, there are reasons (as you have pointed out)
> that things get added late, or rewritten.

I really try to avoid those reasons but I missed it this round and the
buildbot did not manage to find the comibination with a build error
until the code hit linux-next :(

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-06 Thread Stephen Rothwell
Hi Eric,

On Tue, 05 Aug 2014 22:16:06 -0700 ebied...@xmission.com (Eric W. Biederman) 
wrote:
>
> I am not certain what your point is.

I am just trying to give Linus a heads up for branches that have not
had much exposure before he is asked to pull them.

> There have been no commits added since the merge window opened.
> 
> There was one commit changed to fix a typo.  I documented that already.

Yes, that is what confused my, sorry, since that commit and the ones
following are new commits but unchanged patches.

> There were some commits pushed to the tree as late as friday that had
> been out for review earlier than that and it is possible that you did
> not pick them up in linux-next until monday.  That doesn't mean I added
> anything after the merge window opened.

Correct.

> I have also made certain all of these commits have at least had a chance
> to show up in linux-next.
> 
> As for missing cool tags shrug.  The people looking at my code didn't
> feel like saying the magic words so I didn't include cool tags.

Maybe you should push them ... these tags are not just "cool", they
give less involved people some indications of what has happened in the
life of a patch.

> Beyond that I have been quite out of it recently and this is what I had
> time to do.  If I had had a little more time and energy I would have
> included unmount on unlink patches that still need magic to happen to
> keep from blowing the stack in pathological cases on everything except
> x86_64.  That code has been sitting in linux-next.
> 
> Which is a my long winded way of say it sounds like you are accusing me
> of being irresponsible, and my way of saying that I have tested and

I did not mean to accuse, there are reasons (as you have pointed out)
that things get added late, or rewritten.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


signature.asc
Description: PGP signature


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-06 Thread Stephen Rothwell
Hi Eric,

On Tue, 05 Aug 2014 22:16:06 -0700 ebied...@xmission.com (Eric W. Biederman) 
wrote:

 I am not certain what your point is.

I am just trying to give Linus a heads up for branches that have not
had much exposure before he is asked to pull them.

 There have been no commits added since the merge window opened.
 
 There was one commit changed to fix a typo.  I documented that already.

Yes, that is what confused my, sorry, since that commit and the ones
following are new commits but unchanged patches.

 There were some commits pushed to the tree as late as friday that had
 been out for review earlier than that and it is possible that you did
 not pick them up in linux-next until monday.  That doesn't mean I added
 anything after the merge window opened.

Correct.

 I have also made certain all of these commits have at least had a chance
 to show up in linux-next.
 
 As for missing cool tags shrug.  The people looking at my code didn't
 feel like saying the magic words so I didn't include cool tags.

Maybe you should push them ... these tags are not just cool, they
give less involved people some indications of what has happened in the
life of a patch.

 Beyond that I have been quite out of it recently and this is what I had
 time to do.  If I had had a little more time and energy I would have
 included unmount on unlink patches that still need magic to happen to
 keep from blowing the stack in pathological cases on everything except
 x86_64.  That code has been sitting in linux-next.
 
 Which is a my long winded way of say it sounds like you are accusing me
 of being irresponsible, and my way of saying that I have tested and

I did not mean to accuse, there are reasons (as you have pointed out)
that things get added late, or rewritten.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


signature.asc
Description: PGP signature


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-06 Thread Eric W. Biederman
Stephen Rothwell s...@canb.auug.org.au writes:

 Hi Eric,

 As for missing cool tags shrug.  The people looking at my code didn't
 feel like saying the magic words so I didn't include cool tags.

 Maybe you should push them ... these tags are not just cool, they
 give less involved people some indications of what has happened in the
 life of a patch.

Etiquette is a good idea where it helps things flow smoothly.  Other
times like this standing on etiquette would have resulted in trivial
fixes for long standing issues not going anywhere.

Althogh I will keep it in mind in the future.

 Which is a my long winded way of say it sounds like you are accusing me
 of being irresponsible, and my way of saying that I have tested and

 I did not mean to accuse, there are reasons (as you have pointed out)
 that things get added late, or rewritten.

I really try to avoid those reasons but I missed it this round and the
buildbot did not manage to find the comibination with a build error
until the code hit linux-next :(

Eric

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-05 Thread Eric W. Biederman
Stephen Rothwell  writes:

> Hi Eric,
>
> On Tue, 05 Aug 2014 17:57:31 -0700 ebied...@xmission.com (Eric W. Biederman) 
> wrote:
>>
>> Please pull the for-linus branch from the git tree:
>> 
>>git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
>> for-linus
>> 
>>HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts 
>> at /proc/thread-self/mounts instead of /proc/self/mounts
>
> This has had 4 commits added since the merge window opened that have no
> Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by
> tag.

I am not certain what your point is.

There have been no commits added since the merge window opened.

There was one commit changed to fix a typo.  I documented that already.

There were some commits pushed to the tree as late as friday that had
been out for review earlier than that and it is possible that you did
not pick them up in linux-next until monday.  That doesn't mean I added
anything after the merge window opened.

I have also made certain all of these commits have at least had a chance
to show up in linux-next.

As for missing cool tags shrug.  The people looking at my code didn't
feel like saying the magic words so I didn't include cool tags.
Furthermore the code is all quite trivial.

Beyond that I have been quite out of it recently and this is what I had
time to do.  If I had had a little more time and energy I would have
included unmount on unlink patches that still need magic to happen to
keep from blowing the stack in pathological cases on everything except
x86_64.  That code has been sitting in linux-next.

Which is a my long winded way of say it sounds like you are accusing me
of being irresponsible, and my way of saying that I have tested and
vetted these trivial patches and been as careful as I reasonably could,
and I that I apologise for not being able to do things a few days
earlier.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-05 Thread Stephen Rothwell
Hi Eric,

On Tue, 05 Aug 2014 17:57:31 -0700 ebied...@xmission.com (Eric W. Biederman) 
wrote:
>
> Please pull the for-linus branch from the git tree:
> 
>git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
> for-linus
> 
>HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at 
> /proc/thread-self/mounts instead of /proc/self/mounts

This has had 4 commits added since the merge window opened that have no
Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag.

> The last change is a small semantic cleanup introducing
> /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
> prevents several kinds of problemantic corner cases.  It is a
> user-visible change so it has a minute chance of causing regressions so
> the change to /proc/mounts and /proc/net are individual one line commits
> that can be trivially reverted.  Unfortunately I lost and could not find
> the email of the original reporter so he is not credited.  From at least
> one perspective this change to /proc/net is a refgression fix to allow
> pthread /proc/net uses that were broken by the introduction of the network
> namespace.

They appear to include this last change.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


signature.asc
Description: PGP signature


[GIT PULL] namespace updates for v3.17-rc1

2014-08-05 Thread Eric W. Biederman

Linus,

Please pull the for-linus branch from the git tree:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
for-linus

   HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at 
/proc/thread-self/mounts instead of /proc/self/mounts

This is a bunch of small changes built against 3.16-rc6.  The most
significant change for users is the first patch which makes setns
drmatically faster by removing unneded rcu handling.

The next chunk of changes are so that "mount -o remount,.." will not
allow the user namespace root to drop flags on a mount set by the system
wide root.  Aks this forces read-only mounts to stay read-only, no-dev
mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
stay no exec and it prevents unprivileged users from messing with a
mounts atime settings.  I have included my test case as the last patch
in this series so people performing backports can verify this change
works correctly.

The next change fixes a bug in NFS that was discovered while auditing
nsproxy users for the first optimization.  Today you can oops the kernel
by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
namespaces.  I rebased and fixed the build of the !CONFIG_NFS_FS case
yesterday when a build bot caught my typo.  Given that no one to my
knowledge bases anything on my tree fixing the typo in place seems more
responsible that requiring a typo-fix to be backported as well.

The last change is a small semantic cleanup introducing
/proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
prevents several kinds of problemantic corner cases.  It is a
user-visible change so it has a minute chance of causing regressions so
the change to /proc/mounts and /proc/net are individual one line commits
that can be trivially reverted.  Unfortunately I lost and could not find
the email of the original reporter so he is not credited.  From at least
one perspective this change to /proc/net is a refgression fix to allow
pthread /proc/net uses that were broken by the introduction of the network
namespace.

Eric

Eric W. Biederman (11):
  namespaces: Use task_lock and not rcu to protect nsproxy
  mnt: Only change user settable mount flags in remount
  mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into 
do_remount
  mnt: Correct permission checks in do_remount
  mnt: Change the default remount atime from relatime to the existing value
  mnt: Add tests for unprivileged remount cases that have found to be faulty
  NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes
  proc: Have net show up under /proc//task/
  proc: Implement /proc/thread-self to point at the directory of the 
current thread
  proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net
  proc: Point /proc/mounts at /proc/thread-self/mounts instead of 
/proc/self/mounts

 fs/namespace.c |  65 +-
 fs/nfs/client.c|  95 
 fs/nfs/inode.c |   3 +-
 fs/nfs/internal.h  |   9 +
 fs/nfs/netns.h |   3 +
 fs/proc/Makefile   |   1 +
 fs/proc/base.c |  18 +-
 fs/proc/inode.c|   7 +-
 fs/proc/internal.h |   6 +
 fs/proc/proc_net.c |   6 +-
 fs/proc/root.c |   5 +-
 fs/proc/thread_self.c  |  85 
 fs/proc_namespace.c|   8 +-
 include/linux/mount.h  |   9 +-
 include/linux/nsproxy.h|  16 +-
 include/linux/pid_namespace.h  |   1 +
 ipc/namespace.c|   6 +-
 kernel/nsproxy.c   |  15 +-
 kernel/utsname.c   |   6 +-
 net/core/net_namespace.c   |  10 +-
 tools/testing/selftests/Makefile   |   1 +
 tools/testing/selftests/mount/Makefile |  17 ++
 .../selftests/mount/unprivileged-remount-test.c| 242 +
 23 files changed, 537 insertions(+), 97 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PULL] namespace updates for v3.17-rc1

2014-08-05 Thread Eric W. Biederman

Linus,

Please pull the for-linus branch from the git tree:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
for-linus

   HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at 
/proc/thread-self/mounts instead of /proc/self/mounts

This is a bunch of small changes built against 3.16-rc6.  The most
significant change for users is the first patch which makes setns
drmatically faster by removing unneded rcu handling.

The next chunk of changes are so that mount -o remount,.. will not
allow the user namespace root to drop flags on a mount set by the system
wide root.  Aks this forces read-only mounts to stay read-only, no-dev
mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
stay no exec and it prevents unprivileged users from messing with a
mounts atime settings.  I have included my test case as the last patch
in this series so people performing backports can verify this change
works correctly.

The next change fixes a bug in NFS that was discovered while auditing
nsproxy users for the first optimization.  Today you can oops the kernel
by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
namespaces.  I rebased and fixed the build of the !CONFIG_NFS_FS case
yesterday when a build bot caught my typo.  Given that no one to my
knowledge bases anything on my tree fixing the typo in place seems more
responsible that requiring a typo-fix to be backported as well.

The last change is a small semantic cleanup introducing
/proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
prevents several kinds of problemantic corner cases.  It is a
user-visible change so it has a minute chance of causing regressions so
the change to /proc/mounts and /proc/net are individual one line commits
that can be trivially reverted.  Unfortunately I lost and could not find
the email of the original reporter so he is not credited.  From at least
one perspective this change to /proc/net is a refgression fix to allow
pthread /proc/net uses that were broken by the introduction of the network
namespace.

Eric

Eric W. Biederman (11):
  namespaces: Use task_lock and not rcu to protect nsproxy
  mnt: Only change user settable mount flags in remount
  mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into 
do_remount
  mnt: Correct permission checks in do_remount
  mnt: Change the default remount atime from relatime to the existing value
  mnt: Add tests for unprivileged remount cases that have found to be faulty
  NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes
  proc: Have net show up under /proc/tgid/task/tid
  proc: Implement /proc/thread-self to point at the directory of the 
current thread
  proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net
  proc: Point /proc/mounts at /proc/thread-self/mounts instead of 
/proc/self/mounts

 fs/namespace.c |  65 +-
 fs/nfs/client.c|  95 
 fs/nfs/inode.c |   3 +-
 fs/nfs/internal.h  |   9 +
 fs/nfs/netns.h |   3 +
 fs/proc/Makefile   |   1 +
 fs/proc/base.c |  18 +-
 fs/proc/inode.c|   7 +-
 fs/proc/internal.h |   6 +
 fs/proc/proc_net.c |   6 +-
 fs/proc/root.c |   5 +-
 fs/proc/thread_self.c  |  85 
 fs/proc_namespace.c|   8 +-
 include/linux/mount.h  |   9 +-
 include/linux/nsproxy.h|  16 +-
 include/linux/pid_namespace.h  |   1 +
 ipc/namespace.c|   6 +-
 kernel/nsproxy.c   |  15 +-
 kernel/utsname.c   |   6 +-
 net/core/net_namespace.c   |  10 +-
 tools/testing/selftests/Makefile   |   1 +
 tools/testing/selftests/mount/Makefile |  17 ++
 .../selftests/mount/unprivileged-remount-test.c| 242 +
 23 files changed, 537 insertions(+), 97 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-05 Thread Stephen Rothwell
Hi Eric,

On Tue, 05 Aug 2014 17:57:31 -0700 ebied...@xmission.com (Eric W. Biederman) 
wrote:

 Please pull the for-linus branch from the git tree:
 
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
 for-linus
 
HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at 
 /proc/thread-self/mounts instead of /proc/self/mounts

This has had 4 commits added since the merge window opened that have no
Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag.

 The last change is a small semantic cleanup introducing
 /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
 prevents several kinds of problemantic corner cases.  It is a
 user-visible change so it has a minute chance of causing regressions so
 the change to /proc/mounts and /proc/net are individual one line commits
 that can be trivially reverted.  Unfortunately I lost and could not find
 the email of the original reporter so he is not credited.  From at least
 one perspective this change to /proc/net is a refgression fix to allow
 pthread /proc/net uses that were broken by the introduction of the network
 namespace.

They appear to include this last change.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


signature.asc
Description: PGP signature


Re: [GIT PULL] namespace updates for v3.17-rc1

2014-08-05 Thread Eric W. Biederman
Stephen Rothwell s...@canb.auug.org.au writes:

 Hi Eric,

 On Tue, 05 Aug 2014 17:57:31 -0700 ebied...@xmission.com (Eric W. Biederman) 
 wrote:

 Please pull the for-linus branch from the git tree:
 
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
 for-linus
 
HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts 
 at /proc/thread-self/mounts instead of /proc/self/mounts

 This has had 4 commits added since the merge window opened that have no
 Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by
 tag.

I am not certain what your point is.

There have been no commits added since the merge window opened.

There was one commit changed to fix a typo.  I documented that already.

There were some commits pushed to the tree as late as friday that had
been out for review earlier than that and it is possible that you did
not pick them up in linux-next until monday.  That doesn't mean I added
anything after the merge window opened.

I have also made certain all of these commits have at least had a chance
to show up in linux-next.

As for missing cool tags shrug.  The people looking at my code didn't
feel like saying the magic words so I didn't include cool tags.
Furthermore the code is all quite trivial.

Beyond that I have been quite out of it recently and this is what I had
time to do.  If I had had a little more time and energy I would have
included unmount on unlink patches that still need magic to happen to
keep from blowing the stack in pathological cases on everything except
x86_64.  That code has been sitting in linux-next.

Which is a my long winded way of say it sounds like you are accusing me
of being irresponsible, and my way of saying that I have tested and
vetted these trivial patches and been as careful as I reasonably could,
and I that I apologise for not being able to do things a few days
earlier.

Eric
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/