Re: [GIT PULL] namespace updates for v3.17-rc1
Am 26.11.2014 um 00:15 schrieb Richard Weinberger: > Eric, > > On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman > wrote: >> Richard Weinberger writes: >> >>> Am 21.08.2014 15:12, schrieb Christoph Hellwig: On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: > Richard Weinberger writes: > >> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman >> wrote: >> >> This commit breaks libvirt-lxc. >> libvirt does in lxcContainerMountBasicFS(): > > The bugs fixed are security issues, so if we have to break a small > number of userspace applications we will. Anything that we can > reasonably do to avoid regressions will be done. Can you explain the security issues in detail? Breaking common userspace like libvirt-lxc with just a little bit of handwaiving is entirely unacceptable. >>> >>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in >>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next >>> unbreaks libvirt-lxc. >>> I hope it hits Linus tree and -stable before the offending commit hits >>> users. >> >> I plan to send the pull request to Linus as soon as I have caught my >> breath (from all of the conferences this week) that I can be certain I >> am thinking clearly and not rushing things. > > Today I've upgraded my LXC testbed to the most recent kernel and found > libvirt-lxc broken again (sic!). > Remounting /proc/sys/ is failing. > Investigating into the issue showed that commit "mnt: Implicitly add > MNT_NODEV on remount as we do on mount" > is not mainline. > Why did you left out this patch? In my previous mails I explicitly > stated that exactly this commit unbreaks libvirt-lxc. > > Now the userspace breaking changes are mainline and hit users hard. :-( *kind ping* ...to make sure that this issue doesn't get lost. Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Am 26.11.2014 um 00:15 schrieb Richard Weinberger: Eric, On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman ebied...@xmission.com wrote: Richard Weinberger rich...@nod.at writes: Am 21.08.2014 15:12, schrieb Christoph Hellwig: On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: Richard Weinberger richard.weinber...@gmail.com writes: On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com wrote: This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Can you explain the security issues in detail? Breaking common userspace like libvirt-lxc with just a little bit of handwaiving is entirely unacceptable. It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next unbreaks libvirt-lxc. I hope it hits Linus tree and -stable before the offending commit hits users. I plan to send the pull request to Linus as soon as I have caught my breath (from all of the conferences this week) that I can be certain I am thinking clearly and not rushing things. Today I've upgraded my LXC testbed to the most recent kernel and found libvirt-lxc broken again (sic!). Remounting /proc/sys/ is failing. Investigating into the issue showed that commit mnt: Implicitly add MNT_NODEV on remount as we do on mount is not mainline. Why did you left out this patch? In my previous mails I explicitly stated that exactly this commit unbreaks libvirt-lxc. Now the userspace breaking changes are mainline and hit users hard. :-( *kind ping* ...to make sure that this issue doesn't get lost. Thanks, //richard -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Eric, On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman wrote: > Richard Weinberger writes: > >> Am 21.08.2014 15:12, schrieb Christoph Hellwig: >>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: Richard Weinberger writes: > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman > wrote: > > This commit breaks libvirt-lxc. > libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. >>> >>> Can you explain the security issues in detail? Breaking common >>> userspace like libvirt-lxc with just a little bit of handwaiving is >>> entirely unacceptable. >> >> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in >> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next >> unbreaks libvirt-lxc. >> I hope it hits Linus tree and -stable before the offending commit hits users. > > I plan to send the pull request to Linus as soon as I have caught my > breath (from all of the conferences this week) that I can be certain I > am thinking clearly and not rushing things. Today I've upgraded my LXC testbed to the most recent kernel and found libvirt-lxc broken again (sic!). Remounting /proc/sys/ is failing. Investigating into the issue showed that commit "mnt: Implicitly add MNT_NODEV on remount as we do on mount" is not mainline. Why did you left out this patch? In my previous mails I explicitly stated that exactly this commit unbreaks libvirt-lxc. Now the userspace breaking changes are mainline and hit users hard. :-( -- Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Eric, On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman ebied...@xmission.com wrote: Richard Weinberger rich...@nod.at writes: Am 21.08.2014 15:12, schrieb Christoph Hellwig: On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: Richard Weinberger richard.weinber...@gmail.com writes: On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com wrote: This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Can you explain the security issues in detail? Breaking common userspace like libvirt-lxc with just a little bit of handwaiving is entirely unacceptable. It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next unbreaks libvirt-lxc. I hope it hits Linus tree and -stable before the offending commit hits users. I plan to send the pull request to Linus as soon as I have caught my breath (from all of the conferences this week) that I can be certain I am thinking clearly and not rushing things. Today I've upgraded my LXC testbed to the most recent kernel and found libvirt-lxc broken again (sic!). Remounting /proc/sys/ is failing. Investigating into the issue showed that commit mnt: Implicitly add MNT_NODEV on remount as we do on mount is not mainline. Why did you left out this patch? In my previous mails I explicitly stated that exactly this commit unbreaks libvirt-lxc. Now the userspace breaking changes are mainline and hit users hard. :-( -- Thanks, //richard -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman wrote: >> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in >> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next >> unbreaks libvirt-lxc. >> I hope it hits Linus tree and -stable before the offending commit hits users. > > I plan to send the pull request to Linus as soon as I have caught my > breath (from all of the conferences this week) that I can be certain I > am thinking clearly and not rushing things. *kind reminder* :-) -- Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman ebied...@xmission.com wrote: It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next unbreaks libvirt-lxc. I hope it hits Linus tree and -stable before the offending commit hits users. I plan to send the pull request to Linus as soon as I have caught my breath (from all of the conferences this week) that I can be certain I am thinking clearly and not rushing things. *kind reminder* :-) -- Thanks, //richard -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Richard Weinberger writes: > Am 21.08.2014 15:12, schrieb Christoph Hellwig: >> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >>> Richard Weinberger writes: >>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman wrote: This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): >>> >>> The bugs fixed are security issues, so if we have to break a small >>> number of userspace applications we will. Anything that we can >>> reasonably do to avoid regressions will be done. >> >> Can you explain the security issues in detail? Breaking common >> userspace like libvirt-lxc with just a little bit of handwaiving is >> entirely unacceptable. > > It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in > git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next > unbreaks libvirt-lxc. > I hope it hits Linus tree and -stable before the offending commit hits users. I plan to send the pull request to Linus as soon as I have caught my breath (from all of the conferences this week) that I can be certain I am thinking clearly and not rushing things. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Richard Weinberger writes: > Am 21.08.2014 08:29, schrieb Richard Weinberger: >> Am 21.08.2014 06:53, schrieb Eric W. Biederman: >>> The bugs fixed are security issues, so if we have to break a small >>> number of userspace applications we will. Anything that we can >>> reasonably do to avoid regressions will be done. >>> >>> Could you please look at my user-namespace.git#for-next branch I have a >>> fix for at least one regresion causing issue in there. I think it may >>> fix your issues but I am not fully certain more comments below. >> >> I'll run this on my LXC testbed today. > > Looks good. With these patches applied libvirt works again. :) Darn I read my email in the wrong order. I am glad to hear that my changes were enough to fix libvirt-lxc. I will aim at pushing this to Linus after the conference is over and I can trust myself to think clearly. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Christoph Hellwig writes: > On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >> Richard Weinberger writes: >> >> > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman >> > wrote: >> > >> > This commit breaks libvirt-lxc. >> > libvirt does in lxcContainerMountBasicFS(): >> >> The bugs fixed are security issues, so if we have to break a small >> number of userspace applications we will. Anything that we can >> reasonably do to avoid regressions will be done. > > Can you explain the security issues in detail? Breaking common > userspace like libvirt-lxc with just a little bit of handwaiving is > entirely unacceptable. The biggies ability for an unprivileged user to clear nosuid, nodev, read-only, noexec with remount. Apparently part of what libvirt-lxc clearing all mount flags except for read-only on some filesystem it remounts read-only. Which if root mounted the filesystem with nosuid, nodev, or noexec is not supportable. So if it isn't the implicit setting of nodev on mount but not on remount causing problems (which I have the trivial fix for) there is a strong chance I need to break libvirt-lxc. I am several bugs deep already where fixing one bug reveals yet another bug. It is possible there is something else that that I have not discovered that I am missing. I am more to happy to investigate to see if it possible to avoid breaking libvirt-lxc. Apologies for being a little vague being at the conference I only have access to email about an hour a day. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Am 21.08.2014 15:12, schrieb Christoph Hellwig: > On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >> Richard Weinberger writes: >> >>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman >>> wrote: >>> >>> This commit breaks libvirt-lxc. >>> libvirt does in lxcContainerMountBasicFS(): >> >> The bugs fixed are security issues, so if we have to break a small >> number of userspace applications we will. Anything that we can >> reasonably do to avoid regressions will be done. > > Can you explain the security issues in detail? Breaking common > userspace like libvirt-lxc with just a little bit of handwaiving is > entirely unacceptable. It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next unbreaks libvirt-lxc. I hope it hits Linus tree and -stable before the offending commit hits users. Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: > Richard Weinberger writes: > > > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman > > wrote: > > > > This commit breaks libvirt-lxc. > > libvirt does in lxcContainerMountBasicFS(): > > The bugs fixed are security issues, so if we have to break a small > number of userspace applications we will. Anything that we can > reasonably do to avoid regressions will be done. Can you explain the security issues in detail? Breaking common userspace like libvirt-lxc with just a little bit of handwaiving is entirely unacceptable. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Am 21.08.2014 08:29, schrieb Richard Weinberger: > Am 21.08.2014 06:53, schrieb Eric W. Biederman: >> The bugs fixed are security issues, so if we have to break a small >> number of userspace applications we will. Anything that we can >> reasonably do to avoid regressions will be done. >> >> Could you please look at my user-namespace.git#for-next branch I have a >> fix for at least one regresion causing issue in there. I think it may >> fix your issues but I am not fully certain more comments below. > > I'll run this on my LXC testbed today. Looks good. With these patches applied libvirt works again. :) Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Am 21.08.2014 06:53, schrieb Eric W. Biederman: > The bugs fixed are security issues, so if we have to break a small > number of userspace applications we will. Anything that we can > reasonably do to avoid regressions will be done. > > Could you please look at my user-namespace.git#for-next branch I have a > fix for at least one regresion causing issue in there. I think it may > fix your issues but I am not fully certain more comments below. I'll run this on my LXC testbed today. >> /* >> * We can't immediately set the MS_RDONLY flag when mounting >> filesystems >> * because (in at least some kernel versions) this will propagate >> back >> * to the original mount in the host OS, turning it readonly too. >> Thus >> * we mount the filesystem in read-write mode initially, and then do >> a >> * separate read-only bind mount on top of that. >> */ >> bindOverReadonly = !!(mnt_mflags & MS_RDONLY); >> >> VIR_DEBUG("Mount %s on %s type=%s flags=%x", >> mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY); >> if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & >> ~MS_RDONLY, NULL) < 0) { >> >> Here it fails for sysfs because with user namespaces we bind the >> existing /sys into the container >> and would have to read out all existing mount flags from the current /sys >> mount. >> Otherwise mount() fails with EPERM. >> On my test system /sys is mounted with >> "rw,nosuid,nodev,noexec,relatime" and libvirt >> misses the realtime... > > Not specifying any atime flags to mount should be safe as that asks for > the default atime flags which for remount I have made the default atime > flags the existing atime flags. So I am scratching my head a little on > this one. Okay, let me find out why exactly libvirt gets a EPERM here. Maybe there are more odds hidden. >> >> virReportSystemError(errno, >> _("Failed to mount %s on %s type %s >> flags=%x"), >> mnt_src, mnt->dst, NULLSTR(mnt->type), >> mnt_mflags & ~MS_RDONLY); >> goto cleanup; >> } >> >> if (bindOverReadonly && >> mount(mnt_src, mnt->dst, NULL, >> MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { >> >> ^^^ Here it fails because now we'd have to specify all flags as used >> for the first >> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. >> See lxcBasicMounts[]. >> In this case the fix is easy, add mnt_mflags to the mount flags. > > That has always been a bug in general because remount has always > required specifying the complete set of mount flags you want to have. > > That fact that flags such as nosuid are now properly locked so you can > not change them if you are not the global root user just makes this > obvious. > > Andy Lutermorski has observed that statvfs will return the mount flags > making reading them simple. Thanks for the clarification, I'll create a fix for libvirt. Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Am 21.08.2014 06:53, schrieb Eric W. Biederman: The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Could you please look at my user-namespace.git#for-next branch I have a fix for at least one regresion causing issue in there. I think it may fix your issues but I am not fully certain more comments below. I'll run this on my LXC testbed today. /* * We can't immediately set the MS_RDONLY flag when mounting filesystems * because (in at least some kernel versions) this will propagate back * to the original mount in the host OS, turning it readonly too. Thus * we mount the filesystem in read-write mode initially, and then do a * separate read-only bind mount on top of that. */ bindOverReadonly = !!(mnt_mflags MS_RDONLY); VIR_DEBUG(Mount %s on %s type=%s flags=%x, mnt_src, mnt-dst, mnt-type, mnt_mflags ~MS_RDONLY); if (mount(mnt_src, mnt-dst, mnt-type, mnt_mflags ~MS_RDONLY, NULL) 0) { Here it fails for sysfs because with user namespaces we bind the existing /sys into the container and would have to read out all existing mount flags from the current /sys mount. Otherwise mount() fails with EPERM. On my test system /sys is mounted with rw,nosuid,nodev,noexec,relatime and libvirt misses the realtime... Not specifying any atime flags to mount should be safe as that asks for the default atime flags which for remount I have made the default atime flags the existing atime flags. So I am scratching my head a little on this one. Okay, let me find out why exactly libvirt gets a EPERM here. Maybe there are more odds hidden. virReportSystemError(errno, _(Failed to mount %s on %s type %s flags=%x), mnt_src, mnt-dst, NULLSTR(mnt-type), mnt_mflags ~MS_RDONLY); goto cleanup; } if (bindOverReadonly mount(mnt_src, mnt-dst, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) 0) { ^^^ Here it fails because now we'd have to specify all flags as used for the first mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. See lxcBasicMounts[]. In this case the fix is easy, add mnt_mflags to the mount flags. That has always been a bug in general because remount has always required specifying the complete set of mount flags you want to have. That fact that flags such as nosuid are now properly locked so you can not change them if you are not the global root user just makes this obvious. Andy Lutermorski has observed that statvfs will return the mount flags making reading them simple. Thanks for the clarification, I'll create a fix for libvirt. Thanks, //richard -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Am 21.08.2014 08:29, schrieb Richard Weinberger: Am 21.08.2014 06:53, schrieb Eric W. Biederman: The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Could you please look at my user-namespace.git#for-next branch I have a fix for at least one regresion causing issue in there. I think it may fix your issues but I am not fully certain more comments below. I'll run this on my LXC testbed today. Looks good. With these patches applied libvirt works again. :) Thanks, //richard -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: Richard Weinberger richard.weinber...@gmail.com writes: On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com wrote: This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Can you explain the security issues in detail? Breaking common userspace like libvirt-lxc with just a little bit of handwaiving is entirely unacceptable. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Am 21.08.2014 15:12, schrieb Christoph Hellwig: On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: Richard Weinberger richard.weinber...@gmail.com writes: On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com wrote: This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Can you explain the security issues in detail? Breaking common userspace like libvirt-lxc with just a little bit of handwaiving is entirely unacceptable. It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next unbreaks libvirt-lxc. I hope it hits Linus tree and -stable before the offending commit hits users. Thanks, //richard -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Christoph Hellwig h...@infradead.org writes: On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: Richard Weinberger richard.weinber...@gmail.com writes: On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com wrote: This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Can you explain the security issues in detail? Breaking common userspace like libvirt-lxc with just a little bit of handwaiving is entirely unacceptable. The biggies ability for an unprivileged user to clear nosuid, nodev, read-only, noexec with remount. Apparently part of what libvirt-lxc clearing all mount flags except for read-only on some filesystem it remounts read-only. Which if root mounted the filesystem with nosuid, nodev, or noexec is not supportable. So if it isn't the implicit setting of nodev on mount but not on remount causing problems (which I have the trivial fix for) there is a strong chance I need to break libvirt-lxc. I am several bugs deep already where fixing one bug reveals yet another bug. It is possible there is something else that that I have not discovered that I am missing. I am more to happy to investigate to see if it possible to avoid breaking libvirt-lxc. Apologies for being a little vague being at the conference I only have access to email about an hour a day. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Richard Weinberger rich...@nod.at writes: Am 21.08.2014 08:29, schrieb Richard Weinberger: Am 21.08.2014 06:53, schrieb Eric W. Biederman: The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Could you please look at my user-namespace.git#for-next branch I have a fix for at least one regresion causing issue in there. I think it may fix your issues but I am not fully certain more comments below. I'll run this on my LXC testbed today. Looks good. With these patches applied libvirt works again. :) Darn I read my email in the wrong order. I am glad to hear that my changes were enough to fix libvirt-lxc. I will aim at pushing this to Linus after the conference is over and I can trust myself to think clearly. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Richard Weinberger rich...@nod.at writes: Am 21.08.2014 15:12, schrieb Christoph Hellwig: On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: Richard Weinberger richard.weinber...@gmail.com writes: On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com wrote: This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Can you explain the security issues in detail? Breaking common userspace like libvirt-lxc with just a little bit of handwaiving is entirely unacceptable. It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next unbreaks libvirt-lxc. I hope it hits Linus tree and -stable before the offending commit hits users. I plan to send the pull request to Linus as soon as I have caught my breath (from all of the conferences this week) that I can be certain I am thinking clearly and not rushing things. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Richard Weinberger writes: > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman > wrote: > > This commit breaks libvirt-lxc. > libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Could you please look at my user-namespace.git#for-next branch I have a fix for at least one regresion causing issue in there. I think it may fix your issues but I am not fully certain more comments below. > /* > * We can't immediately set the MS_RDONLY flag when mounting > filesystems > * because (in at least some kernel versions) this will propagate back > * to the original mount in the host OS, turning it readonly too. Thus > * we mount the filesystem in read-write mode initially, and then do a > * separate read-only bind mount on top of that. > */ > bindOverReadonly = !!(mnt_mflags & MS_RDONLY); > > VIR_DEBUG("Mount %s on %s type=%s flags=%x", > mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY); > if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & > ~MS_RDONLY, NULL) < 0) { > > Here it fails for sysfs because with user namespaces we bind the > existing /sys into the container > and would have to read out all existing mount flags from the current /sys > mount. > Otherwise mount() fails with EPERM. > On my test system /sys is mounted with > "rw,nosuid,nodev,noexec,relatime" and libvirt > misses the realtime... Not specifying any atime flags to mount should be safe as that asks for the default atime flags which for remount I have made the default atime flags the existing atime flags. So I am scratching my head a little on this one. > > virReportSystemError(errno, > _("Failed to mount %s on %s type %s > flags=%x"), > mnt_src, mnt->dst, NULLSTR(mnt->type), > mnt_mflags & ~MS_RDONLY); > goto cleanup; > } > > if (bindOverReadonly && > mount(mnt_src, mnt->dst, NULL, > MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { > > ^^^ Here it fails because now we'd have to specify all flags as used > for the first > mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. > See lxcBasicMounts[]. > In this case the fix is easy, add mnt_mflags to the mount flags. That has always been a bug in general because remount has always required specifying the complete set of mount flags you want to have. That fact that flags such as nosuid are now properly locked so you can not change them if you are not the global root user just makes this obvious. Andy Lutermorski has observed that statvfs will return the mount flags making reading them simple. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman wrote: > > Linus, > > Please pull the for-linus branch from the git tree: > >git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git > for-linus > >HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at > /proc/thread-self/mounts instead of /proc/self/mounts > > This is a bunch of small changes built against 3.16-rc6. The most > significant change for users is the first patch which makes setns > drmatically faster by removing unneded rcu handling. > > The next chunk of changes are so that "mount -o remount,.." will not > allow the user namespace root to drop flags on a mount set by the system > wide root. Aks this forces read-only mounts to stay read-only, no-dev > mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to > stay no exec and it prevents unprivileged users from messing with a > mounts atime settings. I have included my test case as the last patch > in this series so people performing backports can verify this change > works correctly. > > The next change fixes a bug in NFS that was discovered while auditing > nsproxy users for the first optimization. Today you can oops the kernel > by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid > namespaces. I rebased and fixed the build of the !CONFIG_NFS_FS case > yesterday when a build bot caught my typo. Given that no one to my > knowledge bases anything on my tree fixing the typo in place seems more > responsible that requiring a typo-fix to be backported as well. > > The last change is a small semantic cleanup introducing > /proc/thread-self and pointing /proc/mounts and /proc/net at it. This > prevents several kinds of problemantic corner cases. It is a > user-visible change so it has a minute chance of causing regressions so > the change to /proc/mounts and /proc/net are individual one line commits > that can be trivially reverted. Unfortunately I lost and could not find > the email of the original reporter so he is not credited. From at least > one perspective this change to /proc/net is a refgression fix to allow > pthread /proc/net uses that were broken by the introduction of the network > namespace. > > Eric > > Eric W. Biederman (11): > namespaces: Use task_lock and not rcu to protect nsproxy > mnt: Only change user settable mount flags in remount > mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into > do_remount > mnt: Correct permission checks in do_remount This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): /* * We can't immediately set the MS_RDONLY flag when mounting filesystems * because (in at least some kernel versions) this will propagate back * to the original mount in the host OS, turning it readonly too. Thus * we mount the filesystem in read-write mode initially, and then do a * separate read-only bind mount on top of that. */ bindOverReadonly = !!(mnt_mflags & MS_RDONLY); VIR_DEBUG("Mount %s on %s type=%s flags=%x", mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY); if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY, NULL) < 0) { Here it fails for sysfs because with user namespaces we bind the existing /sys into the container and would have to read out all existing mount flags from the current /sys mount. Otherwise mount() fails with EPERM. On my test system /sys is mounted with "rw,nosuid,nodev,noexec,relatime" and libvirt misses the realtime... virReportSystemError(errno, _("Failed to mount %s on %s type %s flags=%x"), mnt_src, mnt->dst, NULLSTR(mnt->type), mnt_mflags & ~MS_RDONLY); goto cleanup; } if (bindOverReadonly && mount(mnt_src, mnt->dst, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { ^^^ Here it fails because now we'd have to specify all flags as used for the first mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. See lxcBasicMounts[]. In this case the fix is easy, add mnt_mflags to the mount flags. virReportSystemError(errno, _("Failed to re-mount %s on %s flags=%x"), mnt_src, mnt->dst, MS_BIND|MS_REMOUNT|MS_RDONLY); goto cleanup; } -- Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com wrote: Linus, Please pull the for-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This is a bunch of small changes built against 3.16-rc6. The most significant change for users is the first patch which makes setns drmatically faster by removing unneded rcu handling. The next chunk of changes are so that mount -o remount,.. will not allow the user namespace root to drop flags on a mount set by the system wide root. Aks this forces read-only mounts to stay read-only, no-dev mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to stay no exec and it prevents unprivileged users from messing with a mounts atime settings. I have included my test case as the last patch in this series so people performing backports can verify this change works correctly. The next change fixes a bug in NFS that was discovered while auditing nsproxy users for the first optimization. Today you can oops the kernel by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid namespaces. I rebased and fixed the build of the !CONFIG_NFS_FS case yesterday when a build bot caught my typo. Given that no one to my knowledge bases anything on my tree fixing the typo in place seems more responsible that requiring a typo-fix to be backported as well. The last change is a small semantic cleanup introducing /proc/thread-self and pointing /proc/mounts and /proc/net at it. This prevents several kinds of problemantic corner cases. It is a user-visible change so it has a minute chance of causing regressions so the change to /proc/mounts and /proc/net are individual one line commits that can be trivially reverted. Unfortunately I lost and could not find the email of the original reporter so he is not credited. From at least one perspective this change to /proc/net is a refgression fix to allow pthread /proc/net uses that were broken by the introduction of the network namespace. Eric Eric W. Biederman (11): namespaces: Use task_lock and not rcu to protect nsproxy mnt: Only change user settable mount flags in remount mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount mnt: Correct permission checks in do_remount This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): /* * We can't immediately set the MS_RDONLY flag when mounting filesystems * because (in at least some kernel versions) this will propagate back * to the original mount in the host OS, turning it readonly too. Thus * we mount the filesystem in read-write mode initially, and then do a * separate read-only bind mount on top of that. */ bindOverReadonly = !!(mnt_mflags MS_RDONLY); VIR_DEBUG(Mount %s on %s type=%s flags=%x, mnt_src, mnt-dst, mnt-type, mnt_mflags ~MS_RDONLY); if (mount(mnt_src, mnt-dst, mnt-type, mnt_mflags ~MS_RDONLY, NULL) 0) { Here it fails for sysfs because with user namespaces we bind the existing /sys into the container and would have to read out all existing mount flags from the current /sys mount. Otherwise mount() fails with EPERM. On my test system /sys is mounted with rw,nosuid,nodev,noexec,relatime and libvirt misses the realtime... virReportSystemError(errno, _(Failed to mount %s on %s type %s flags=%x), mnt_src, mnt-dst, NULLSTR(mnt-type), mnt_mflags ~MS_RDONLY); goto cleanup; } if (bindOverReadonly mount(mnt_src, mnt-dst, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) 0) { ^^^ Here it fails because now we'd have to specify all flags as used for the first mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. See lxcBasicMounts[]. In this case the fix is easy, add mnt_mflags to the mount flags. virReportSystemError(errno, _(Failed to re-mount %s on %s flags=%x), mnt_src, mnt-dst, MS_BIND|MS_REMOUNT|MS_RDONLY); goto cleanup; } -- Thanks, //richard -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Richard Weinberger richard.weinber...@gmail.com writes: On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman ebied...@xmission.com wrote: This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Could you please look at my user-namespace.git#for-next branch I have a fix for at least one regresion causing issue in there. I think it may fix your issues but I am not fully certain more comments below. /* * We can't immediately set the MS_RDONLY flag when mounting filesystems * because (in at least some kernel versions) this will propagate back * to the original mount in the host OS, turning it readonly too. Thus * we mount the filesystem in read-write mode initially, and then do a * separate read-only bind mount on top of that. */ bindOverReadonly = !!(mnt_mflags MS_RDONLY); VIR_DEBUG(Mount %s on %s type=%s flags=%x, mnt_src, mnt-dst, mnt-type, mnt_mflags ~MS_RDONLY); if (mount(mnt_src, mnt-dst, mnt-type, mnt_mflags ~MS_RDONLY, NULL) 0) { Here it fails for sysfs because with user namespaces we bind the existing /sys into the container and would have to read out all existing mount flags from the current /sys mount. Otherwise mount() fails with EPERM. On my test system /sys is mounted with rw,nosuid,nodev,noexec,relatime and libvirt misses the realtime... Not specifying any atime flags to mount should be safe as that asks for the default atime flags which for remount I have made the default atime flags the existing atime flags. So I am scratching my head a little on this one. virReportSystemError(errno, _(Failed to mount %s on %s type %s flags=%x), mnt_src, mnt-dst, NULLSTR(mnt-type), mnt_mflags ~MS_RDONLY); goto cleanup; } if (bindOverReadonly mount(mnt_src, mnt-dst, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) 0) { ^^^ Here it fails because now we'd have to specify all flags as used for the first mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. See lxcBasicMounts[]. In this case the fix is easy, add mnt_mflags to the mount flags. That has always been a bug in general because remount has always required specifying the complete set of mount flags you want to have. That fact that flags such as nosuid are now properly locked so you can not change them if you are not the global root user just makes this obvious. Andy Lutermorski has observed that statvfs will return the mount flags making reading them simple. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman wrote: > Kenton Varda writes: > >> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman >> wrote: > From: "Eric W. Biederman" > Date: Wed, 13 Aug 2014 01:33:38 -0700 > Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount > > Now that remount is properly enforcing the rule that you can't > remove nodev at least sandstorm.io is breaking when performing > a remount. > > It turns out that there is an easy intuitive solution implicitly > add nodev on remount under the same conditions that we do on mount. > > Update the regression test for remounts to verify that this new > addition does not regress either. Tested-but-disliked-by: Andy Lutomirski Quoting Alan Cox [1]: Lying to apps generally ends up like children lying to parents - the lie gets more complicated to keep up each case you find until it breaks. The kernel was unnecessarily fudging mount flags, and it broke, because the fudge didn't go far enough. This extends the fudge. Let's just delete it instead. --Andy [1] https://lwn.net/Articles/608435/ > > Cc: sta...@vger.kernel.org > Signed-off-by: "Eric W. Biederman" > --- > fs/namespace.c | 9 > .../selftests/mount/unprivileged-remount-test.c| 51 > -- > 2 files changed, 36 insertions(+), 24 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 7886176232c1..0f158300c866 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, > int mnt_flags, > if (path->dentry != path->mnt->mnt_root) > return -EINVAL; > > + /* Only in special cases allow devices from mounts created > +* outside the initial user namespace. > +*/ > + if ((mnt->mnt_ns->user_ns != _user_ns) && > + !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) { > + flags |= MS_NODEV; > + mnt_flags |= MNT_NODEV; > + } > + > /* Don't allow changing of locked mnt flags. > * > * No locks need to be held here while testing the various > diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c > b/tools/testing/selftests/mount/unprivileged-remount-test.c > index 1b3ff2fda4d0..6d408c155e0f 100644 > --- a/tools/testing/selftests/mount/unprivileged-remount-test.c > +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c > @@ -118,7 +118,8 @@ static void create_and_enter_userns(void) > } > > static > -bool test_unpriv_remount(int mount_flags, int remount_flags, int > invalid_flags) > +bool test_unpriv_remount(const char *fstype, const char *mount_options, > +int mount_flags, int remount_flags, int > invalid_flags) > { > pid_t child; > > @@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int > remount_flags, int invalid_flags) > strerror(errno)); > } > > - if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) { > - die("mount of /tmp failed: %s\n", > - strerror(errno)); > + if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != > 0) { > + die("mount of %s with options '%s' on /tmp failed: %s\n", > + fstype, > + mount_options? mount_options : "", > + strerror(errno)); > } > > create_and_enter_userns(); > @@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int > remount_flags, int invalid_flags) > > static bool test_unpriv_remount_simple(int mount_flags) > { > - return test_unpriv_remount(mount_flags, mount_flags, 0); > + return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, > 0); > } > > static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags) > { > - return test_unpriv_remount(mount_flags, mount_flags, invalid_flags); > + return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, > + invalid_flags); > } > > int main(int argc, char **argv) > { > - if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) { > + if (!test_unpriv_remount_simple(MS_RDONLY)) { > die("MS_RDONLY malfunctions\n"); > } > - if (!test_unpriv_remount_simple(MS_NODEV)) { > + if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, > 0)) { > die("MS_NODEV malfunctions\n"); > } > - if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) { > + if (!test_unpriv_remount_simple(MS_NOSUID)) { > die("MS_NOSUID malfunctions\n"); > } > - if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) { > + if (!test_unpriv_remount_simple(MS_NOEXEC)) { > die("MS_NOEXEC malfunctions\n"); > } > - if
Re: [GIT PULL] namespace updates for v3.17-rc1
On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman ebied...@xmission.com wrote: Kenton Varda ken...@sandstorm.io writes: On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman ebied...@xmission.com wrote: From: Eric W. Biederman ebied...@xmission.com Date: Wed, 13 Aug 2014 01:33:38 -0700 Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount Now that remount is properly enforcing the rule that you can't remove nodev at least sandstorm.io is breaking when performing a remount. It turns out that there is an easy intuitive solution implicitly add nodev on remount under the same conditions that we do on mount. Update the regression test for remounts to verify that this new addition does not regress either. Tested-but-disliked-by: Andy Lutomirski l...@amacapital.net Quoting Alan Cox [1]: Lying to apps generally ends up like children lying to parents - the lie gets more complicated to keep up each case you find until it breaks. The kernel was unnecessarily fudging mount flags, and it broke, because the fudge didn't go far enough. This extends the fudge. Let's just delete it instead. --Andy [1] https://lwn.net/Articles/608435/ Cc: sta...@vger.kernel.org Signed-off-by: Eric W. Biederman ebied...@xmission.com --- fs/namespace.c | 9 .../selftests/mount/unprivileged-remount-test.c| 51 -- 2 files changed, 36 insertions(+), 24 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 7886176232c1..0f158300c866 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags, if (path-dentry != path-mnt-mnt_root) return -EINVAL; + /* Only in special cases allow devices from mounts created +* outside the initial user namespace. +*/ + if ((mnt-mnt_ns-user_ns != init_user_ns) + !(sb-s_type-fs_flags FS_USERNS_DEV_MOUNT)) { + flags |= MS_NODEV; + mnt_flags |= MNT_NODEV; + } + /* Don't allow changing of locked mnt flags. * * No locks need to be held here while testing the various diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c index 1b3ff2fda4d0..6d408c155e0f 100644 --- a/tools/testing/selftests/mount/unprivileged-remount-test.c +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c @@ -118,7 +118,8 @@ static void create_and_enter_userns(void) } static -bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) +bool test_unpriv_remount(const char *fstype, const char *mount_options, +int mount_flags, int remount_flags, int invalid_flags) { pid_t child; @@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) strerror(errno)); } - if (mount(testing, /tmp, ramfs, mount_flags, NULL) != 0) { - die(mount of /tmp failed: %s\n, - strerror(errno)); + if (mount(testing, /tmp, fstype, mount_flags, mount_options) != 0) { + die(mount of %s with options '%s' on /tmp failed: %s\n, + fstype, + mount_options? mount_options : , + strerror(errno)); } create_and_enter_userns(); @@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) static bool test_unpriv_remount_simple(int mount_flags) { - return test_unpriv_remount(mount_flags, mount_flags, 0); + return test_unpriv_remount(ramfs, NULL, mount_flags, mount_flags, 0); } static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags) { - return test_unpriv_remount(mount_flags, mount_flags, invalid_flags); + return test_unpriv_remount(ramfs, NULL, mount_flags, mount_flags, + invalid_flags); } int main(int argc, char **argv) { - if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) { + if (!test_unpriv_remount_simple(MS_RDONLY)) { die(MS_RDONLY malfunctions\n); } - if (!test_unpriv_remount_simple(MS_NODEV)) { + if (!test_unpriv_remount(devpts, newinstance, MS_NODEV, MS_NODEV, 0)) { die(MS_NODEV malfunctions\n); } - if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) { + if (!test_unpriv_remount_simple(MS_NOSUID)) { die(MS_NOSUID malfunctions\n); } - if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) { + if (!test_unpriv_remount_simple(MS_NOEXEC)) { die(MS_NOEXEC malfunctions\n); } - if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV, -
Re: [GIT PULL] namespace updates for v3.17-rc1
On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman wrote: > Kenton Varda writes: > >> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman > I have a proposed fix as the end of this email. Could you please > test it? > > If I can get a Tested-by and possibly a Reviewed-by responses I would > appreciate it. Will test later today. > >> The problem is that users like us had no idea that nodev was being >> silently added in the first place, and thus didn't know that we needed >> to specify it in remounts. > > Agreed. And frankly that is extremely reasonable and it would in fact > break anything doing a traditional tracking of mounts with /etc/mtab. > So it is a pretty siginificant discontinuity. > > I am not prepared to remove the the implicit adding of nodev from the > implementation as that would just break your code and probably others > code in a different way. I'm having trouble figuring out what would break. The only way to end up with a device node on one of these filesystems would be to pass an fd to the file system out of the userns and have some globally privileged code put the device node there. > > What we can do and that will work cleanly long term is make remount > match mount and implicitly add nodev. > >> We create the tmpfs, put some things in it, >> and then want to remount it read-only for the sandbox. It seems >> reasonable to expect that a newly-created tmpfs would have exactly the >> flags I gave it when I created it, not silently get an additional flag >> that I then need to pass on remount. > > A reasonable expectation yes. At the same time it is also a very > reasonable security requirement to not allow devices on filesystems > that the global root does not control. But the magic nodev isn't needed for that; the checks in mknod are the right place for this -- they're sufficient, and they're required anyway. > > Compared to Andy's suggestion of relaxing the nodev restriction this > will also work for filesystems like fuse and nfs when we allow begin to > allow unprivileged mounts of those filesystems, and it allows to sleep > better knowing the code is extremely paranoid. If we allow fuse and nfs, we need to deal with nosuid, too. I want to avoid using nosuid for this; I want to change the suid logic to check the namespace, since I think that suid *should* work in a namespace. Also, that implicit nodev may be a problem down the road if we ever add device namespaces. ... > > From: "Eric W. Biederman" > Date: Wed, 13 Aug 2014 01:33:38 -0700 > Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount > > Now that remount is properly enforcing the rule that you can't > remove nodev at least sandstorm.io is breaking when performing > a remount. > > It turns out that there is an easy intuitive solution implicitly > add nodev on remount under the same conditions that we do on mount. > > Update the regression test for remounts to verify that this new > addition does not regress either. > > Cc: sta...@vger.kernel.org > Signed-off-by: "Eric W. Biederman" > --- > fs/namespace.c | 9 > .../selftests/mount/unprivileged-remount-test.c| 51 > -- > 2 files changed, 36 insertions(+), 24 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 7886176232c1..0f158300c866 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, > int mnt_flags, > if (path->dentry != path->mnt->mnt_root) > return -EINVAL; > > + /* Only in special cases allow devices from mounts created > +* outside the initial user namespace. I don't like this comment at all. If we stick with this approach, I think that this should say: Certain mounts are forced to have MNT_NODEV set. For these mounts, user code might not realize that nodev is set, so force MS_NODEV on remount attempts to prevent confusing remount failures. > +*/ > + if ((mnt->mnt_ns->user_ns != _user_ns) && > + !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) { > + flags |= MS_NODEV; > + mnt_flags |= MNT_NODEV; > + } > + > /* Don't allow changing of locked mnt flags. > * > * No locks need to be held here while testing the various I will still advocate a different and much less magical approach. I'll send a patch later today. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Kenton Varda writes: > On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman > wrote: > > If you can find a userspace application that matters I might care > > that a security fix breaks it. > > FWIW, it broke Sandstorm.io, but we already pushed a fix, and I'm not > sure if you'd say that we "matter". I should meant to have said: "If you can find a userspace application that breaks I might care that a security fix breaks it" Hearing this actually break something changes this from a theoretical discussion to something real. I have a proposed fix as the end of this email. Could you please test it? If I can get a Tested-by and possibly a Reviewed-by responses I would appreciate it. > The problem is that users like us had no idea that nodev was being > silently added in the first place, and thus didn't know that we needed > to specify it in remounts. Agreed. And frankly that is extremely reasonable and it would in fact break anything doing a traditional tracking of mounts with /etc/mtab. So it is a pretty siginificant discontinuity. I am not prepared to remove the the implicit adding of nodev from the implementation as that would just break your code and probably others code in a different way. What we can do and that will work cleanly long term is make remount match mount and implicitly add nodev. > We create the tmpfs, put some things in it, > and then want to remount it read-only for the sandbox. It seems > reasonable to expect that a newly-created tmpfs would have exactly the > flags I gave it when I created it, not silently get an additional flag > that I then need to pass on remount. A reasonable expectation yes. At the same time it is also a very reasonable security requirement to not allow devices on filesystems that the global root does not control. It is also reasonable that code that does not care should not have to worry about this issue and work the same as the global root or as a container root. Which I think is the reasoning that I went with when adding the implicit nodev. > Note further that it may be very hard for normal developers to figure > out why their remount is failing in this case. Andy only discovered > the silent nodev by reading the kernel code. Agreed. So the simple solution I see with minimal danger and minimal security risk is to remove the inconsitency between mount and remount and add implicitly add MNT_NODEV during remount as well. That removes the minor regression and keeps the code in the paranoid state where device nodes are not honored and not allowed on filesystems mounted by ordinary users. Compared to Andy's suggestion of relaxing the nodev restriction this will also work for filesystems like fuse and nfs when we allow begin to allow unprivileged mounts of those filesystems, and it allows to sleep better knowing the code is extremely paranoid. This reminds me that the dust is settling and the man pages need to be updated for all of these small tweaks to the API that have happened, so there is a good source of documentation for how all of this actually works and why. I am going to be travelling pretty much the rest of the week, so won't be particularly responsive. But if you can get this tested I think this counts as a minor regression fix that solves this issue. Eric From: "Eric W. Biederman" Date: Wed, 13 Aug 2014 01:33:38 -0700 Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount Now that remount is properly enforcing the rule that you can't remove nodev at least sandstorm.io is breaking when performing a remount. It turns out that there is an easy intuitive solution implicitly add nodev on remount under the same conditions that we do on mount. Update the regression test for remounts to verify that this new addition does not regress either. Cc: sta...@vger.kernel.org Signed-off-by: "Eric W. Biederman" --- fs/namespace.c | 9 .../selftests/mount/unprivileged-remount-test.c| 51 -- 2 files changed, 36 insertions(+), 24 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 7886176232c1..0f158300c866 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags, if (path->dentry != path->mnt->mnt_root) return -EINVAL; + /* Only in special cases allow devices from mounts created +* outside the initial user namespace. +*/ + if ((mnt->mnt_ns->user_ns != _user_ns) && + !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) { + flags |= MS_NODEV; + mnt_flags |= MNT_NODEV; + } + /* Don't allow changing of locked mnt flags. * * No locks need to be held here while testing the various diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c index 1b3ff2fda4d0..6d408c155e0f
Re: [GIT PULL] namespace updates for v3.17-rc1
Kenton Varda ken...@sandstorm.io writes: On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman ebied...@xmission.com wrote: If you can find a userspace application that matters I might care that a security fix breaks it. FWIW, it broke Sandstorm.io, but we already pushed a fix, and I'm not sure if you'd say that we matter. I should meant to have said: If you can find a userspace application that breaks I might care that a security fix breaks it Hearing this actually break something changes this from a theoretical discussion to something real. I have a proposed fix as the end of this email. Could you please test it? If I can get a Tested-by and possibly a Reviewed-by responses I would appreciate it. The problem is that users like us had no idea that nodev was being silently added in the first place, and thus didn't know that we needed to specify it in remounts. Agreed. And frankly that is extremely reasonable and it would in fact break anything doing a traditional tracking of mounts with /etc/mtab. So it is a pretty siginificant discontinuity. I am not prepared to remove the the implicit adding of nodev from the implementation as that would just break your code and probably others code in a different way. What we can do and that will work cleanly long term is make remount match mount and implicitly add nodev. We create the tmpfs, put some things in it, and then want to remount it read-only for the sandbox. It seems reasonable to expect that a newly-created tmpfs would have exactly the flags I gave it when I created it, not silently get an additional flag that I then need to pass on remount. A reasonable expectation yes. At the same time it is also a very reasonable security requirement to not allow devices on filesystems that the global root does not control. It is also reasonable that code that does not care should not have to worry about this issue and work the same as the global root or as a container root. Which I think is the reasoning that I went with when adding the implicit nodev. Note further that it may be very hard for normal developers to figure out why their remount is failing in this case. Andy only discovered the silent nodev by reading the kernel code. Agreed. So the simple solution I see with minimal danger and minimal security risk is to remove the inconsitency between mount and remount and add implicitly add MNT_NODEV during remount as well. That removes the minor regression and keeps the code in the paranoid state where device nodes are not honored and not allowed on filesystems mounted by ordinary users. Compared to Andy's suggestion of relaxing the nodev restriction this will also work for filesystems like fuse and nfs when we allow begin to allow unprivileged mounts of those filesystems, and it allows to sleep better knowing the code is extremely paranoid. This reminds me that the dust is settling and the man pages need to be updated for all of these small tweaks to the API that have happened, so there is a good source of documentation for how all of this actually works and why. I am going to be travelling pretty much the rest of the week, so won't be particularly responsive. But if you can get this tested I think this counts as a minor regression fix that solves this issue. Eric From: Eric W. Biederman ebied...@xmission.com Date: Wed, 13 Aug 2014 01:33:38 -0700 Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount Now that remount is properly enforcing the rule that you can't remove nodev at least sandstorm.io is breaking when performing a remount. It turns out that there is an easy intuitive solution implicitly add nodev on remount under the same conditions that we do on mount. Update the regression test for remounts to verify that this new addition does not regress either. Cc: sta...@vger.kernel.org Signed-off-by: Eric W. Biederman ebied...@xmission.com --- fs/namespace.c | 9 .../selftests/mount/unprivileged-remount-test.c| 51 -- 2 files changed, 36 insertions(+), 24 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 7886176232c1..0f158300c866 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags, if (path-dentry != path-mnt-mnt_root) return -EINVAL; + /* Only in special cases allow devices from mounts created +* outside the initial user namespace. +*/ + if ((mnt-mnt_ns-user_ns != init_user_ns) + !(sb-s_type-fs_flags FS_USERNS_DEV_MOUNT)) { + flags |= MS_NODEV; + mnt_flags |= MNT_NODEV; + } + /* Don't allow changing of locked mnt flags. * * No locks need to be held here while testing the various diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c
Re: [GIT PULL] namespace updates for v3.17-rc1
On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman ebied...@xmission.com wrote: Kenton Varda ken...@sandstorm.io writes: On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman I have a proposed fix as the end of this email. Could you please test it? If I can get a Tested-by and possibly a Reviewed-by responses I would appreciate it. Will test later today. The problem is that users like us had no idea that nodev was being silently added in the first place, and thus didn't know that we needed to specify it in remounts. Agreed. And frankly that is extremely reasonable and it would in fact break anything doing a traditional tracking of mounts with /etc/mtab. So it is a pretty siginificant discontinuity. I am not prepared to remove the the implicit adding of nodev from the implementation as that would just break your code and probably others code in a different way. I'm having trouble figuring out what would break. The only way to end up with a device node on one of these filesystems would be to pass an fd to the file system out of the userns and have some globally privileged code put the device node there. What we can do and that will work cleanly long term is make remount match mount and implicitly add nodev. We create the tmpfs, put some things in it, and then want to remount it read-only for the sandbox. It seems reasonable to expect that a newly-created tmpfs would have exactly the flags I gave it when I created it, not silently get an additional flag that I then need to pass on remount. A reasonable expectation yes. At the same time it is also a very reasonable security requirement to not allow devices on filesystems that the global root does not control. But the magic nodev isn't needed for that; the checks in mknod are the right place for this -- they're sufficient, and they're required anyway. Compared to Andy's suggestion of relaxing the nodev restriction this will also work for filesystems like fuse and nfs when we allow begin to allow unprivileged mounts of those filesystems, and it allows to sleep better knowing the code is extremely paranoid. If we allow fuse and nfs, we need to deal with nosuid, too. I want to avoid using nosuid for this; I want to change the suid logic to check the namespace, since I think that suid *should* work in a namespace. Also, that implicit nodev may be a problem down the road if we ever add device namespaces. ... From: Eric W. Biederman ebied...@xmission.com Date: Wed, 13 Aug 2014 01:33:38 -0700 Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount Now that remount is properly enforcing the rule that you can't remove nodev at least sandstorm.io is breaking when performing a remount. It turns out that there is an easy intuitive solution implicitly add nodev on remount under the same conditions that we do on mount. Update the regression test for remounts to verify that this new addition does not regress either. Cc: sta...@vger.kernel.org Signed-off-by: Eric W. Biederman ebied...@xmission.com --- fs/namespace.c | 9 .../selftests/mount/unprivileged-remount-test.c| 51 -- 2 files changed, 36 insertions(+), 24 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 7886176232c1..0f158300c866 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags, if (path-dentry != path-mnt-mnt_root) return -EINVAL; + /* Only in special cases allow devices from mounts created +* outside the initial user namespace. I don't like this comment at all. If we stick with this approach, I think that this should say: Certain mounts are forced to have MNT_NODEV set. For these mounts, user code might not realize that nodev is set, so force MS_NODEV on remount attempts to prevent confusing remount failures. +*/ + if ((mnt-mnt_ns-user_ns != init_user_ns) + !(sb-s_type-fs_flags FS_USERNS_DEV_MOUNT)) { + flags |= MS_NODEV; + mnt_flags |= MNT_NODEV; + } + /* Don't allow changing of locked mnt flags. * * No locks need to be held here while testing the various I will still advocate a different and much less magical approach. I'll send a patch later today. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman wrote: > Andy Lutomirski writes: > >> On 08/05/2014 05:57 PM, Eric W. Biederman wrote: >> >> Sorry for catching this late. I think this fix is likely to >> unnecessarily break valid userspace due to an odd interaction. > > The code is correct and safe (no security issues), but yes a blind > remount might hit a snag. > > If you can find a userspace application that matters I might care > that a security fix breaks it. > > I think you have made a point that several more filesystems might > be ok to not set nodev on (because we can not do anything to create > device nodes on those filesystems). I personally would prefer the much > more paranoid approach of only allowing device nodes on a unprivileged > mount if we have audited all of the code paths and know it is safe > for device nodes to appear there. > > I don't actually think anyone cares ad remounts of filesystems like > tmpfs, mqueue, sysfs, proc, ramfs are all quite rare. Blind remounts > are even rare. The normal userspace utilities look at the appropriate > version of /proc/mounts on remount. Bind remounts are the only kind of remounts, because we've never supported do_remount_sb in a user namespace. So, if you want to create some static content in your user namespace, the way to do it is: unshare(CLONE_NEWUSER | CLONE_NEWNS); mount tmpfs somewhere; write to the tmpfs; mount("path to tmpfs", "path to tmpfs", nullptr, MS_REMOUNT | MS_BIND | MS_RDONLY); > > These are not filesystems that a blind remount will likely be applied > to. > > Furthermore there is work underway to prepare patches to allow > "mount --bind -ro" to work as expected. That will further reduce > the pressure from blind remounts. Not for example above. It really does need the remount. > > If there is an actual regression of actual code I am happy to deal > with it. But having the MNT_NODEV on those mounts has been the case > for a long time now and is not new (no regression). This change just > closed the security hole that allowed nodev to be removed. And that > security hole we need to have fixed. Sandstorm does this. (Well, it did until today.) --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Andy Lutomirski writes: > On 08/05/2014 05:57 PM, Eric W. Biederman wrote: > > Sorry for catching this late. I think this fix is likely to > unnecessarily break valid userspace due to an odd interaction. The code is correct and safe (no security issues), but yes a blind remount might hit a snag. If you can find a userspace application that matters I might care that a security fix breaks it. I think you have made a point that several more filesystems might be ok to not set nodev on (because we can not do anything to create device nodes on those filesystems). I personally would prefer the much more paranoid approach of only allowing device nodes on a unprivileged mount if we have audited all of the code paths and know it is safe for device nodes to appear there. I don't actually think anyone cares ad remounts of filesystems like tmpfs, mqueue, sysfs, proc, ramfs are all quite rare. Blind remounts are even rare. The normal userspace utilities look at the appropriate version of /proc/mounts on remount. These are not filesystems that a blind remount will likely be applied to. Furthermore there is work underway to prepare patches to allow "mount --bind -ro" to work as expected. That will further reduce the pressure from blind remounts. If there is an actual regression of actual code I am happy to deal with it. But having the MNT_NODEV on those mounts has been the case for a long time now and is not new (no regression). This change just closed the security hole that allowed nodev to be removed. And that security hole we need to have fixed. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On 08/05/2014 05:57 PM, Eric W. Biederman wrote: > > Linus, > > Please pull the for-linus branch from the git tree: > >git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git > for-linus > >HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at > /proc/thread-self/mounts instead of /proc/self/mounts > > This is a bunch of small changes built against 3.16-rc6. The most > significant change for users is the first patch which makes setns > drmatically faster by removing unneded rcu handling. > > The next chunk of changes are so that "mount -o remount,.." will not > allow the user namespace root to drop flags on a mount set by the system > wide root. Aks this forces read-only mounts to stay read-only, no-dev > mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to > stay no exec and it prevents unprivileged users from messing with a > mounts atime settings. I have included my test case as the last patch > in this series so people performing backports can verify this change > works correctly. > Sorry for catching this late. I think this fix is likely to unnecessarily break valid userspace due to an odd interaction. do_new_mount contains this: if (user_ns != _user_ns) { if (!(type->fs_flags & FS_USERNS_MOUNT)) { put_filesystem(type); return -EPERM; } /* Only in special cases allow devices from mounts * created outside the initial user namespace. */ if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { flags |= MS_NODEV; mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; } } This means that private tmpfs mounts end up with MNT_NODEV | MNT_LOCK_NODEV. Certainly the change from MNT_NODEV to MNT_LOCK_NODEV is sensible *if MNT_NODEV made sense in the first place*. But we didn't have MNT_LOCK_NODEV in the past, so this well-intentioned code never really worked. This has a very strange effect: mounting a private tmpfs with all default flags and then remounting with MS_REMOUNT | MS_BIND *without MS_NODEV* will now fail. I suspect that this practice is rather common, since most likely no one ever paid attention to that implicit MNT_NODEV before. I would argue that the correct fix is to either remove the implicit MNT_NODEV entirely or to set FS_USERNS_DEV_MOUNT on most filesystems. The relevant filesystems are tmpfs, mqueue, sysfs, proc, ramfs, and devpts. devpts already sets FS_USERNS_DEV_MOUNT. Setting FS_USERNS_DEV_MOUNT should be safe on all of the others in this list -- I think that the only one that initially contains device nodes is sysfs on selinux systems, which contains a null node. I think the point of this is to prevent mounts of filesystems with user-controlled initial contents from being dangerous. But we don't have any of those yet. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On 08/05/2014 05:57 PM, Eric W. Biederman wrote: Linus, Please pull the for-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This is a bunch of small changes built against 3.16-rc6. The most significant change for users is the first patch which makes setns drmatically faster by removing unneded rcu handling. The next chunk of changes are so that mount -o remount,.. will not allow the user namespace root to drop flags on a mount set by the system wide root. Aks this forces read-only mounts to stay read-only, no-dev mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to stay no exec and it prevents unprivileged users from messing with a mounts atime settings. I have included my test case as the last patch in this series so people performing backports can verify this change works correctly. Sorry for catching this late. I think this fix is likely to unnecessarily break valid userspace due to an odd interaction. do_new_mount contains this: if (user_ns != init_user_ns) { if (!(type-fs_flags FS_USERNS_MOUNT)) { put_filesystem(type); return -EPERM; } /* Only in special cases allow devices from mounts * created outside the initial user namespace. */ if (!(type-fs_flags FS_USERNS_DEV_MOUNT)) { flags |= MS_NODEV; mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; } } This means that private tmpfs mounts end up with MNT_NODEV | MNT_LOCK_NODEV. Certainly the change from MNT_NODEV to MNT_LOCK_NODEV is sensible *if MNT_NODEV made sense in the first place*. But we didn't have MNT_LOCK_NODEV in the past, so this well-intentioned code never really worked. This has a very strange effect: mounting a private tmpfs with all default flags and then remounting with MS_REMOUNT | MS_BIND *without MS_NODEV* will now fail. I suspect that this practice is rather common, since most likely no one ever paid attention to that implicit MNT_NODEV before. I would argue that the correct fix is to either remove the implicit MNT_NODEV entirely or to set FS_USERNS_DEV_MOUNT on most filesystems. The relevant filesystems are tmpfs, mqueue, sysfs, proc, ramfs, and devpts. devpts already sets FS_USERNS_DEV_MOUNT. Setting FS_USERNS_DEV_MOUNT should be safe on all of the others in this list -- I think that the only one that initially contains device nodes is sysfs on selinux systems, which contains a null node. I think the point of this is to prevent mounts of filesystems with user-controlled initial contents from being dangerous. But we don't have any of those yet. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Andy Lutomirski l...@amacapital.net writes: On 08/05/2014 05:57 PM, Eric W. Biederman wrote: Sorry for catching this late. I think this fix is likely to unnecessarily break valid userspace due to an odd interaction. The code is correct and safe (no security issues), but yes a blind remount might hit a snag. If you can find a userspace application that matters I might care that a security fix breaks it. I think you have made a point that several more filesystems might be ok to not set nodev on (because we can not do anything to create device nodes on those filesystems). I personally would prefer the much more paranoid approach of only allowing device nodes on a unprivileged mount if we have audited all of the code paths and know it is safe for device nodes to appear there. I don't actually think anyone cares ad remounts of filesystems like tmpfs, mqueue, sysfs, proc, ramfs are all quite rare. Blind remounts are even rare. The normal userspace utilities look at the appropriate version of /proc/mounts on remount. These are not filesystems that a blind remount will likely be applied to. Furthermore there is work underway to prepare patches to allow mount --bind -ro to work as expected. That will further reduce the pressure from blind remounts. If there is an actual regression of actual code I am happy to deal with it. But having the MNT_NODEV on those mounts has been the case for a long time now and is not new (no regression). This change just closed the security hole that allowed nodev to be removed. And that security hole we need to have fixed. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman ebied...@xmission.com wrote: Andy Lutomirski l...@amacapital.net writes: On 08/05/2014 05:57 PM, Eric W. Biederman wrote: Sorry for catching this late. I think this fix is likely to unnecessarily break valid userspace due to an odd interaction. The code is correct and safe (no security issues), but yes a blind remount might hit a snag. If you can find a userspace application that matters I might care that a security fix breaks it. I think you have made a point that several more filesystems might be ok to not set nodev on (because we can not do anything to create device nodes on those filesystems). I personally would prefer the much more paranoid approach of only allowing device nodes on a unprivileged mount if we have audited all of the code paths and know it is safe for device nodes to appear there. I don't actually think anyone cares ad remounts of filesystems like tmpfs, mqueue, sysfs, proc, ramfs are all quite rare. Blind remounts are even rare. The normal userspace utilities look at the appropriate version of /proc/mounts on remount. Bind remounts are the only kind of remounts, because we've never supported do_remount_sb in a user namespace. So, if you want to create some static content in your user namespace, the way to do it is: unshare(CLONE_NEWUSER | CLONE_NEWNS); mount tmpfs somewhere; write to the tmpfs; mount(path to tmpfs, path to tmpfs, nullptr, MS_REMOUNT | MS_BIND | MS_RDONLY); These are not filesystems that a blind remount will likely be applied to. Furthermore there is work underway to prepare patches to allow mount --bind -ro to work as expected. That will further reduce the pressure from blind remounts. Not for example above. It really does need the remount. If there is an actual regression of actual code I am happy to deal with it. But having the MNT_NODEV on those mounts has been the case for a long time now and is not new (no regression). This change just closed the security hole that allowed nodev to be removed. And that security hole we need to have fixed. Sandstorm does this. (Well, it did until today.) --Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Wed, Aug 06, 2014 at 04:06:08PM +1000, Stephen Rothwell wrote: > > Yes, that is what confused my, sorry, since that commit and the ones > following are new commits but unchanged patches. As a suggestion, in addition to trying to count the number of commits that are new, or with the same patch id's, etc., is to provide the diff stats between the last version of the branch that landed in linux-next before the merge window opened, and the branch head that was requested for Linus to pull? This will get tripped up by those poeple who send multiple pull requests which are subsets of the branch that is tracked by linux-next (for example, to arrange the order of merging to minimize conflicts), so like all things, it's not perfect, but it might be useful additional indicator, at least in the early days of the merge window. Also, if you could report the commit id of the branch head before the merge window opened, then other people (including Linus) can do their own investigation and draw their own conclusions. Cheers, - Ted P.S. If there was some way this could be automated so that a 'bot sends out a report for every single pull request, then it minimizes the "J'accuse!" aspect, since the report is being done for all pull requests, and everyone knows that robots makes mistakes. :-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
On Wed, Aug 06, 2014 at 04:06:08PM +1000, Stephen Rothwell wrote: Yes, that is what confused my, sorry, since that commit and the ones following are new commits but unchanged patches. As a suggestion, in addition to trying to count the number of commits that are new, or with the same patch id's, etc., is to provide the diff stats between the last version of the branch that landed in linux-next before the merge window opened, and the branch head that was requested for Linus to pull? This will get tripped up by those poeple who send multiple pull requests which are subsets of the branch that is tracked by linux-next (for example, to arrange the order of merging to minimize conflicts), so like all things, it's not perfect, but it might be useful additional indicator, at least in the early days of the merge window. Also, if you could report the commit id of the branch head before the merge window opened, then other people (including Linus) can do their own investigation and draw their own conclusions. Cheers, - Ted P.S. If there was some way this could be automated so that a 'bot sends out a report for every single pull request, then it minimizes the J'accuse! aspect, since the report is being done for all pull requests, and everyone knows that robots makes mistakes. :-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Stephen Rothwell writes: > Hi Eric, > >> As for missing cool tags shrug. The people looking at my code didn't >> feel like saying the magic words so I didn't include cool tags. > > Maybe you should push them ... these tags are not just "cool", they > give less involved people some indications of what has happened in the > life of a patch. Etiquette is a good idea where it helps things flow smoothly. Other times like this standing on etiquette would have resulted in trivial fixes for long standing issues not going anywhere. Althogh I will keep it in mind in the future. >> Which is a my long winded way of say it sounds like you are accusing me >> of being irresponsible, and my way of saying that I have tested and > > I did not mean to accuse, there are reasons (as you have pointed out) > that things get added late, or rewritten. I really try to avoid those reasons but I missed it this round and the buildbot did not manage to find the comibination with a build error until the code hit linux-next :( Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Hi Eric, On Tue, 05 Aug 2014 22:16:06 -0700 ebied...@xmission.com (Eric W. Biederman) wrote: > > I am not certain what your point is. I am just trying to give Linus a heads up for branches that have not had much exposure before he is asked to pull them. > There have been no commits added since the merge window opened. > > There was one commit changed to fix a typo. I documented that already. Yes, that is what confused my, sorry, since that commit and the ones following are new commits but unchanged patches. > There were some commits pushed to the tree as late as friday that had > been out for review earlier than that and it is possible that you did > not pick them up in linux-next until monday. That doesn't mean I added > anything after the merge window opened. Correct. > I have also made certain all of these commits have at least had a chance > to show up in linux-next. > > As for missing cool tags shrug. The people looking at my code didn't > feel like saying the magic words so I didn't include cool tags. Maybe you should push them ... these tags are not just "cool", they give less involved people some indications of what has happened in the life of a patch. > Beyond that I have been quite out of it recently and this is what I had > time to do. If I had had a little more time and energy I would have > included unmount on unlink patches that still need magic to happen to > keep from blowing the stack in pathological cases on everything except > x86_64. That code has been sitting in linux-next. > > Which is a my long winded way of say it sounds like you are accusing me > of being irresponsible, and my way of saying that I have tested and I did not mean to accuse, there are reasons (as you have pointed out) that things get added late, or rewritten. -- Cheers, Stephen Rothwells...@canb.auug.org.au signature.asc Description: PGP signature
Re: [GIT PULL] namespace updates for v3.17-rc1
Hi Eric, On Tue, 05 Aug 2014 22:16:06 -0700 ebied...@xmission.com (Eric W. Biederman) wrote: I am not certain what your point is. I am just trying to give Linus a heads up for branches that have not had much exposure before he is asked to pull them. There have been no commits added since the merge window opened. There was one commit changed to fix a typo. I documented that already. Yes, that is what confused my, sorry, since that commit and the ones following are new commits but unchanged patches. There were some commits pushed to the tree as late as friday that had been out for review earlier than that and it is possible that you did not pick them up in linux-next until monday. That doesn't mean I added anything after the merge window opened. Correct. I have also made certain all of these commits have at least had a chance to show up in linux-next. As for missing cool tags shrug. The people looking at my code didn't feel like saying the magic words so I didn't include cool tags. Maybe you should push them ... these tags are not just cool, they give less involved people some indications of what has happened in the life of a patch. Beyond that I have been quite out of it recently and this is what I had time to do. If I had had a little more time and energy I would have included unmount on unlink patches that still need magic to happen to keep from blowing the stack in pathological cases on everything except x86_64. That code has been sitting in linux-next. Which is a my long winded way of say it sounds like you are accusing me of being irresponsible, and my way of saying that I have tested and I did not mean to accuse, there are reasons (as you have pointed out) that things get added late, or rewritten. -- Cheers, Stephen Rothwells...@canb.auug.org.au signature.asc Description: PGP signature
Re: [GIT PULL] namespace updates for v3.17-rc1
Stephen Rothwell s...@canb.auug.org.au writes: Hi Eric, As for missing cool tags shrug. The people looking at my code didn't feel like saying the magic words so I didn't include cool tags. Maybe you should push them ... these tags are not just cool, they give less involved people some indications of what has happened in the life of a patch. Etiquette is a good idea where it helps things flow smoothly. Other times like this standing on etiquette would have resulted in trivial fixes for long standing issues not going anywhere. Althogh I will keep it in mind in the future. Which is a my long winded way of say it sounds like you are accusing me of being irresponsible, and my way of saying that I have tested and I did not mean to accuse, there are reasons (as you have pointed out) that things get added late, or rewritten. I really try to avoid those reasons but I missed it this round and the buildbot did not manage to find the comibination with a build error until the code hit linux-next :( Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Stephen Rothwell writes: > Hi Eric, > > On Tue, 05 Aug 2014 17:57:31 -0700 ebied...@xmission.com (Eric W. Biederman) > wrote: >> >> Please pull the for-linus branch from the git tree: >> >>git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git >> for-linus >> >>HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts >> at /proc/thread-self/mounts instead of /proc/self/mounts > > This has had 4 commits added since the merge window opened that have no > Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by > tag. I am not certain what your point is. There have been no commits added since the merge window opened. There was one commit changed to fix a typo. I documented that already. There were some commits pushed to the tree as late as friday that had been out for review earlier than that and it is possible that you did not pick them up in linux-next until monday. That doesn't mean I added anything after the merge window opened. I have also made certain all of these commits have at least had a chance to show up in linux-next. As for missing cool tags shrug. The people looking at my code didn't feel like saying the magic words so I didn't include cool tags. Furthermore the code is all quite trivial. Beyond that I have been quite out of it recently and this is what I had time to do. If I had had a little more time and energy I would have included unmount on unlink patches that still need magic to happen to keep from blowing the stack in pathological cases on everything except x86_64. That code has been sitting in linux-next. Which is a my long winded way of say it sounds like you are accusing me of being irresponsible, and my way of saying that I have tested and vetted these trivial patches and been as careful as I reasonably could, and I that I apologise for not being able to do things a few days earlier. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Hi Eric, On Tue, 05 Aug 2014 17:57:31 -0700 ebied...@xmission.com (Eric W. Biederman) wrote: > > Please pull the for-linus branch from the git tree: > >git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git > for-linus > >HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at > /proc/thread-self/mounts instead of /proc/self/mounts This has had 4 commits added since the merge window opened that have no Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag. > The last change is a small semantic cleanup introducing > /proc/thread-self and pointing /proc/mounts and /proc/net at it. This > prevents several kinds of problemantic corner cases. It is a > user-visible change so it has a minute chance of causing regressions so > the change to /proc/mounts and /proc/net are individual one line commits > that can be trivially reverted. Unfortunately I lost and could not find > the email of the original reporter so he is not credited. From at least > one perspective this change to /proc/net is a refgression fix to allow > pthread /proc/net uses that were broken by the introduction of the network > namespace. They appear to include this last change. -- Cheers, Stephen Rothwells...@canb.auug.org.au signature.asc Description: PGP signature
[GIT PULL] namespace updates for v3.17-rc1
Linus, Please pull the for-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This is a bunch of small changes built against 3.16-rc6. The most significant change for users is the first patch which makes setns drmatically faster by removing unneded rcu handling. The next chunk of changes are so that "mount -o remount,.." will not allow the user namespace root to drop flags on a mount set by the system wide root. Aks this forces read-only mounts to stay read-only, no-dev mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to stay no exec and it prevents unprivileged users from messing with a mounts atime settings. I have included my test case as the last patch in this series so people performing backports can verify this change works correctly. The next change fixes a bug in NFS that was discovered while auditing nsproxy users for the first optimization. Today you can oops the kernel by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid namespaces. I rebased and fixed the build of the !CONFIG_NFS_FS case yesterday when a build bot caught my typo. Given that no one to my knowledge bases anything on my tree fixing the typo in place seems more responsible that requiring a typo-fix to be backported as well. The last change is a small semantic cleanup introducing /proc/thread-self and pointing /proc/mounts and /proc/net at it. This prevents several kinds of problemantic corner cases. It is a user-visible change so it has a minute chance of causing regressions so the change to /proc/mounts and /proc/net are individual one line commits that can be trivially reverted. Unfortunately I lost and could not find the email of the original reporter so he is not credited. From at least one perspective this change to /proc/net is a refgression fix to allow pthread /proc/net uses that were broken by the introduction of the network namespace. Eric Eric W. Biederman (11): namespaces: Use task_lock and not rcu to protect nsproxy mnt: Only change user settable mount flags in remount mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount mnt: Correct permission checks in do_remount mnt: Change the default remount atime from relatime to the existing value mnt: Add tests for unprivileged remount cases that have found to be faulty NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes proc: Have net show up under /proc//task/ proc: Implement /proc/thread-self to point at the directory of the current thread proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts fs/namespace.c | 65 +- fs/nfs/client.c| 95 fs/nfs/inode.c | 3 +- fs/nfs/internal.h | 9 + fs/nfs/netns.h | 3 + fs/proc/Makefile | 1 + fs/proc/base.c | 18 +- fs/proc/inode.c| 7 +- fs/proc/internal.h | 6 + fs/proc/proc_net.c | 6 +- fs/proc/root.c | 5 +- fs/proc/thread_self.c | 85 fs/proc_namespace.c| 8 +- include/linux/mount.h | 9 +- include/linux/nsproxy.h| 16 +- include/linux/pid_namespace.h | 1 + ipc/namespace.c| 6 +- kernel/nsproxy.c | 15 +- kernel/utsname.c | 6 +- net/core/net_namespace.c | 10 +- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/mount/Makefile | 17 ++ .../selftests/mount/unprivileged-remount-test.c| 242 + 23 files changed, 537 insertions(+), 97 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] namespace updates for v3.17-rc1
Linus, Please pull the for-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This is a bunch of small changes built against 3.16-rc6. The most significant change for users is the first patch which makes setns drmatically faster by removing unneded rcu handling. The next chunk of changes are so that mount -o remount,.. will not allow the user namespace root to drop flags on a mount set by the system wide root. Aks this forces read-only mounts to stay read-only, no-dev mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to stay no exec and it prevents unprivileged users from messing with a mounts atime settings. I have included my test case as the last patch in this series so people performing backports can verify this change works correctly. The next change fixes a bug in NFS that was discovered while auditing nsproxy users for the first optimization. Today you can oops the kernel by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid namespaces. I rebased and fixed the build of the !CONFIG_NFS_FS case yesterday when a build bot caught my typo. Given that no one to my knowledge bases anything on my tree fixing the typo in place seems more responsible that requiring a typo-fix to be backported as well. The last change is a small semantic cleanup introducing /proc/thread-self and pointing /proc/mounts and /proc/net at it. This prevents several kinds of problemantic corner cases. It is a user-visible change so it has a minute chance of causing regressions so the change to /proc/mounts and /proc/net are individual one line commits that can be trivially reverted. Unfortunately I lost and could not find the email of the original reporter so he is not credited. From at least one perspective this change to /proc/net is a refgression fix to allow pthread /proc/net uses that were broken by the introduction of the network namespace. Eric Eric W. Biederman (11): namespaces: Use task_lock and not rcu to protect nsproxy mnt: Only change user settable mount flags in remount mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount mnt: Correct permission checks in do_remount mnt: Change the default remount atime from relatime to the existing value mnt: Add tests for unprivileged remount cases that have found to be faulty NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes proc: Have net show up under /proc/tgid/task/tid proc: Implement /proc/thread-self to point at the directory of the current thread proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts fs/namespace.c | 65 +- fs/nfs/client.c| 95 fs/nfs/inode.c | 3 +- fs/nfs/internal.h | 9 + fs/nfs/netns.h | 3 + fs/proc/Makefile | 1 + fs/proc/base.c | 18 +- fs/proc/inode.c| 7 +- fs/proc/internal.h | 6 + fs/proc/proc_net.c | 6 +- fs/proc/root.c | 5 +- fs/proc/thread_self.c | 85 fs/proc_namespace.c| 8 +- include/linux/mount.h | 9 +- include/linux/nsproxy.h| 16 +- include/linux/pid_namespace.h | 1 + ipc/namespace.c| 6 +- kernel/nsproxy.c | 15 +- kernel/utsname.c | 6 +- net/core/net_namespace.c | 10 +- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/mount/Makefile | 17 ++ .../selftests/mount/unprivileged-remount-test.c| 242 + 23 files changed, 537 insertions(+), 97 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] namespace updates for v3.17-rc1
Hi Eric, On Tue, 05 Aug 2014 17:57:31 -0700 ebied...@xmission.com (Eric W. Biederman) wrote: Please pull the for-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This has had 4 commits added since the merge window opened that have no Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag. The last change is a small semantic cleanup introducing /proc/thread-self and pointing /proc/mounts and /proc/net at it. This prevents several kinds of problemantic corner cases. It is a user-visible change so it has a minute chance of causing regressions so the change to /proc/mounts and /proc/net are individual one line commits that can be trivially reverted. Unfortunately I lost and could not find the email of the original reporter so he is not credited. From at least one perspective this change to /proc/net is a refgression fix to allow pthread /proc/net uses that were broken by the introduction of the network namespace. They appear to include this last change. -- Cheers, Stephen Rothwells...@canb.auug.org.au signature.asc Description: PGP signature
Re: [GIT PULL] namespace updates for v3.17-rc1
Stephen Rothwell s...@canb.auug.org.au writes: Hi Eric, On Tue, 05 Aug 2014 17:57:31 -0700 ebied...@xmission.com (Eric W. Biederman) wrote: Please pull the for-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This has had 4 commits added since the merge window opened that have no Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag. I am not certain what your point is. There have been no commits added since the merge window opened. There was one commit changed to fix a typo. I documented that already. There were some commits pushed to the tree as late as friday that had been out for review earlier than that and it is possible that you did not pick them up in linux-next until monday. That doesn't mean I added anything after the merge window opened. I have also made certain all of these commits have at least had a chance to show up in linux-next. As for missing cool tags shrug. The people looking at my code didn't feel like saying the magic words so I didn't include cool tags. Furthermore the code is all quite trivial. Beyond that I have been quite out of it recently and this is what I had time to do. If I had had a little more time and energy I would have included unmount on unlink patches that still need magic to happen to keep from blowing the stack in pathological cases on everything except x86_64. That code has been sitting in linux-next. Which is a my long winded way of say it sounds like you are accusing me of being irresponsible, and my way of saying that I have tested and vetted these trivial patches and been as careful as I reasonably could, and I that I apologise for not being able to do things a few days earlier. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/