Re: [PATCH ] Fix capability check to allow privileged CLONE_NEWUSER from nested user namespaces
Quoting Srivatsa S. Bhat (sriva...@csail.mit.edu): > From: Srivatsa S. Bhat> > The existing patch which disallows unprivileged CLONE_NEWUSER applies > the check for CAP_SYS_ADMIN capability on the 'init_user_ns' > namespace, which is not entirely correct. Consider the following sequence: > > 1. A process with root privileges calls >clone(child_fn, ..., CLONE_NEWUSER, ...) to create a new user namespace. > > 2. child_fn, now running in the newly created user namespace enjoys the >full set of capabilities in the new user namespace, but has lost >its capabilities in the old user namespace (init_user_ns in this >case). > > 3. child_fn now calls >clone(child_fn2, ..., CLONE_NEWUSER, ...) to create a new (nested) >user namespace. > > Step 3 should have succeeded because child_fn has full privileges No, it should not. If the host has unprivileged_userns_clone=false, then it should require privilege against the init_user_ns to change or bypass that. Yes the userns was *created* by someone with that privilege, but likely they did so precisely so that they could run more sandboxed code. > (including CAP_SYS_ADMIN) in its user namespace, but this step fails > because the CAP_SYS_ADMIN capability is checked against init_user_ns, > as opposed to child_fn's user namespace. > > So fix this by checking for CAP_SYS_ADMIN using ns_capable() on the > current task's user namespace. > > This also helps the userns07 testcase from LTP > (testcases/kernel/containers/userns/userns07.c) to pass when running > with root privileges. If you want to run that test (which checks for >=32 level nesting depth) with privilege, surely you can just set unprivileged_userns_clone=true? > Signed-off-by: Srivatsa S. Bhat > --- > Applies on debian kernel's master branch. > > ...low-unprivileged-CLONE_NEWUSER-by-default.patch | 11 +++ > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git > a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch > > b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch > index 55edbc7..fc978ea 100644 > --- > a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch > +++ > b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch > @@ -12,6 +12,9 @@ issues are found, we have a fail-safe. > > Signed-off-by: Serge Hallyn > [bwh: Remove unneeded binary sysctl bits] > +[Srivatsa: Fix capability checks when running nested user namespaces > +by using ns_capable() on the current task's user namespace.] > +Signed-off-by: Srivatsa S. Bhat > --- > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -27,24 +30,24 @@ Signed-off-by: Serge Hallyn > > /* >* Minimum number of threads to boot the kernel > -@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_stru > +@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_struct > *copy_process( > if ((clone_flags & (CLONE_NEWUSER|CLONE_FS)) == > (CLONE_NEWUSER|CLONE_FS)) > return ERR_PTR(-EINVAL); > > +if ((clone_flags & CLONE_NEWUSER) && !unprivileged_userns_clone) > -+if (!capable(CAP_SYS_ADMIN)) > ++if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN)) > +return ERR_PTR(-EPERM); > + > /* >* Thread groups must share signals as well, and detached threads >* can only be started up within the thread group. > -@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long, > +@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags) > if (unshare_flags & CLONE_NEWNS) > unshare_flags |= CLONE_FS; > > +if ((unshare_flags & CLONE_NEWUSER) && !unprivileged_userns_clone) { > +err = -EPERM; > -+if (!capable(CAP_SYS_ADMIN)) > ++if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN)) > +goto bad_unshare_out; > +} > +
Bug#603944: Updated patch
Here is a patch (against the ubuntu package, just as example) which instead of doing a dumb retry loop, waits for udev. === modified file 'debian/changelog' --- debian/changelog2010-04-26 15:17:47 + +++ debian/changelog2010-12-08 21:44:32 + @@ -1,3 +1,15 @@ +initramfs-tools (0.92bubuntu79) natty; urgency=low + + * When using multipath, it is possible that mountroot() will race +with udev's renaming of /dev/disk/by-uuid/{rootfs-uuid} from +/dev/sd?? to /dev/mapper/something. After multipath has grabbed +the /dev/sd?? and until udev completes the rename, mounting +/dev/disk/by-uuid/{rootfs-uuid} will fail with -EBUSY. In that +case, call 'udevsettle' to wait until udev has finished all its +related actions. (Closes LP: #686832) + + -- Serge Hallyn serge.hal...@ubuntu.com Fri, 19 Nov 2010 12:19:43 -0600 + initramfs-tools (0.92bubuntu78) lucid; urgency=low * hooks/compcache: Escape $-expansions inside EOF (thanks, Eugene San; === modified file 'scripts/local' --- scripts/local 2009-12-21 23:06:53 + +++ scripts/local 2010-11-20 01:03:26 + @@ -69,10 +69,19 @@ # FIXME This has no error checking [ -n ${FSTYPE} ] modprobe ${FSTYPE} - # FIXME This has no error checking # Mount root - mount ${roflag} ${FSTYPE:+-t ${FSTYPE} }${ROOTFLAGS} ${ROOT} ${rootmnt} - mountroot_status=$? + tries=0 + ret=1 + while [ $tries -lt 2 -a $ret -ne 0 ]; do + mount ${roflag} ${FSTYPE:+-t ${FSTYPE} }${ROOTFLAGS} ${ROOT} ${rootmnt} + ret=$? + if [ $ret -ne 0 ]; then + echo failed attempt $tries to mount $ROOT as root + udevadm settle + tries=$((tries+1)) + fi + done + mountroot_status=$ret if [ $LOOP ]; then if [ $mountroot_status != 0 ]; then if [ ${FSTYPE} = ntfs ] || [ ${FSTYPE} = vfat ]; then -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/20101209212028.ga24...@hallyn.com
Bug#603944: retry mounting of root
Package: initramfs-tools Version: 0.98 When using multipath, it is possible that mountroot() will race with udev's renaming of /dev/disk/by-uuid/{rootfs-uuid} from /dev/sd?? to /dev/mapper/something. After multipath has grabbed the /dev/sd?? and until udev completes the rename, mounting /dev/disk/by-uuid/{rootfs-uuid} will fail with -EBUSY. Here is a patch I've been using successfully for awhile. Colin Watson has suggested that: A poll/retry loop is generally a suboptimal way to do this kind of thing; what we really want is to wait for udev to tell us that it has finished with the event that triggered renaming of the device. which does seem cleaner. thanks, -serge diff -Nru initramfs-tools-0.98ubuntu2/debian/changelog initramfs-tools-0.98ubuntu3/debian/changelog --- initramfs-tools-0.98ubuntu2/debian/changelog2010-08-20 03:48:58.0 -0500 +++ initramfs-tools-0.98ubuntu3/debian/changelog2010-08-24 22:31:31.0 -0500 @@ -1,3 +1,14 @@ +initramfs-tools (0.98ubuntu3) maverick; urgency=low + + * Add retries to mountroot(). This is particularly needed when we +use multipath, because it is possible that mountroot() will race +with udev's renaming of /dev/disk/by-uuid/{rootfs-uuid} from +/dev/sd?? to /dev/mapper/something. After multipath has grabbed +the /dev/sd?? and until udev completes the rename, mounting +/dev/disk/by-uuid/{rootfs-uuid} will fail with -EBUSY. + + -- Serge Hallyn serge.hal...@canonical.com Tue, 24 Aug 2010 22:17:57 -0500 + initramfs-tools (0.98ubuntu2) maverick; urgency=low * The ramzswap device changed its interface so that the disk size needs to diff -Nru initramfs-tools-0.98ubuntu2/scripts/local initramfs-tools-0.98ubuntu3/scripts/local --- initramfs-tools-0.98ubuntu2/scripts/local 2010-08-20 03:48:58.0 -0500 +++ initramfs-tools-0.98ubuntu3/scripts/local 2010-08-24 22:16:17.0 -0500 @@ -86,10 +86,19 @@ # FIXME This has no error checking [ -n ${FSTYPE} ] modprobe ${FSTYPE} - # FIXME This has no error checking # Mount root - mount ${roflag} ${FSTYPE:+-t ${FSTYPE} }${ROOTFLAGS} ${ROOT} ${rootmnt} - mountroot_status=$? + tries=0 + ret=1 + while [ $tries -lt 10 -a $ret -ne 0 ]; do + mount ${roflag} ${FSTYPE:+-t ${FSTYPE} }${ROOTFLAGS} ${ROOT} ${rootmnt} + ret=$? + if [ $ret -ne 0 ]; then + echo failed attempt $tries to mount $ROOT as root + sleep 1 + tries=$((tries+1)) + fi + done + mountroot_status=$ret if [ $LOOP ]; then if [ $mountroot_status != 0 ]; then if [ ${FSTYPE} = ntfs ] || [ ${FSTYPE} = vfat ]; then -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/20101118174249.ga24...@hallyn.com