Re: [AppArmor 00/44] AppArmor security module overview
Adrian Bunk [EMAIL PROTECTED] writes: On Tue, Jun 26, 2007 at 07:47:00PM -0700, Andrew Morton wrote: On Tue, 26 Jun 2007 19:24:03 -0700 John Johansen [EMAIL PROTECTED] wrote: so... where do we stand with this? Fundamental, irreconcilable differences over the use of pathname-based security? There certainly seems to be some differences of opinion over the use of pathname-based-security. I was refreshed to have not been cc'ed on a lkml thread for once. I guess it couldn't last. Do you agree with the irreconcilable part? I think I do. I suspect that we're at the stage of having to decide between a) set aside the technical issues and grudgingly merge this stuff as a service to Suse and to their users (both of which entities are very important to us) and leave it all as an object lesson in how-not-to-develop-kernel-features. Minimisation of the impact on the rest of the kernel is of course very important here. versus b) leave it out and require that Suse wear the permanent cost and quality impact of maintaining it out-of-tree. It will still be an object lesson in how-not-to-develop-kernel-features. ... versus c) if [1] AppArmor is considered to be something that wouldn't be merged if it wasn't already widely deployed by Suse: leave it out, work on an ideal solution [2], and let Suse wear the one-time cost of migrating their users to the ideal solution One important point is that if AppArmor gets merged there will be much more distribution support for it, and many people on !Suse will start using it. I'm not claiming to understand the technical details, but from both slightly reading over the previous discussions and the What are the advantages of AppArmor over SELinux? section in the AppArmor FAQ [3] my impression is that a main advantage of AppArmor are more user friendly userspace tools. Therefore, if [1] AppArmor is considered technically inferior to SELinux, it might still become more popular than SELinux simply because it's easier to use - and although it's technically inferior. A couple of random thoughts to mix up this discussion. From what I have been able to observer the LSM is roughly firewalls rules for in box operations. All it can do is increase the chances you will get -EPERM. Not being able to take path names into consideration is a current deficiency of the LSM. Being exported to modules is another deficiency of the current LSM as it seems no serious user of the LSM can exist simply as a kernel module. Not being able to mix code from different projects is also a very serious limitation of the LSM. Currently I don't think we can build a kernel that supports selinux and any other LSM at the same time. Which horribly limits what we can do with the LSM. So it seems clear that if we are aiming at an ideal solution. We first need to enhance the LSM. Then merge in the AppArmor functionality. Doing it all in one patch series looks to overwhelming for a decent code review. That said is anyone interested in making the LSM more like iptables with a generic table based rules structure? That way we could fix the one true LSM problem and concentrate on simpler pieces that give specific bits of interesting functionality. Or at the very least be able to compile in multiple different bits of functionality into the kernel simultaneously. I'm really not familiar with the security issues the LSM addresses but I do know, it encourages huge incompatible mega solutions, and it tends to break when I fix real security problems in the kernel. So at this point I am convinced that the LSM is deficient, has very limited usability, and seems to be a very fragile firewall structure to me. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] unprivileged mounts update
Miklos Szeredi [EMAIL PROTECTED] writes: On Apr 25 2007 11:21, Eric W. Biederman wrote: Why did we want to use fsuid, exactly? - Because ruid is completely the wrong thing we want mounts owned by whomever's permissions we are using to perform the mount. Think nfs. I access some nfs file as an unprivileged user. knfsd, by nature, would run as euid=0, uid=0, but it needs fsuid=jengelh for most permission logic to work as expected. I don't think knfsd will ever want to call mount(2). But yeah, I've been convinced, that using fsuid is the right thing to do. Actually knfsd does call mount when it crosses a mount point on the nfs server it generates an equivalent mount point in linux. At least I think that is the what it is doing. It is very similar to our mount propagation path. However as a special case I don't think the permission checking is likely to bite us there. It is worth double checking once we have the other details ironed out. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] unprivileged mounts update
Miklos Szeredi [EMAIL PROTECTED] writes: From: Miklos Szeredi [EMAIL PROTECTED] - refine adding nosuid and nodev flags for unprivileged mounts: o add nosuid, only if mounter doesn't have CAP_SETUID capability o add nodev, only if mounter doesn't have CAP_MKNOD capability - allow unprivileged forced unmount, but only for FS_SAFE filesystems - allow mounting over special files, but not symlinks - for mounting and umounting check fsuid instead of ruid Andrew, please skip this patch, for now. Serge found a problem with the fsuid approach: setfsuid(nonzero) will remove filesystem related capabilities. So even if root is trying to set the user=UID flag on a mount, access to the target (and in case of bind, the source) is checked with user privileges. I do have a major problem with this patchset though. We still have the unnecessary concept of user mounts. That seems only needed now for the /proc/mounts output. All mounts should have an owner. Prior to the unprivileged mount work root owns all mounts. Root should be able to set this flag on any mountpoint, _regardless_ of permissions. We don't need a flag, and thinking of it in the context of a flag is clearly the wrong thing. Yes if we have the proper capability we should be able to explicitly specify the owner of the mount It is possible to restore filesystem capabilities after setting fsuid, but the interfaces are rather horrible at all levels. mount(8) can probably live with these, but I'm not sure that using fsuid over ruid has enough advantages to force this. Why did we want to use fsuid, exactly? - Because ruid is completely the wrong thing we want mounts owned by whomever's permissions we are using to perform the mount. There are two basic cases. - Mounting a filesystem as who we are. This can use fsuid with no problems. If we are suid to root to perform the mount by default we want root to own the mount so that is correct. - Mounting a filesystem as another user. This is the tricky case rare case needed in setup. If we aren't jumping through to many hoops to make it work when using fsuid it sounds like the right thing here as well. How hard is it to set fsuid to a different value? I.e. What hoops does root have to jump through. Further when using fsuid we don't need an extra flag to mount. Plus things are a little more consistent with the rest of the linux/unix interface. Now I can see doing something like using a special flag and not using fsuid for the one case where we explicitly want to mount a filesystem as someone else. However if only user space has to special case this (as it does anyway) and we don't have to special case it in the kernel. So much the better. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] unprivileged mounts update
Serge E. Hallyn [EMAIL PROTECTED] writes: Quoting H. Peter Anvin ([EMAIL PROTECTED]): Miklos Szeredi wrote: Andrew, please skip this patch, for now. Serge found a problem with the fsuid approach: setfsuid(nonzero) will remove filesystem related capabilities. So even if root is trying to set the user=UID flag on a mount, access to the target (and in case of bind, the source) is checked with user privileges. Root should be able to set this flag on any mountpoint, _regardless_ of permissions. Right, if you're using fsuid != 0, you're not running as root Sure, but what I'm not clear on is why, if I've done a prctl(PR_SET_KEEPCAPS, 1) before the setfsuid, I still lose the CAP_FS_MASK perms. I see the special case handling in cap_task_post_setuid(). I'm sure there was a reason for it, but this is a piece of the capability implementation I don't understand right now. So we drop CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, and CAP_FSETID Since we are checking CAP_SETUID or CAP_SYS_ADMIN how is that a problem? Are there other permission checks that mount is doing that we care about. (fsuid is the equivalent to euid for the filesystem.) If it were really the equivalent then I could keep my capabilities :) after changing it. We drop all capabilities after we change the euid. I fail to see how ruid should have *any* impact on mount(2). That seems to be a design flaw. May be, but just using fsuid at this point stops me from enabling user mounts under /share if /share is chmod 000 (which it is). I'm dense today. If we can't work out the details we can always use a flag. But what is the problem with fsuid? You are not trying to test this using a non-default security model are you? Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] unprivileged mounts update
Serge E. Hallyn [EMAIL PROTECTED] writes: Quoting Eric W. Biederman ([EMAIL PROTECTED]): Are there other permission checks that mount is doing that we care about. Not mount itself, but in looking up /share/fa/root/home/fa, user fa doesn't have the rights to read /share, and by setting fsuid to fa and dropping CAP_DAC_READ_SEARCH the mount action fails. Got it. I'm not certain this is actually a problem it may be a feature. But it does fly in the face of the general principle of just getting out of roots way so things can get done. I think we can solve your basic problem by simply doing like: chdir(/share); mount(.); To simply avoid the permission problem. The practical question is how much do we care. But the solution you outlined in your previous post would work around this perfectly. If we are not using usual permissions which user do we use current-uid? Or do we pass that user someplace? If it were really the equivalent then I could keep my capabilities :) after changing it. We drop all capabilities after we change the euid. Not if we've done prctl(PR_SET_KEEPCAPS, 1) Ah cap_clear doesn't do the obvious thing. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 0/8] mount ownership and unprivileged mount syscall (v4)
Karel Zak [EMAIL PROTECTED] writes: On Fri, Apr 20, 2007 at 12:25:32PM +0200, Miklos Szeredi wrote: The following extra security measures are taken for unprivileged mounts: - usermounts are limited by a sysctl tunable - force nosuid,nodev mount options on the created mount The original userspace user= solution also implies the noexec option by default (you can override the default by exec option). It means the kernel based solution is not fully compatible ;-( Why noexec? Either it was a silly or arbitrary decision, or our kernel design may be incomplete. Now I can see not wanting to support executables if you are locking down a system. The classic don't execute a program from a CD just because the CD was stuck in the drive problem. So I can see how executing code from an untrusted source could prevent exploitation of other problems, and we certainly don't want to do it automatically. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/8] allow unprivileged umount
Miklos Szeredi [EMAIL PROTECTED] writes: Does this mean, that containers will need this? Or that you don't know yet? The uid namespace is something we have to handle carefully and we have not decided on the final design. What is clear is that all permission checks will need to become either (uid namspace, uid) tuple comparisons. Or struct user pointer comparisons. To see if we are talking about the same uid. So the eventual uid namespace combined with the possibility for rlimits if we use struct user *. See to make using a struct user a clear win. storing a user struct on each mount point seems sane, plus it allows per user mount rlimits which is major plus. Especially since we seem to be doing accounting only for user mounts a per user rlimit seems good. I'm not against per-user rlimits for mounts, but I'd rather do this later... Then let's add a non-discriminate limit. Instead of a limit that applies only to root. To get the user we should be user fs_uid as HPA suggested. fsuid is exclusively used for checking file permissions, which we don't do here anymore. So while it could be argued, that mount() _is_ a filesystem operation, it is really a different sort of filesystem operation than the rest. OTOH it wouldn't hurt to use fsuid instead of ruid... Yes. I may be confused but I'm pretty certain we want either the fsuid or the euid to be the mount owner. ruid just looks wrong. The fsuid is a special case of the effective uid. Which is who we should perform operations as. Unless I'm just confused. Finally I'm pretty certain the capability we should care about in this context is CAP_SETUID. Instead of CAP_SYS_ADMIN. If we have CAP_SETUID we can become which ever user owns the mount, and the root user in a container needs this, so he can run login programs. So changing the appropriate super user checks from CAP_SYS_ADMIN to CAP_SETUID I think is the right thing todo. That's a flawed logic. If you want to mount as a specific user, and you have CAP_SETUID, then just use set*uid() and then mount(). Totally agreed for mount. Changing the capability check for mount() would break the userspace ABI. Sorry I apparently wasn't clear. CAP_SETUID should be the capability check for umount. Hopefully my other more detail replies helped with this. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 1/8] add user mounts to the kernel
Miklos Szeredi [EMAIL PROTECTED] writes: The MNT_USER flag is not copied on any kind of mount cloning: namespace creation, binding or propagation. I half agree, and as an initial approximation this works. Ultimately we should be at the point that for mount propagation that we copy the owner of the from the owner of our parent mount at the propagation destination. Yes, that sounds the most sane. Ram, what do you think? + if (mnt-mnt_flags MNT_USER) + seq_printf(m, ,user=%i, mnt-mnt_uid); How about making the test if (mnt-mnt_user != root_user) We don't want to treat root_user special. That's what capabilities were invented for. For the print statement? What ever it is minor. Index: linux/include/linux/fs.h === --- linux.orig/include/linux/fs.h 2007-04-20 11:55:02.0 +0200 +++ linux/include/linux/fs.h 2007-04-20 11:55:05.0 +0200 @@ -123,6 +123,7 @@ extern int dir_notify_enable; #define MS_SLAVE (119) /* change to slave */ #define MS_SHARED (120) /* change to shared */ #define MS_RELATIME (121) /* Update atime relative to mtime/ctime. */ +#define MS_SETUSER(122) /* set mnt_uid to current user */ If we unconditionally use the fsuid I think we can get away without this flag. That coudl work if we wouldn't have to worry about breaking the user interface. As it is, we cannot be sure, that existing callers of mount(2) don't have fsuid set to some random value. If we can get away without an extra flag it would really be preferable. In the container case we have an interesting and very common scenario struct user *our_user != root_user. our_user-uid == 0. I.e. The root in the what is the container but not the root of the entire system. So I want to minimize the changes needed to existing programs. Now if all we have to do is specify MS_SETUSER when root a user with CAP_SETUID is setting up a mount as a user other then himself then I don't much care. If we have to call MS_SETUSER as unprivileged users I will have to modify mount binaries to work differently inside and outside of containers. Further there is only one or two versions of mount in widespread use on linux, and unless you do something special fsuid == euid. So the chance of fsuid set to some random value is pretty low. So yes I think we can be 99.9% certain that existing callers of mount(2) don't have fsuid set to some random value just by inspecting the code of mount(1). #define MNT_SHRINKABLE0x100 +#define MNT_USER 0x200 If we assign a user to all mount points and root gets to own the initial set of mounts then we don't need the internal MNT_USER flag. I think we do want to treat owned mounts special, rather than treating user=0 mounts special. I don't think we should treat any mount special and all mounts should be owned. + + uid_t mnt_uid; /* owner of the mount */ Can we please make this a user struct. That requires a bit of reference counting but it has uid namespace benefits as well as making it easy to implement per user mount rlimits. OK, can you ellaborate, what the uid namespace benifits are? In the uid namespace the comparison is simpler as are the propagations rules. Basically if you use a struct user you will never need to care about a uid namespace. If you don't we will have to tear through this code another time. Plus like I was mentioning earlier. If we do have a struct user there implementing per user mount rlimits becomes trivial. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/8] account user mounts
Miklos Szeredi [EMAIL PROTECTED] writes: From: Miklos Szeredi [EMAIL PROTECTED] Add sysctl variables for accounting and limiting the number of user mounts. The maximum number of user mounts is set to 1024 by default. This won't in itself enable user mounts, setting a mount to be owned by a user is first needed Since each mount has a user can we just make this a per user rlimit? If we are going to implement a sysctl at this point I think it should be a global limit that doesn't care if who you are. Even root can have recursive mounts that attempt to get out of control. Recursive bind mounts are done carefully enough, so they don't get out of control. Recursive mount propagations can get out of control. But root can shoot itself in the foot any number of ways, and it's not for the kernel to police that. Yes. It is. This is mostly about removing special cases. We routinely have limits on resources that are global and apply to root along with every one else. Root can change them but they still apply to root. Things like the number of inodes in the system or the total number of files. Since it is perfectly possible to do a per user rlimit at this stage in the design. I contend that either: - We implement a per user rlimit of mounts. - We implement a global limit on mounts. No other case makes sense. The previous objections were at least in part because the limit only applied to user mounts but the name of the limit did not apply to user mounts. Also currently you are not checking the max_users. It looks like you do this in a later patch but still it is a little strange to allow user own mounts and have accounting but to not check the limit at this state. Yeah, but at this stage user mounts are not yet allowed, so this is safe. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/8] allow unprivileged umount
Andrew Morton [EMAIL PROTECTED] writes: On Sat, 21 Apr 2007 10:09:42 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: +static bool permit_umount(struct vfsmount *mnt, int flags) +{ ... +return mnt-mnt_uid == current-uid; +} Yes, this seems very wrong. I'd have thought that comparing user_struct*'s would get us a heck of a lot closer to being able to support aliasing of UIDs between different namespaces. OK, I'll fix this up. Actually an earlier version of this patch did use user_struct's but I'd changed it to uids, because it's simpler. OK.. I didn't think about this being contrary to the id namespaces thing. Well I was madly assuming that when serarate UID namespaces are in use, UID 42 in container A will have a different user_struct from UID 42 in container B. I'd suggest that we provoke an opinion from Eric co before you do work on this. That is what I what I have been thinking as well, storing a user struct on each mount point seems sane, plus it allows per user mount rlimits which is major plus. Especially since we seem to be doing accounting only for user mounts a per user rlimit seems good. To get the user we should be user fs_uid as HPA suggested. Finally I'm pretty certain the capability we should care about in this context is CAP_SETUID. Instead of CAP_SYS_ADMIN. If we have CAP_SETUID we can become which ever user owns the mount, and the root user in a container needs this, so he can run login programs. So changing the appropriate super user checks from CAP_SYS_ADMIN to CAP_SETUID I think is the right thing todo. With the CAP_SETUID thing handled I'm not currently seeing any adverse implications to using this in containers. Ok. Now that I have a reasonable approximation of the 10,000 foot view now to see how the patches match up. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 1/8] add user mounts to the kernel
Miklos Szeredi [EMAIL PROTECTED] writes: From: Miklos Szeredi [EMAIL PROTECTED] Add ownership information to mounts. A new mount flag, MS_SETUSER is used to make a mount owned by a user. If this flag is specified, then the owner will be set to the current real user id and the mount will be marked with the MNT_USER flag. On remount don't preserve previous owner, and treat MS_SETUSER as for a new mount. The MS_SETUSER flag is ignored on mount move. The MNT_USER flag is not copied on any kind of mount cloning: namespace creation, binding or propagation. I half agree, and as an initial approximation this works. Ultimately we should be at the point that for mount propagation that we copy the owner of the from the owner of our parent mount at the propagation destination. Mount propagation semantics are a major pain. For bind mounts the cloned mount(s) are set to MNT_USER depending on the MS_SETUSER mount flag. In all the other cases MNT_USER is always cleared. For MNT_USER mounts a user=UID option is added to /proc/PID/mounts. This is compatible with how mount ownership is stored in /etc/mtab. Ok. While I generally agree with the concept this can be simplified some more. Eric Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/fs/namespace.c === --- linux.orig/fs/namespace.c 2007-04-20 11:55:02.0 +0200 +++ linux/fs/namespace.c 2007-04-20 11:55:05.0 +0200 @@ -227,6 +227,13 @@ static struct vfsmount *skip_mnt_tree(st return p; } +static void set_mnt_user(struct vfsmount *mnt) +{ + BUG_ON(mnt-mnt_flags MNT_USER); + mnt-mnt_uid = current-uid; This should be based on fsuid. Unless I'm completely confused. I think getting the mount user from current when we do mount propagation could be a problem. In particular I'm pretty certain in the mount propagation case the mount should be owned by the user who owns the destination mount that is above us. + mnt-mnt_flags |= MNT_USER; +} + static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root, int flag) { @@ -241,6 +248,11 @@ static struct vfsmount *clone_mnt(struct mnt-mnt_mountpoint = mnt-mnt_root; mnt-mnt_parent = mnt; + /* don't copy the MNT_USER flag */ + mnt-mnt_flags = ~MNT_USER; + if (flag CL_SETUSER) + set_mnt_user(mnt); + if (flag CL_SLAVE) { list_add(mnt-mnt_slave, old-mnt_slave_list); mnt-mnt_master = old; @@ -403,6 +415,8 @@ static int show_vfsmnt(struct seq_file * if (mnt-mnt_flags fs_infop-flag) seq_puts(m, fs_infop-str); } + if (mnt-mnt_flags MNT_USER) + seq_printf(m, ,user=%i, mnt-mnt_uid); How about making the test if (mnt-mnt_user != root_user) if (mnt-mnt_sb-s_op-show_options) err = mnt-mnt_sb-s_op-show_options(m, mnt); seq_puts(m, 0 0\n); @@ -920,8 +934,9 @@ static int do_change_type(struct nameida /* * do loopback mount. */ -static int do_loopback(struct nameidata *nd, char *old_name, int recurse) +static int do_loopback(struct nameidata *nd, char *old_name, int flags) { + int clone_flags; struct nameidata old_nd; struct vfsmount *mnt = NULL; int err = mount_is_safe(nd); @@ -941,11 +956,12 @@ static int do_loopback(struct nameidata if (!check_mnt(nd-mnt) || !check_mnt(old_nd.mnt)) goto out; + clone_flags = (flags MS_SETUSER) ? CL_SETUSER : 0; err = -ENOMEM; - if (recurse) - mnt = copy_tree(old_nd.mnt, old_nd.dentry, 0); + if (flags MS_REC) + mnt = copy_tree(old_nd.mnt, old_nd.dentry, clone_flags); else - mnt = clone_mnt(old_nd.mnt, old_nd.dentry, 0); + mnt = clone_mnt(old_nd.mnt, old_nd.dentry, clone_flags); if (!mnt) goto out; @@ -987,8 +1003,11 @@ static int do_remount(struct nameidata * down_write(sb-s_umount); err = do_remount_sb(sb, flags, data, 0); - if (!err) + if (!err) { nd-mnt-mnt_flags = mnt_flags; + if (flags MS_SETUSER) + set_mnt_user(nd-mnt); + } up_write(sb-s_umount); if (!err) security_sb_post_remount(nd-mnt, flags, data); @@ -1093,10 +1112,13 @@ static int do_new_mount(struct nameidata if (!capable(CAP_SYS_ADMIN)) return -EPERM; - mnt = do_kern_mount(type, flags, name, data); + mnt = do_kern_mount(type, flags ~MS_SETUSER, name, data); if (IS_ERR(mnt)) return PTR_ERR(mnt); + if (flags MS_SETUSER) + set_mnt_user(mnt); + return do_add_mount(mnt, nd, mnt_flags, NULL); } @@ -1127,7
Re: [patch 2/8] allow unprivileged umount
Miklos Szeredi [EMAIL PROTECTED] writes: From: Miklos Szeredi [EMAIL PROTECTED] The owner doesn't need sysadmin capabilities to call umount(). Similar behavior as umount(8) on mounts having user=UID option in /etc/mtab. The difference is that umount also checks /etc/fstab, presumably to exclude another mount on the same mountpoint. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/fs/namespace.c === --- linux.orig/fs/namespace.c 2007-04-20 11:55:05.0 +0200 +++ linux/fs/namespace.c 2007-04-20 11:55:06.0 +0200 @@ -659,6 +659,25 @@ static int do_umount(struct vfsmount *mn } /* + * umount is permitted for + * - sysadmin + * - mount owner, if not forced umount + */ +static bool permit_umount(struct vfsmount *mnt, int flags) +{ + if (capable(CAP_SYS_ADMIN)) + return true; + + if (!(mnt-mnt_flags MNT_USER)) + return false; + + if (flags MNT_FORCE) + return false; + + return mnt-mnt_uid == current-uid; +} I think this should be: static bool permit_umount(struct vfsmount *mnt, int flags) { if ((mnt-mnt_uid != current-fsuid) !capable(CAP_SETUID)) return false; if ((flags MNT_FORCE) !capable(CAP_SYS_ADMIN)) return false; return true; } I.e. MNT_USER gone. compare against fsuid. Only require setuid for unmounts unless force is specified. I suspect we can allow MNT_FORCE for non-privileged users as well if we can trust the filesystem. +/* * Now umount can handle mount points as well as block devices. * This is important for filesystems which use unnamed block devices. * @@ -681,7 +700,7 @@ asmlinkage long sys_umount(char __user * goto dput_and_out; retval = -EPERM; - if (!capable(CAP_SYS_ADMIN)) + if (!permit_umount(nd.mnt, flags)) goto dput_and_out; retval = do_umount(nd.mnt, flags); -- - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/8] account user mounts
Miklos Szeredi [EMAIL PROTECTED] writes: From: Miklos Szeredi [EMAIL PROTECTED] Add sysctl variables for accounting and limiting the number of user mounts. The maximum number of user mounts is set to 1024 by default. This won't in itself enable user mounts, setting a mount to be owned by a user is first needed Since each mount has a user can we just make this a per user rlimit? If we are going to implement a sysctl at this point I think it should be a global limit that doesn't care if who you are. Even root can have recursive mounts that attempt to get out of control. Also currently you are not checking the max_users. It looks like you do this in a later patch but still it is a little strange to allow user own mounts and have accounting but to not check the limit at this state. Eric Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/include/linux/sysctl.h === --- linux.orig/include/linux/sysctl.h 2007-04-20 11:55:02.0 +0200 +++ linux/include/linux/sysctl.h 2007-04-20 11:55:07.0 +0200 @@ -818,6 +818,8 @@ enum FS_AIO_NR=18, /* current system-wide number of aio requests */ FS_AIO_MAX_NR=19, /* system-wide maximum number of aio requests */ FS_INOTIFY=20, /* inotify submenu */ + FS_NR_USER_MOUNTS=21, /* int:current number of user mounts */ + FS_MAX_USER_MOUNTS=22, /* int:maximum number of user mounts */ FS_OCFS2=988, /* ocfs2 */ }; Index: linux/kernel/sysctl.c === --- linux.orig/kernel/sysctl.c2007-04-20 11:55:02.0 +0200 +++ linux/kernel/sysctl.c 2007-04-20 11:55:07.0 +0200 @@ -1063,6 +1063,22 @@ static ctl_table fs_table[] = { #endif #endif { + .ctl_name = FS_NR_USER_MOUNTS, + .procname = nr_user_mounts, + .data = nr_user_mounts, + .maxlen = sizeof(int), + .mode = 0444, + .proc_handler = proc_dointvec, + }, + { + .ctl_name = FS_MAX_USER_MOUNTS, + .procname = max_user_mounts, + .data = max_user_mounts, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { .ctl_name = KERN_SETUID_DUMPABLE, .procname = suid_dumpable, .data = suid_dumpable, Index: linux/Documentation/filesystems/proc.txt === --- linux.orig/Documentation/filesystems/proc.txt 2007-04-20 11:55:02.0 +0200 +++ linux/Documentation/filesystems/proc.txt 2007-04-20 11:55:07.0 +0200 @@ -923,6 +923,15 @@ reaches aio-max-nr then io_setup will fa raising aio-max-nr does not result in the pre-allocation or re-sizing of any kernel data structures. +nr_user_mounts and max_user_mounts +-- + +These represent the number of user mounts and the maximum number of +user mounts respectively. User mounts may be created by +unprivileged users. User mounts may also be created with sysadmin +privileges on behalf of a user, in which case nr_user_mounts may +exceed max_user_mounts. + 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats --- Index: linux/fs/namespace.c === --- linux.orig/fs/namespace.c 2007-04-20 11:55:06.0 +0200 +++ linux/fs/namespace.c 2007-04-20 11:55:07.0 +0200 @@ -39,6 +39,9 @@ static int hash_mask __read_mostly, hash static struct kmem_cache *mnt_cache __read_mostly; static struct rw_semaphore namespace_sem; +int nr_user_mounts; +int max_user_mounts = 1024; + /* /sys/fs */ decl_subsys(fs, NULL, NULL); EXPORT_SYMBOL_GPL(fs_subsys); @@ -227,11 +230,30 @@ static struct vfsmount *skip_mnt_tree(st return p; } +static void dec_nr_user_mounts(void) +{ + spin_lock(vfsmount_lock); + nr_user_mounts--; + spin_unlock(vfsmount_lock); +} + static void set_mnt_user(struct vfsmount *mnt) { BUG_ON(mnt-mnt_flags MNT_USER); mnt-mnt_uid = current-uid; mnt-mnt_flags |= MNT_USER; + spin_lock(vfsmount_lock); + nr_user_mounts++; + spin_unlock(vfsmount_lock); +} + +static void clear_mnt_user(struct vfsmount *mnt) +{ + if (mnt-mnt_flags MNT_USER) { + mnt-mnt_uid = 0; + mnt-mnt_flags = ~MNT_USER; + dec_nr_user_mounts(); + } } static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root, @@ -283,6 +305,7 @@ static inline void __mntput(struct vfsmo { struct super_block *sb =
Re: [patch 4/8] propagate error values from clone_mnt
Miklos Szeredi [EMAIL PROTECTED] writes: From: Miklos Szeredi [EMAIL PROTECTED] Allow clone_mnt() to return errors other than ENOMEM. This will be used for returning a different error value when the number of user mounts goes over the limit. Fix copy_tree() to return EPERM for unbindable mounts. Don't propagate further from dup_mnt_ns() as that copy_tree() can only fail with -ENOMEM. At a quick skim this looks ok. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 5/8] allow unprivileged bind mounts
Miklos Szeredi [EMAIL PROTECTED] writes: From: Miklos Szeredi [EMAIL PROTECTED] Allow bind mounts to unprivileged users if the following conditions are met: - mountpoint is not a symlink or special file Why? This sounds like a left over from when we were checking permissions. - parent mount is owned by the user - the number of user mounts is below the maximum Unprivileged mounts imply MS_SETUSER, and will also have the nosuid and nodev mount flags set. So in principle I agree, but in detail I disagree. capable(CAP_SETUID) should be required to leave MNT_NOSUID clear. capable(CAP_MKNOD) should be required to leave MNT_NODEV clear. I.e. We should not special case this as a user mount but rather simply check to see if the user performing the mount has the appropriate capabilities to allow the flags. How we properly propagate MNT_NOSUID and MNT_NODEV in the context of a user id namespace is still a puzzle to me. Because to the user capability should theoretically at least be namespace local. However until we get to the user id namespace we don't have that problem. Eric Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/fs/namespace.c === --- linux.orig/fs/namespace.c 2007-04-20 11:55:09.0 +0200 +++ linux/fs/namespace.c 2007-04-20 11:55:10.0 +0200 @@ -237,11 +237,30 @@ static void dec_nr_user_mounts(void) spin_unlock(vfsmount_lock); } -static void set_mnt_user(struct vfsmount *mnt) +static int reserve_user_mount(void) +{ + int err = 0; + spin_lock(vfsmount_lock); + if (nr_user_mounts = max_user_mounts !capable(CAP_SYS_ADMIN)) + err = -EPERM; + else + nr_user_mounts++; + spin_unlock(vfsmount_lock); + return err; +} + +static void __set_mnt_user(struct vfsmount *mnt) { BUG_ON(mnt-mnt_flags MNT_USER); mnt-mnt_uid = current-uid; mnt-mnt_flags |= MNT_USER; + if (!capable(CAP_SYS_ADMIN)) + mnt-mnt_flags |= MNT_NOSUID | MNT_NODEV; +} + +static void set_mnt_user(struct vfsmount *mnt) +{ + __set_mnt_user(mnt); spin_lock(vfsmount_lock); nr_user_mounts++; spin_unlock(vfsmount_lock); @@ -260,9 +279,16 @@ static struct vfsmount *clone_mnt(struct int flag) { struct super_block *sb = old-mnt_sb; - struct vfsmount *mnt = alloc_vfsmnt(old-mnt_devname); + struct vfsmount *mnt; + + if (flag CL_SETUSER) { + int err = reserve_user_mount(); + if (err) + return ERR_PTR(err); + } + mnt = alloc_vfsmnt(old-mnt_devname); if (!mnt) - return ERR_PTR(-ENOMEM); + goto alloc_failed; mnt-mnt_flags = old-mnt_flags; atomic_inc(sb-s_active); @@ -274,7 +300,7 @@ static struct vfsmount *clone_mnt(struct /* don't copy the MNT_USER flag */ mnt-mnt_flags = ~MNT_USER; if (flag CL_SETUSER) - set_mnt_user(mnt); + __set_mnt_user(mnt); if (flag CL_SLAVE) { list_add(mnt-mnt_slave, old-mnt_slave_list); @@ -299,6 +325,11 @@ static struct vfsmount *clone_mnt(struct spin_unlock(vfsmount_lock); } return mnt; + + alloc_failed: + if (flag CL_SETUSER) + dec_nr_user_mounts(); + return ERR_PTR(-ENOMEM); } static inline void __mntput(struct vfsmount *mnt) @@ -745,22 +776,29 @@ asmlinkage long sys_oldumount(char __use #endif -static int mount_is_safe(struct nameidata *nd) +/* + * Conditions for unprivileged mounts are: + * - mountpoint is not a symlink or special file + * - mountpoint is in a mount owned by the user + */ +static bool permit_mount(struct nameidata *nd, int *flags) { + struct inode *inode = nd-dentry-d_inode; + if (capable(CAP_SYS_ADMIN)) - return 0; - return -EPERM; -#ifdef notyet - if (S_ISLNK(nd-dentry-d_inode-i_mode)) - return -EPERM; - if (nd-dentry-d_inode-i_mode S_ISVTX) { - if (current-uid != nd-dentry-d_inode-i_uid) - return -EPERM; - } - if (vfs_permission(nd, MAY_WRITE)) - return -EPERM; - return 0; -#endif + return true; + + if (!S_ISDIR(inode-i_mode) !S_ISREG(inode-i_mode)) + return false; + + if (!(nd-mnt-mnt_flags MNT_USER)) + return false; + + if (nd-mnt-mnt_uid != current-uid) + return false; + + *flags |= MS_SETUSER; + return true; } Can't this just be: static bool permit_mount(struct nameidata *nd, uid_t *mnt_uid) { *mnt_uid = current-fsuid; if ((nd-mnt-mnt_uid != current-fsuid) !capable(CAP_SETUID)) return false; return true; } static int lives_below_in_same_fs(struct dentry *d,
Re: [patch 7/8] allow unprivileged mounts
Andrew Morton [EMAIL PROTECTED] writes: On Fri, 20 Apr 2007 12:25:39 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: Define a new fs flag FS_SAFE, which denotes, that unprivileged mounting of this filesystem may not constitute a security problem. Since most filesystems haven't been designed with unprivileged mounting in mind, a thorough audit is needed before setting this flag. Practically speaking, is there any realistic likelihood that any filesystem apart from FUSE will ever use this? Also potentially some of the kernel virtual filesystems. /proc should be safe already. If you don't have any kind of backing store this problem gets easier. With unprivileged users allowed to create mounts the utility of kernel functionality exported as filesystems goes up quite a bit. We are not plan9 but this is the last bottleneck in allowing the everything is a filesystem paradigm from being really usable in linux. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 8/8] allow unprivileged fuse mounts
Miklos Szeredi [EMAIL PROTECTED] writes: From: Miklos Szeredi [EMAIL PROTECTED] Use FS_SAFE for fuse fs type, but not for fuseblk. FUSE was designed from the beginning to be safe for unprivileged users. This has also been verified in practice over many years. In addition unprivileged mounts require the parent mount to be owned by the user, which is more strict than the current userspace policy. This will enable future installations to remove the suid-root fusermount utility. Don't require the user_id= and group_id= options for unprivileged mounts, but if they are present, verify them for sanity. Disallow the allow_other option for unprivileged mounts. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/fs/fuse/inode.c === --- linux.orig/fs/fuse/inode.c2007-04-20 11:55:01.0 +0200 +++ linux/fs/fuse/inode.c 2007-04-20 11:55:14.0 +0200 @@ -311,6 +311,19 @@ static int parse_fuse_opt(char *opt, str d-max_read = ~0; d-blksize = 512; + /* + * For unprivileged mounts use current uid/gid. Still allow + * user_id and group_id options for compatibility, but + * only if they match these values. + */ + if (!capable(CAP_SYS_ADMIN)) { + d-user_id = current-uid; + d-user_id_present = 1; + d-group_id = current-gid; + d-group_id_present = 1; + + } CAP_SETUID is the appropriate capability... This is not a dimension we have not fully explored. What is the problem with a user controlled mount having different uid and gid values. Yes they map into different users but how is this a problem. The only problem that I can recall is the historic chown problem where you could give files to other users and mess up their quotas. Or is the problem other users writing to this user controlled filesystem? Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 7/8] allow unprivileged mounts
Jan Engelhardt [EMAIL PROTECTED] writes: On Apr 21 2007 08:10, Eric W. Biederman wrote: Define a new fs flag FS_SAFE, which denotes, that unprivileged mounting of this filesystem may not constitute a security problem. Since most filesystems haven't been designed with unprivileged mounting in mind, a thorough audit is needed before setting this flag. Practically speaking, is there any realistic likelihood that any filesystem apart from FUSE will ever use this? Also potentially some of the kernel virtual filesystems. /proc should be safe already. If you don't have any kind of backing store this problem gets easier. tmpfs! tmpfs is a possible problem because it can consume lots of ram/swap. Which is why it has limits on the amount of space it can consume. Those are set as mount options as I recall. Which means that we would need to do something different with respect to limits before tmpfs could become safe for an untrusted user to mount. Still it's close. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 7/8] allow unprivileged mounts
Andi Kleen [EMAIL PROTECTED] writes: Andrew Morton [EMAIL PROTECTED] writes: On Fri, 20 Apr 2007 12:25:39 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: Define a new fs flag FS_SAFE, which denotes, that unprivileged mounting of this filesystem may not constitute a security problem. Since most filesystems haven't been designed with unprivileged mounting in mind, a thorough audit is needed before setting this flag. Practically speaking, is there any realistic likelihood that any filesystem apart from FUSE will ever use this? If it worked for mount --bind for any fs I could see uses of this. I haven't thought through the security implications though, so it might not work. Binding a directory that you have access to in other was is essentially the same thing as a symlink. So there are no real security implications there. The only problem case I can think of is removal media that you want to remove but someone has made a bind mount to. But that is essentially the same case as opening a file so there are no new real issues. Although our diagnostic tools will likely fall behind for a bit. We handle the security implications by assigning an owner to all mounts and only allowing you to add additional mounts on top of a mount you already own. If you have the right capabilities you can create a mount owned by another user. For a new mount if you don't have the appropriate capabilities nodev and nosuid will be forced. Initial super block creation is a lot more delicate so we need the FS_SAFE flag, to know that the kernel is prepared to deal with the crazy things that a hostile user space is prepared to do. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 0/8] mount ownership and unprivileged mount syscall (v4)
Serge E. Hallyn [EMAIL PROTECTED] writes: Quoting Miklos Szeredi ([EMAIL PROTECTED]): This patchset has now been bared to the lowest common denominator that everybody can agree on. Or at least there weren't any objections to this proposal. Andrew, please consider it for -mm. Thanks, Miklos v3 - v4: - simplify interface as much as possible, now only a single option (user=UID) is used to control everything - no longer allow/deny mounting based on file/directory permissions, that approach does not always make sense This patchset adds support for keeping mount ownership information in the kernel, and allow unprivileged mount(2) and umount(2) in certain cases. The mount owner has the following privileges: - unmount the owned mount - create a submount under the owned mount The sysadmin can set the owner explicitly on mount and remount. When an unprivileged user creates a mount, then the owner is automatically set to the user. The following use cases are envisioned: 1) Private namespace, with selected mounts owned by user. E.g. /home/$USER is a good candidate for allowing unpriv mounts and unmounts within. 2) Private namespace, with all mounts owned by user and having the nosuid flag. User can mount and umount anywhere within the namespace, but suid programs will not work. 3) Global namespace, with a designated directory, which is a mount owned by the user. E.g. /mnt/users/$USER is set up so that it is bind mounted onto itself, and set to be owned by $USER. The user can add/remove mounts only under this directory. The following extra security measures are taken for unprivileged mounts: - usermounts are limited by a sysctl tunable - force nosuid,nodev mount options on the created mount Very nice. I like these semantics. I'll try to rework my laptop in the next few days to use this patchset as a test. Agreed. It appears the approach of adding owner ship information to mount points and using that to control what may happen with them in regards to mount/unmount is the only workable approach in the unix environment. Now to dig into the details and ensure that they are correct. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag
Miklos Szeredi [EMAIL PROTECTED] writes: I've tried to make this unprivileged mount thing as simple as possible, and no simpler. If we can make it even simpler, all the better. We are certainly much more complex then the code in plan9 (just read through it) so I think we have room for improvement. Just for reference what I saw in plan 9 was: - No super user checks in it's mount, unmount, or namespace creation paths. - A flag to deny new mounts but not new bind mounts (for administrative purposes the comment said). Our differences from plan9. - suid capable binaries. (SUID please go away). - A history of programs assuming only root could call mount/unmount. I hate suid as well. _The_ motivation behind this patchset was to get rid of fusermount, a suid mount helper for fuse. But I don't think suid is going away, and definitely not overnight. Agreed, unless it just happens that killing is equally as hard as doing non-privileged mounts. Which I really don't think will be the case. suid executables can always be replaced by a non-privileged part that connect to a daemon on localhost. Also I don't think we want to require auditing userspace before enabling user mounts. Given the complexity of the code to avoids audits adds, and especially the uncertainty of how all of the pieces add up together I really think we do need to audit user space (but only in a limited way). This is a paradigm shift in how mounts are managed and to get the full benefit of it we need to be prepared to deal with the paradigm shift. Essentially what the audit must ensure is that (a) non-root users are always run in a non-default namespace so mounts and unmounts they generate will not have global effect. (b) If we setup something like the proposed /share that administrative tools know how to treat it properly. That is not an especially hard audit of user space and it noticeably reduces the complexity of what we must implement. If I understand correctly, your proposal is to get rid of MNT_USER and MNT_ALLOWUSERMNT and allow/deny unprivileged mounts and umounts based on a boolean sysctl flag and on a check if the target namespace is the initial namespace or not. No. I really do not want to treat the initial namespace in a special way from an implementation point of view. From a distro point of view I don't think we should ever allow a user into the initial mount namespace, and that is what we should audit. All of the ways a user can login to a machine. We need to audit the login paths anyway if we are going to do anything interesting with the mount namespace. So I see this as no real additional overhead to make things usable. And maybe add some extra checks which prevent ugliness from happening with suid programs. Is this correct? Definitely add some extra permission checks to prevent ugliness from happening with suid executables. In the general case the problem with suid executables is that they expect parts of the mount namespace that a user cannot modify not to be modified. Ensuring that our permission checks keep that promise for mount and umount is going to be a bit challenging, because we need to think through a lot of scenarios. But it is not that hard. If so, how are we going to make sure this won't break existing userspace without doing a full audit of all suid programs in every distro that wants this feature? The permission checks to ensure you can not modify a filesystem with mounts in a way that you could not modify it by creating or deleting files should be enough. That probably means that we are restricted to mounting filesystems with the equivalent of uid=$(id -u) gid=$(id -g). That is if you aren't root all files in the filesystem you cause to be mounted must be owned by you. That way you have permission to unmount or delete them. At the very least the user who causes a mount to be created should be the owner and have complete control over the directory mount point. So that they can at least theoretically remove all of the files and directories at the mount point (which would be equivalent to unmounting it). Also how are we going to prevent the user from creating millions of mounts, and using up all the kernel memory for vfsmounts? I definitely agree that we need some kind of accounting. I'm don't think we can do this per user (which would be nice) and I don't believe your code does attempts to limit mounts by user either. A simple global limit on the number of mounts should be sufficient. If necessary we can allow the limit to be ignored if you have the appropriate capability. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag
Serge E. Hallyn [EMAIL PROTECTED] writes: Why are directory permissions not sufficient to allow/deny non-priveleged mounts? I don't understand that contention yet. The same scenarios laid out previously in this thread. I.e. 1. user hallyn does mount --bind / /home/hallyn/root 2. (...) 3. admin does deluser hallyn and deluser starts wiping out root Or, 1. user hallyn does mount --bind / /home/hallyn/root 2. backup daemon starts backing up /home/hallyn/root/home/hallyn/root/home... So we started down the path of forcing users to clone a new namespace before doing user mounts, which is what the clone flag was about. Using per-mount flags also suffices as you had pointed out, which is being done here. But directory permissions are inadequate. Interesting So far even today these things can happen, however they are sufficiently unlikely the tools don't account for them. Once a hostile user can cause them things are more of a problem. (Unless you want to tackle each problem legacy tool one at a time to remove problems - i.e. deluser should umount everything under /home/hallyn before deleting, backup should be spawned from it's own namespace cloned right after boot or just back up on one filesystem, etc.) I don't see a way that backup and deluser won't need to be modified to work properly in a system where non-priveleged mounts are allowed, at least they will need to account for /share. That said it is clearly a hazard if we enable this functionality by default. If we setup a pam module that triggers on login and perhaps when cron and at jobs run to setup an additional mount namespace I think keeping applications locked away in their own mount namespace is sufficient to avoid hostile users from doing unexpected things to the initial mount namespace. So unless I am mistake it should be relatively simple to prevent user space from encountering problems. That still leaves the question of how we handle systems with an old user space that is insufficiently robust to deal with mounts occurring at unexpected locations. I think a simple sysctl to enable/disable of non-priveleged mounts defaulting to disabled is enough. Am I correct or will it be more difficult than just a little pam module to ensure non-trusted users never run in the initial mount namespace? Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag
Miklos Szeredi [EMAIL PROTECTED] writes: I'm still not sure, what your problem is. My problem right now is that I see a serious complexity escalation in the user interface that we must support indefinitely. I see us taking a nice powerful concept and seriously watering it down. To some extent we have to avoid confusing suid applications. (I would so love to remove the SUID bit...). I'm being contrary to ensure we have a good code review. I have heard it said that there are two kinds of design. Something so simple it obviously has no deficiencies. Something so complex it has no obvious deficiencies. I am very much afraid we are slipping the mount namespace into the latter category of work. Which is a bad bad thing for core OS feature. With the v3 of the usermounts patchset, by default, user mounts are disabled, because the allow unpriv submounts flag is cleared on all mounts. There are several options available to sysadmins and distro builders to enable user mounts in a secure way: - pam module, which creates a private namespace, and sets allow unpriv submounts on the mounts within the namespace - pam module, which rbinds / onto /mnt/ns/$USER, and chroots into /mnt/ns/$USER, then sets the allow unpriv submounts on the mounts under /mnt/ns/$USER. In part this really disturbs me because we now have two mechanisms for controlling the scope of what a user can do. A flag or a new namespace. Two mechanisms to accomplish the same thing sound wrong, and hard to manage. - sysadmin creates /mnt/usermounts writable to all users, with sticky bit (same as /tmp), does mount --bind /mnt/usermounts /mnt/usermounts and sets the allow unpriv submounts on /mnt/usermounts. All of these are perfectly safe wrt userdel and backup (assuming it doesn't try back up /mnt). I also don't understand at all the user= mount flag and options. All it seemed to be used for was adding permissions to unmount. In user space to deal with the lack of any form of untrusted mounts I can understand this. In kernel space this seems to be more of a problem. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag
Miklos Szeredi [EMAIL PROTECTED] writes: I'm still not sure, what your problem is. My problem right now is that I see a serious complexity escalation in the user interface that we must support indefinitely. I see us taking a nice powerful concept and seriously watering it down. To some extent we have to avoid confusing suid applications. (I would so love to remove the SUID bit...). I'm being contrary to ensure we have a good code review. OK. And it's very much appreciated :) I have heard it said that there are two kinds of design. Something so simple it obviously has no deficiencies. Something so complex it has no obvious deficiencies. I am very much afraid we are slipping the mount namespace into the latter category of work. Which is a bad bad thing for core OS feature. I've tried to make this unprivileged mount thing as simple as possible, and no simpler. If we can make it even simpler, all the better. We are certainly much more complex then the code in plan9 (just read through it) so I think we have room for improvement. Just for reference what I saw in plan 9 was: - No super user checks in it's mount, unmount, or namespace creation paths. - A flag to deny new mounts but not new bind mounts (for administrative purposes the comment said). Our differences from plan9. - suid capable binaries. (SUID please go away). - A history of programs assuming only root could call mount/unmount. In part this really disturbs me because we now have two mechanisms for controlling the scope of what a user can do. You mean rbind+chroot and clone(CLONE_NS)? Yes, those are two different mechanisms achieving very similar results. But what has this to do with unprivileged mounts? The practical question is how do we limit what a user can mount and unmount. I would contend that at first glance stuffing a user in their own mount namespace is sufficient, on a system with utilities aware of the consequences of mount/unmount. So we may not need a unprivileged mount disable except as a way to allow an old user space to run a new kernel. A flag or a new namespace. Two mechanisms to accomplish the same thing sound wrong, and hard to manage. The flag permitting the unprivileged mounts (which we now agreed to name allowusermnt) is used in both cases. Just creating a new namespace doesn't always imply that you want to allow user mounts inside, does it? These are orthogonal features. After user space has been updated we always want to allow unprivileged mounts. If I get pushed I will say that we need to remove suid exec capability from user space as well. At which point we don't even need directory security checks, there is enough benefit there I certainly think it is worth considering having an entire NOSUID user space. Removing suid is probably excessive but if it isn't much harder then sane mount namespace support we should probably consider it. - sysadmin creates /mnt/usermounts writable to all users, with sticky bit (same as /tmp), does mount --bind /mnt/usermounts /mnt/usermounts and sets the allow unpriv submounts on /mnt/usermounts. All of these are perfectly safe wrt userdel and backup (assuming it doesn't try back up /mnt). I also don't understand at all the user= mount flag and options. The user=UID or (or MNT_USER flag) serves multiple purposes: - help mount(8) move away from /etc/mtab - allow unprivileged umounts - account user mounts All it seemed to be used for was adding permissions to unmount. In user space to deal with the lack of any form of untrusted mounts I can understand this. In kernel space this seems to be more of a problem. Why is handling unprivileged mounts in kernel different from handling them in userspace in this respect? Ok. I just looked at what user space is doing. The difference is that what user space is doing predates mount namespaces, and was there as far as I can tell to keep one user from causing problems for another user. If we choose to make mount namespaces to be the unit of granularity we don't need this capability. All we have to do is deny unmounts that would confuse a suid executable. Which mounts are those? Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 0/8] unprivileged mount syscall
Miklos Szeredi [EMAIL PROTECTED] writes: Arn't there ways to escape chroot jails? Serge had pointed me to a URL which showed chroots can be escaped. And if that is true than having all user's private mount tree in the same namespace can be a security issue? No. In fact chrooting the user into /share/$USER will actually _grant_ a privilege to the user, instead of taking it away. It allows the user to modify it's root namespace, which it wouldn't be able to in the initial namespace. So even if the user could escape from the chroot (which I doubt), s/he would not be able to do any harm, since unprivileged mounting would be restricted to /share. Also /share/$USER should only have read/search permission for $USER or no permissions at all, which would mean, that other users' namespaces would be safe from tampering as well. A couple of points. - chroot can be escaped, it is just a chdir for the root directory it is not a security feature. The only security is that you have to be root to call chdir. A carefully done namespace setup won't have that issue. - While it may not violate security as far as what a user is allowed to modify it may violate security as far as what a user is allowed to see. There are interesting per login cases as well such as allowing a user to replicate their mount tree from another machine when they log in. When /home is on a network filesystem this can be very practical and can allow propagation of mounts across machines not just across a single login session. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag
Miklos Szeredi [EMAIL PROTECTED] writes: That depends. Current patches check the unprivileged submounts allowed under this mount flag only on the requested mount and not on the propagated mounts. Do you see a problem with this? I think privileges of this sort should propagate. If I read what you just said correctly if I have a private mount namespace I won't be able to mount anything unless when it was setup the unprivileged submount command was explicitly set. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] Re: [patch 05/10] add permit user mounts in new namespace clone flag
Miklos Szeredi [EMAIL PROTECTED] writes: That depends. Current patches check the unprivileged submounts allowed under this mount flag only on the requested mount and not on the propagated mounts. Do you see a problem with this? I think privileges of this sort should propagate. If I read what you just said correctly if I have a private mount namespace I won't be able to mount anything unless when it was setup the unprivileged submount command was explicitly set. By design yes. Why is that a problem? It certainly doesn't match my intuition. Why are directory permissions not sufficient to allow/deny non-priveleged mounts? I don't understand that contention yet. I should probably go back and look and see how plan9 handles mount/unmount permissions. Plan9 gets away with a lot more because it doesn't have a suid bit and mount namespaces were always present, so they don't have backwards compatibility problems. My best guess at the moment is that plan9 treated mount/unmount as completely unprivileged and used the mount namespaces to limit the scope of what would be affected by a mount/unmount operation. I think that may be reasonable in linux as well but it will require the presence of a mount namespace to limit the affects of what a user can do. So short of a more thorough audit I believe the final semantics should be: - mount/unmount for non-priveleged processes should only be limited by the mount namespace and directory permissions. - CLONE_NEWNS should not be a privileged operation. What prevents us from allowing these things? - Unprivileged CLONE_NEWNS and unprivileged mounts needs resource accounting so we don't have a denial of service attack. - Unprivileged mounts must be limited to directories that we have permission to modify in a way that we could get the same effect as the mount or unmount operation in terms of what files are visible otherwise we can mess up SUID executables. - Anything else? There are user space issues such as a reasonable pam module and how to do backups. However those are user space issues. What am I missing that requires us to add MNT_USER and MNT_USERMNT? Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 02/10] allow unprivileged umount
Miklos Szeredi [EMAIL PROTECTED] writes: From: Miklos Szeredi [EMAIL PROTECTED] The owner doesn't need sysadmin capabilities to call umount(). Similar behavior as umount(8) on mounts having user=UID option in /etc/mtab. The difference is that umount also checks /etc/fstab, presumably to exclude another mount on the same mountpoint. bool in the kernel? int would be much more recognizable as this is not C++ Or do you have place to convert the rest of the kernel that is using int to return a true/false value to bool? +static bool permit_umount(struct vfsmount *mnt, int flags) +{ + if (capable(CAP_SYS_ADMIN)) + return true; + + if (!(mnt-mnt_flags MNT_USER)) + return false; + + if (flags MNT_FORCE) + return false; + + return mnt-mnt_uid == current-uid; +} Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 05/10] add permit user mounts in new namespace clone flag
Serge E. Hallyn [EMAIL PROTECTED] writes: Quoting Miklos Szeredi ([EMAIL PROTECTED]): Given the existence of shared subtrees allowing/denying this at the mount namespace level is silly and wrong. If we need more than just the filesystem permission checks can we make it a mount flag settable with mount and remount that allows non-privileged users the ability to create mount points under it in directories they have full read/write access to. OK, that makes sense. I don't like the use of clone flags for this purpose but in this case the shared subtress are a much more fundamental reasons for not doing this at the namespace level. I'll drop the clone flag, and add a mount flag instead. Thanks, Miklos Makes sense, so then on login pam has to spawn a new user namespace and construct a root fs with no shared subtrees and with the user-mounts-allowed flag specified? I was expecting the usage in the normal case to be the Al Viro style with shared subtrees setup for each user, with the shared subtree marked with user-mounts-allowed. Then on login pam would unshare the namespace and restrict the user to their specific portion of the shared subtree. If you don't use multiple mount namespaces all of the users have to agree on what they want the non-privileged part of the namespace to look like. If you don't use shared subtrees you have to deal with all of the joys of implementing enter, or else multiple logins from the same user have problems. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 05/10] add permit user mounts in new namespace clone flag
Serge E. Hallyn [EMAIL PROTECTED] writes: Quoting Miklos Szeredi ([EMAIL PROTECTED]): From: Miklos Szeredi [EMAIL PROTECTED] If CLONE_NEWNS and CLONE_NEWNS_USERMNT are given to clone(2) or unshare(2), then allow user mounts within the new namespace. This is not flexible enough, because user mounts can't be enabled for the initial namespace. The remaining clone bits also getting dangerously few... Alternatives are: - prctl() flag - setting through the containers filesystem Sorry, I know I had mentioned it, but this is definately my least favorite approach. Curious whether are any other suggestions/opinions from the containers list? Given the existence of shared subtrees allowing/denying this at the mount namespace level is silly and wrong. If we need more than just the filesystem permission checks can we make it a mount flag settable with mount and remount that allows non-privileged users the ability to create mount points under it in directories they have full read/write access to. I don't like the use of clone flags for this purpose but in this case the shared subtress are a much more fundamental reasons for not doing this at the namespace level. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] Implement shadow directory support for device classes.
Modify the device class code so that normal manipulations work in the presence of shadow directories. Some of the shadow directory support still needs to be implemented in the implementation of the class but these modifications are sufficient to make that simple. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- drivers/base/class.c | 34 -- include/linux/device.h |4 2 files changed, 36 insertions(+), 2 deletions(-) diff --git a/drivers/base/class.c b/drivers/base/class.c index 8bf2ca2..f72ddc0 100644 --- a/drivers/base/class.c +++ b/drivers/base/class.c @@ -136,6 +136,18 @@ static void remove_class_attrs(struct class * cls) } } +static int class_setup_shadow(struct class *cls) +{ + if (!cls-class_device_dparent !cls-class_follow_link) + return 0; + + if (!cls-class_device_dparent || !cls-class_device_dparent) + return -EINVAL; + + return sysfs_make_shadowed_dir(cls-subsys.kset.kobj, + cls-class_follow_link); +} + int class_register(struct class * cls) { int error; @@ -154,6 +166,11 @@ int class_register(struct class * cls) error = subsystem_register(cls-subsys); if (!error) { + error = class_setup_shadow(cls); + if (error) + subsystem_unregister(cls-subsys); + } + if (!error) { error = add_class_attrs(class_get(cls)); class_put(cls); } @@ -582,6 +599,7 @@ int class_device_add(struct class_device *class_dev) struct class *parent_class = NULL; struct class_device *parent_class_dev = NULL; struct class_interface *class_intf; + struct dentry *dparent; int error = -EINVAL; class_dev = class_device_get(class_dev); @@ -610,7 +628,11 @@ int class_device_add(struct class_device *class_dev) else class_dev-kobj.parent = parent_class-subsys.kset.kobj; - error = kobject_add(class_dev-kobj); + if (parent_class-class_device_dparent) + dparent = parent_class-class_device_dparent(class_dev); + else + dparent = parent_class-subsys.kset.kobj.dentry; + error = kobject_shadow_add(class_dev-kobj, dparent); if (error) goto out2; @@ -841,11 +863,15 @@ int class_device_rename(struct class_device *class_dev, char *new_name) { int error = 0; char *old_class_name = NULL, *new_class_name = NULL; + struct class *class; + struct dentry *new_parent; class_dev = class_device_get(class_dev); if (!class_dev) return -EINVAL; + class = class_dev-class; + pr_debug(CLASS: renaming '%s' to '%s'\n, class_dev-class_id, new_name); @@ -857,7 +883,11 @@ int class_device_rename(struct class_device *class_dev, char *new_name) strlcpy(class_dev-class_id, new_name, KOBJ_NAME_LEN); - error = kobject_rename(class_dev-kobj, new_name); + if (class-class_device_dparent) + new_parent = class-class_device_dparent(class_dev); + else + new_parent = class-subsys.kset.kobj.dentry; + error = kobject_shadow_rename(class_dev-kobj, new_parent, new_name); #ifdef CONFIG_SYSFS_DEPRECATED if (class_dev-dev) { diff --git a/include/linux/device.h b/include/linux/device.h index f44247f..162e840 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -33,6 +33,7 @@ struct device; struct device_driver; struct class; struct class_device; +struct nameidata; struct bus_type { const char * name; @@ -197,6 +198,9 @@ struct class { int (*suspend)(struct device *, pm_message_t state); int (*resume)(struct device *); + + struct dentry *(*class_device_dparent)(struct class_device *); + void *(*class_follow_link)(struct dentry *dentry, struct nameidata *nd); }; extern int __must_check class_register(struct class *); -- 1.4.4.1.g278f - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Implement shadow directory support for device classes.
Greg KH [EMAIL PROTECTED] writes: Or you can try it with any of the subsystems that have already been converted over to use the struct device code, like misc, mem, and many others. Just look for the symlinks in /sys/class/CLASS_NAME/ instead of real subdirectories. So I'd really prefer to not apply this patch, and just do this kind of work for 'struct device' instead. I don't want to get stuck waiting for cleanups so I will keep this one, even if you don't apply it :) But I will go figure out the equivalent version for struct device, as it doesn't look like you are that far off from merging that version. :) That way my code will work whichever code base I have to merge against. Eric - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 64-bit block sizes on 32-bit systems
Matthew Wilcox [EMAIL PROTECTED] writes: On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote: What do you mean by problems 5 years down the road? The real issue is that this 32-bit block count limit affects composite devices like MD RAID and LVM today, not just individual disks. There have been several postings I have seen with people having a problem _today_ with a 2TB limit on devices. people who can afford 2TB of disc can afford to buy a 64-bit processor. Currently that doesn't solve the problem as block_nr is held in an int. And as gcc compiles an int to a 32bit number on a 64bit processor, the problem still isn't solved. That at least we need to address. Eric - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED]
Re: 64-bit block sizes on 32-bit systems
LA Walsh [EMAIL PROTECTED] writes: I vaguely remember a discussion about this a few months back. If I remember, the reasoning was it would unnecessarily slow down smaller systems that would never have block devices in the 4-28T range attached. With classic 512 byte sectors the top size is right about 2TB. The basic thought is that 64bit numbers tend to suck, so we don't want then in any fast paths on a 32bit system. However, isn't it possible there will continue to be a series of P-IV,V,VI,VII ...etc, addons that will be used for sometime to come. I've even heard it suggested that we might see 2 or more CPU's on a single chip as a way to increase cpu capacity w/o driving up clock speed. Given the cheapness of .25T drives now, seeing the possibility of 4T drives doesn't seem that remote (maybe 5 years?). Side question: does the 32-bit block size limit also apply to RAID disks or does it use a different block-nr type? For now yes it does. So...is it the plan, or has it been though about -- 'abstracting' block numbes as a typedef 'block_nr', then at compile time having it be selectable as to whether or not this was to be a 32-bit or 64 bit quantity -- that way older systems would lose no efficiency. Drivers that couldn't be or hadn't been ported to use 'block_nr' could default to being disabled if 64-bit blocks were selected, etc. So has this idea been tossed about and or previously thrashed? Using a 64bit number of 32bit systems has so far been trashed. Though this does look like a real problem that needs to be solved at some point. I doubt we can wait past 2.5 though if we want the code ready when the hardware is. Eric - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED]
Re: [RFC] sane access to per-fs metadata (was Re: [PATCH] Documentation/ioctl-number.txt)
Alexander Viro [EMAIL PROTECTED] writes: IOW, you can get normal filesystem view (meaning that you have all usual UNIX toolkit available) for per-fs control stuff. And keep the ability to do proper locking - it's the same driver that handles the main fs and you have access to superblock. No need to change the API - everything is already there... I'll post an example patch for ext2 (safe access to superblock, group descriptors, inode table and bitmaps on a live fs) after this weekend (== when misc shit will somewhat settle down). Cheers, Al PS: Folks[1], I hope it explains why I'm very sceptical about "let's add new A{B,P}I" sort of ideas - approach above can be used for almost all stuff I've seen proposed. You can have multiple views of the same object. And have all of them available via normal API. This is a cool idea. But I couple of places where this might fall down. 1) If a filesystem has multiple name spaces and we use different mounts to handle them, will this break anything? Fat32 with it's short and long names, and the Novell filesystem are the examples I can think of. 2) An API is still being developed it just uses the existing infrastructure which is good, but we still need to standardize what is exported. It's a cleaner way to build a new API but a new API is being built. 3) What is a safe way to do this so a non-root user can call mount? 4) What is appropriate way using open,read,write,close,mount to handle stat data and extended attributes. The stat data is the big one because it is used so frequently. Possibly a mountopenread/writecloseumount syscall is needed. I keep thinking open("/path/to/file/stat_data") but that feels excessively heavy at the API level. But if we involve mount (at least semantically) that could work for directories as well. The goal here is to push your ideas to the limits so we can where using ioctl or new a syscall is appropriate. If indeed there is such a case. Eric - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED]
Re: [patch] O_SYNC patch 3/3, add inode dirty buffer list support to ext2
"Jeff V. Merkey" [EMAIL PROTECTED] writes: Cool. ORACLE is going to **SMOKE** on EXT2 with this change. Pessimism Hmm I don't see how ORACLE is going to **SMOKE**. Last I looked ORACLE would need a query optimizer that always would find the best possible index and much less overhead to **SMOKE**. Last I looked table reads were 10x slower than file reads. /Pessimism Eric - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED]