Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Fri, Apr 27, 2007 at 12:53:34PM +0200, Jörn Engel wrote:
>
> All this would get easier if continuation inodes were known to be rare.
> You can ditch the doubly-linked list in favor of a pointer to the main
> inode then - traversing the list again is cheap, after all. And you can
> just try to read the same block once for every continuation inode.
>
> If those lists can get long and you need a mapping from offset to
> continuation inode on the medium, you are basically fscked. Storing the
> mapping requires space. You need the mapping only when space (in some
> chunk) gets tight and you allocate continuation inodes. So either you
> don't need the mapping or you don't have a good place to put it.

Any mapping structure will have to be pre-allocated.

> Having a mapping in memory is also questionable. Either you scan the
> whole file on first access and spend a long time for large files. Or
> you create the mapping on the fly. In that case the page cache will
> already give you a 90% solution for free.

So in my secret heart of hearts, I do indeed hope that cnodes are rare
enough that we don't actually have to do anything smart to make them
go fast. Either having no fast lookup structure or creating it in
memory as needed would be the nicest solution. However, since I can't
guarantee this will be the case, it's nice to have some idea of what
we'll do if this does become important.

> You should spend a lot of effort trying to minimize cnodes. ;)

Yep. It's much better to optimize away most cnodes instead of trying
to make them go fast.

-VAL
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] fallocate system call
On Fri, Apr 27, 2007 at 07:46:13PM +0200, Heiko Carstens wrote:
> If one insists to have fd at first argument, what is wrong with
> having u32 arguments only?

Well, I was one of those who objected as it seems *UGLY* to me.

> It's not that this syscall comes even close to what can be
> considered performance critical...

Right.

> It adds userspace overhead for one architecture. Every *trace and
> *libc needs special handling on s390 for this syscall. I would
> prefer to avoid this.

I'm not that bothered about it. I would prefer it did use clean 64-bit
arguments, but given it's a non-critical syscall I don't think the
aesthetics are worth imposing crud on s390 for.
Re: Ext2/3 block remapping tool
On Apr 26, 2007 21:29 +0200, Jan Kara wrote:
> I've been lately playing with remapping ext2/ext3 blocks (especially how
> much it can give us in terms of speed of things like KDE start). For that
> I've written two simple tools (you can get them from
> ftp.suse.com/pub/people/jack/ext3remapper.tar.gz):
>   e2block2file to transform (preparsed) output from blktrace into a list
>     of accessed files and offsets accessed
>   e2remapblocks to use output from e2block2file and remap blocks into big
>     chunks in the order in which they were accessed.

Does it map the whole file contiguously, or does it interleave blocks of
the file in the order they are accessed? I would hope that it maps the
whole file contiguously, and lets readahead work properly to fetch the
whole file. Also, keeping the file contiguous avoids fragmentation later
if that file is updated, deleted, etc, and conflicts with
allocator/defrag/etc.

> (see README in the tools archive for more details)
>
> So far the tools (especially e2remapblocks ;) work on unmounted
> filesystem. The ultimate goal is to be able to do similar things for
> mounted filesystems but I wanted to see whether block remapping is worth it
> and what kernel interfaces would be useful for achieving the goal.

I'd prefer that such functionality be integrated with Takashi's online
defrag tool, since it needs virtually the same functionality. For that
matter, this is also very similar to the block-mapped -> extents tool
from Aneesh. It doesn't make sense to have so many separate tools for
users, especially if they start interfering with each other (i.e. defrag
undoes the remapping done by your tool).

> BTW, the results for KDE startup are as follows:
> The root partition was about 4.8 GB with around 1 GB free. System has
> 1GB mem. All measurements (except for warmcache) were performed after
> sync; echo 3 >/proc/sys/vm/drop_caches
>
> Ordinary start: 19.2 20.3 19.5 19.8 19.3; avg. 19.62
> Start with all data cached: 7 7.6 7.3 7.1 7.1; avg. 7.22
> Start with fcache (see thread http://lkml.org/lkml/2006/5/15/46 for details
> on fcache):
>   11.3 11 10.3 10.8 10.6; avg. 10.8
> Start with blocks remapped with e2remapblocks:
>   13.5 15 13 14.5 14.5; avg. 14.1
> (after remapping, data was stored in 20 contiguous extents on disk)

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Re: [PATCH 0/5] fallocate system call
On Fri, Apr 27, 2007 at 04:43:28PM +0200, Jörn Engel wrote:
> On Fri, 27 April 2007 14:10:03 +0200, Heiko Carstens wrote:
> >
> > After long discussions where at least two possible implementations
> > were suggested that would work on _all_ architectures you chose one
> > which doesn't and causes extra effort.
>
> I believe the long discussion also showed that every possible
> implementation has drawbacks. To me this one appeared to be the best of
> many bad choices.

If one insists to have fd at first argument, what is wrong with
having u32 arguments only?
It's not that this syscall comes even close to what can be
considered performance critical...

> Is this implementation worse than we thought?

It adds userspace overhead for one architecture. Every *trace and
*libc needs special handling on s390 for this syscall. I would
prefer to avoid this.
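To make the "u32 arguments only" alternative concrete, here is a minimal userspace sketch of how each 64-bit quantity could travel as two 32-bit halves and be reassembled on the other side. All names (split_u64, join_u32, unpack_fallocate_u32, struct falloc_args) are invented for illustration; this is not the interface from the posted patchset, and the sketch makes no claim about how any architecture would actually pass these arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Arguments as the receiving side would see them after reassembly. */
struct falloc_args {
	int fd;
	int mode;
	int64_t offset;
	int64_t len;
};

/* Caller side: split a 64-bit value into two 32-bit halves. */
static void split_u64(int64_t v, uint32_t *hi, uint32_t *lo)
{
	assert(hi && lo);
	*hi = (uint32_t)((uint64_t)v >> 32);
	*lo = (uint32_t)((uint64_t)v & 0xffffffffu);
}

/* Receiving side: join two 32-bit halves back into one 64-bit value. */
static int64_t join_u32(uint32_t hi, uint32_t lo)
{
	return (int64_t)(((uint64_t)hi << 32) | lo);
}

/*
 * Hypothetical u32-only entry point: six arguments, none wider than
 * 32 bits, with fd still in first position.
 */
static struct falloc_args unpack_fallocate_u32(int fd, int mode,
					       uint32_t off_hi, uint32_t off_lo,
					       uint32_t len_hi, uint32_t len_lo)
{
	struct falloc_args a;

	a.fd = fd;
	a.mode = mode;
	a.offset = join_u32(off_hi, off_lo);
	a.len = join_u32(len_hi, len_lo);
	return a;
}
```

The cost of this layout is the aesthetic one raised in the thread: every caller has to split its offsets, and the argument count grows from four to six.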
Re: [patch 37/44] hostfs convert to new aops
On Tue, Apr 24, 2007 at 11:24:23AM +1000, Nick Piggin wrote:
> This also gets rid of a lot of useless read_file stuff. And also
> optimises the full page write case by marking a !uptodate page uptodate.
>
> Cc: Jeff Dike <[EMAIL PROTECTED]>
> Cc: Linux Filesystems
> Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Looks good.

Acked-by: Jeff Dike <[EMAIL PROTECTED]>
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Thu, Apr 26, 2007 at 09:58:25PM -0700, Valerie Henson wrote:
> Here's an example, spelled out:
>
> Allocate file 1 in chunk A.
> Grow file 1.
> Chunk A fills up.
> Allocate continuation inode for file 1 in chunk B.
> Chunk A gets some free space.
> Chunk B fills up.
> Pick chunk A for allocating next block of file 1.
> Try to look up a continuation inode for file 1 in chunk A.
> Continuation inode for file 1 found in chunk A!
> Attach newly allocated block to existing inode for file 1 in chunk A.

So far, so good (and the slides are helpful, tx!). What happens when
file 1 keeps growing and chunk A fills up (and chunk B is still full)?
Can the same continuation inode also point at chunk C, where the file
is going to grow to?

Jeff

--
Work email - jdike at linux dot intel dot com
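The lookup step in the quoted example ("try to look up a continuation inode for file 1 in chunk A") can be sketched in userspace as a walk over the file's inodes. This is only a model under stated assumptions: struct cnode, find_cnode() and attach_block() are hypothetical names, the list is a plain singly-linked chain starting at the main inode, and nothing here reflects the actual ChunkFS on-disk layout, which the thread does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory view of one inode of a file (main or continuation). */
struct cnode {
	int chunk;		/* chunk that holds this inode and its blocks */
	unsigned long nblocks;	/* blocks attached to this inode */
	struct cnode *next;	/* next continuation inode, NULL at the end */
};

/* Walk the chain and return the inode living in the given chunk, if any. */
static struct cnode *find_cnode(struct cnode *head, int chunk)
{
	struct cnode *c;

	for (c = head; c != NULL; c = c->next)
		if (c->chunk == chunk)
			return c;
	return NULL;
}

/*
 * Attach one newly allocated block in the given chunk.  Returns the inode
 * that received the block, or NULL when the file has no inode in that
 * chunk yet and a new continuation inode would have to be allocated.
 */
static struct cnode *attach_block(struct cnode *head, int chunk)
{
	struct cnode *c;

	assert(head != NULL);
	c = find_cnode(head, chunk);
	if (c != NULL)
		c->nblocks++;
	return c;
}
```

The linear walk is only cheap while the chain is short, which is the case the thread hopes for when it asks that cnodes be kept rare.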
Re: [PATCH 0/5] fallocate system call
On Fri, 27 April 2007 14:10:03 +0200, Heiko Carstens wrote:
>
> After long discussions where at least two possible implementations
> were suggested that would work on _all_ architectures you chose one
> which doesn't and causes extra effort.

I believe the long discussion also showed that every possible
implementation has drawbacks. To me this one appeared to be the best of
many bad choices.

Is this implementation worse than we thought?

Jörn

--
The grand essentials of happiness are: something to do, something to
love, and something to hope for.
-- Allan K. Chalmers
[patch 05/10] unprivileged mounts: allow unprivileged bind mounts
From: Miklos Szeredi <[EMAIL PROTECTED]>

Allow bind mounts to unprivileged users if the following conditions
are met:

  - mountpoint is not a symlink
  - parent mount is owned by the user
  - the number of user mounts is below the maximum

Unprivileged mounts imply MS_SETUSER, and will also have the "nosuid"
and "nodev" mount flags set.

In particular, if mounting process doesn't have CAP_SETUID capability,
then the "nosuid" flag will be added, and if it doesn't have CAP_MKNOD
capability, then the "nodev" flag will be added.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namespace.c
===
--- linux.orig/fs/namespace.c	2007-04-26 13:18:46.0 +0200
+++ linux/fs/namespace.c	2007-04-26 13:30:04.0 +0200
@@ -237,11 +237,34 @@ static void dec_nr_user_mounts(void)
 	spin_unlock(&vfsmount_lock);
 }
 
-static void set_mnt_user(struct vfsmount *mnt)
+static int reserve_user_mount(void)
+{
+	int err = 0;
+
+	spin_lock(&vfsmount_lock);
+	if (nr_user_mounts >= max_user_mounts && !capable(CAP_SYS_ADMIN))
+		err = -EPERM;
+	else
+		nr_user_mounts++;
+	spin_unlock(&vfsmount_lock);
+	return err;
+}
+
+static void __set_mnt_user(struct vfsmount *mnt)
 {
 	BUG_ON(mnt->mnt_flags & MNT_USER);
 	mnt->mnt_uid = current->fsuid;
 	mnt->mnt_flags |= MNT_USER;
+
+	if (!capable(CAP_SETUID))
+		mnt->mnt_flags |= MNT_NOSUID;
+	if (!capable(CAP_MKNOD))
+		mnt->mnt_flags |= MNT_NODEV;
+}
+
+static void set_mnt_user(struct vfsmount *mnt)
+{
+	__set_mnt_user(mnt);
 	spin_lock(&vfsmount_lock);
 	nr_user_mounts++;
 	spin_unlock(&vfsmount_lock);
@@ -260,10 +283,16 @@ static struct vfsmount *clone_mnt(struct
 					int flag)
 {
 	struct super_block *sb = old->mnt_sb;
-	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
+	struct vfsmount *mnt;
 
+	if (flag & CL_SETUSER) {
+		int err = reserve_user_mount();
+		if (err)
+			return ERR_PTR(err);
+	}
+	mnt = alloc_vfsmnt(old->mnt_devname);
 	if (!mnt)
-		return ERR_PTR(-ENOMEM);
+		goto alloc_failed;
 
 	mnt->mnt_flags = old->mnt_flags;
 	atomic_inc(&sb->s_active);
@@ -275,7 +304,7 @@ static struct vfsmount *clone_mnt(struct
 	/* don't copy the MNT_USER flag */
 	mnt->mnt_flags &= ~MNT_USER;
 	if (flag & CL_SETUSER)
-		set_mnt_user(mnt);
+		__set_mnt_user(mnt);
 
 	if (flag & CL_SLAVE) {
 		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
@@ -300,6 +329,11 @@ static struct vfsmount *clone_mnt(struct
 		spin_unlock(&vfsmount_lock);
 	}
 	return mnt;
+
+ alloc_failed:
+	if (flag & CL_SETUSER)
+		dec_nr_user_mounts();
+	return ERR_PTR(-ENOMEM);
 }
 
 static inline void __mntput(struct vfsmount *mnt)
@@ -748,22 +782,26 @@ asmlinkage long sys_oldumount(char __use
 
 #endif
 
-static int mount_is_safe(struct nameidata *nd)
+/*
+ * Conditions for unprivileged mounts are:
+ * - mountpoint is not a symlink
+ * - mountpoint is in a mount owned by the user
+ */
+static bool permit_mount(struct nameidata *nd, int *flags)
 {
+	struct inode *inode = nd->dentry->d_inode;
+
 	if (capable(CAP_SYS_ADMIN))
-		return 0;
-	return -EPERM;
-#ifdef notyet
-	if (S_ISLNK(nd->dentry->d_inode->i_mode))
-		return -EPERM;
-	if (nd->dentry->d_inode->i_mode & S_ISVTX) {
-		if (current->uid != nd->dentry->d_inode->i_uid)
-			return -EPERM;
-	}
-	if (vfs_permission(nd, MAY_WRITE))
-		return -EPERM;
-	return 0;
-#endif
+		return true;
+
+	if (S_ISLNK(inode->i_mode))
+		return false;
+
+	if (!is_mount_owner(nd->mnt, current->fsuid))
+		return false;
+
+	*flags |= MS_SETUSER;
+	return true;
 }
 
 static int lives_below_in_same_fs(struct dentry *d, struct dentry *dentry)
@@ -987,9 +1025,10 @@ static int do_loopback(struct nameidata
 	int clone_flags;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
-	int err = mount_is_safe(nd);
-	if (err)
-		return err;
+	int err;
+
+	if (!permit_mount(nd, &flags))
+		return -EPERM;
 	if (!old_name || !*old_name)
 		return -EINVAL;
 	err = path_lookup(old_name, LOOKUP_FOLLOW, &old_nd);

--
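The accounting and flag policy in this patch can be exercised outside the kernel with a small model. The sketch below is an approximation under stated assumptions: struct mount_limits, reserve_user_mount_model() and user_mount_flags() are hypothetical names, the M_* values are arbitrary illustrative bits rather than the kernel's MNT_* constants, the capability checks are replaced by plain booleans, and the vfsmount_lock locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits; not the kernel's MNT_* values. */
enum {
	M_USER   = 0x1,
	M_NOSUID = 0x2,
	M_NODEV  = 0x4,
};

/* Models the nr_user_mounts / max_user_mounts pair. */
struct mount_limits {
	int nr_user_mounts;
	int max_user_mounts;
};

/*
 * Model of reserve_user_mount(): take one slot unless the limit has been
 * reached and the caller is not an administrator.
 */
static int reserve_user_mount_model(struct mount_limits *l, bool is_admin)
{
	assert(l);
	if (l->nr_user_mounts >= l->max_user_mounts && !is_admin)
		return -EPERM;
	l->nr_user_mounts++;
	return 0;
}

/*
 * Model of the flag part of __set_mnt_user(): a user mount always gets
 * M_USER, plus M_NOSUID and M_NODEV when the matching capability is missing.
 */
static unsigned int user_mount_flags(bool cap_setuid, bool cap_mknod)
{
	unsigned int flags = M_USER;

	if (!cap_setuid)
		flags |= M_NOSUID;
	if (!cap_mknod)
		flags |= M_NODEV;
	return flags;
}
```

As in the patch, an administrator can push the count past the maximum, while an unprivileged caller is refused once the limit is reached.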
[patch 08/10] unprivileged mounts: allow unprivileged fuse mounts
From: Miklos Szeredi <[EMAIL PROTECTED]>

Use FS_SAFE for "fuse" fs type, but not for "fuseblk".

FUSE was designed from the beginning to be safe for unprivileged
users. This has also been verified in practice over many years. In
addition unprivileged mounts require the parent mount to be owned by
the user, which is more strict than the current userspace policy.

This will enable future installations to remove the suid-root
fusermount utility.

Don't require the "user_id=" and "group_id=" options for unprivileged
mounts, but if they are present, verify them for sanity.

Disallow the "allow_other" option for unprivileged mounts.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/fuse/inode.c
===
--- linux.orig/fs/fuse/inode.c	2007-04-26 13:07:11.0 +0200
+++ linux/fs/fuse/inode.c	2007-04-26 13:07:33.0 +0200
@@ -311,6 +311,19 @@ static int parse_fuse_opt(char *opt, str
 	d->max_read = ~0;
 	d->blksize = 512;
 
+	/*
+	 * For unprivileged mounts use current uid/gid. Still allow
+	 * "user_id" and "group_id" options for compatibility, but
+	 * only if they match these values.
+	 */
+	if (!capable(CAP_SYS_ADMIN)) {
+		d->user_id = current->uid;
+		d->user_id_present = 1;
+		d->group_id = current->gid;
+		d->group_id_present = 1;
+
+	}
+
 	while ((p = strsep(&opt, ",")) != NULL) {
 		int token;
 		int value;
@@ -339,6 +352,8 @@ static int parse_fuse_opt(char *opt, str
 		case OPT_USER_ID:
 			if (match_int(&args[0], &value))
 				return 0;
+			if (d->user_id_present && d->user_id != value)
+				return 0;
 			d->user_id = value;
 			d->user_id_present = 1;
 			break;
@@ -346,6 +361,8 @@ static int parse_fuse_opt(char *opt, str
 		case OPT_GROUP_ID:
 			if (match_int(&args[0], &value))
 				return 0;
+			if (d->group_id_present && d->group_id != value)
+				return 0;
 			d->group_id = value;
 			d->group_id_present = 1;
 			break;
@@ -536,6 +553,10 @@ static int fuse_fill_super(struct super_
 	if (!parse_fuse_opt((char *) data, &d, is_bdev))
 		return -EINVAL;
 
+	/* This is a privileged option */
+	if ((d.flags & FUSE_ALLOW_OTHER) && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	if (is_bdev) {
 #ifdef CONFIG_BLOCK
 		if (!sb_set_blocksize(sb, d.blksize))
@@ -639,6 +660,7 @@ static struct file_system_type fuse_fs_t
 	.fs_flags	= FS_HAS_SUBTYPE,
 	.get_sb		= fuse_get_sb,
 	.kill_sb	= kill_anon_super,
+	.fs_flags	= FS_SAFE,
 };
 
 #ifdef CONFIG_BLOCK

--
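The "user_id="/"group_id=" handling can be looked at in isolation with a small userspace model. The names below (struct id_opt, preset_for_unprivileged(), accept_id_option()) are hypothetical, and the model covers only the preset-then-verify logic, not option parsing or the capability check itself.

```c
#include <assert.h>

/* Models one of the (value, present) pairs in the mount data. */
struct id_opt {
	unsigned int value;
	int present;
};

/* Unprivileged case: seed the option with the caller's own id. */
static void preset_for_unprivileged(struct id_opt *o, unsigned int current_id)
{
	assert(o);
	o->value = current_id;
	o->present = 1;
}

/*
 * Model of the per-option check: accept the value unless an earlier value
 * is already recorded and differs.  Returns 1 on accept and 0 on reject,
 * mirroring the "return 0 on failure" convention of the parser.
 */
static int accept_id_option(struct id_opt *o, unsigned int value)
{
	assert(o);
	if (o->present && o->value != value)
		return 0;
	o->value = value;
	o->present = 1;
	return 1;
}
```

One consequence visible in the model: because the check keys only on the "present" marker, repeating an option with a different value is rejected for privileged callers as well, since the first occurrence already sets the marker.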
[patch 00/10] mount ownership and unprivileged mount syscall (v5)
v4 -> v5:

 - fold back Andrew's changes
 - fold back my update patch:
    o use fsuid instead of ruid
    o allow forced unpriv. unmounts for "safe" filesystems
    o allow mounting over special files, but not over symlinks
    o set nosuid and nodev based on lack of specific capability
 - patch header updates
 - new patch: on propagation inherit owner from parent
 - new patch: add "no submounts" mount flag

The last two patches are up for discussion. The rest I think is in
pretty good shape for merging. If somebody feels otherwise, please
complain now.

--
[patch 10/10] unprivileged mounts: add "no submounts" flag
From: Miklos Szeredi <[EMAIL PROTECTED]>

Add a new mount flag "nomnt", which denies submounts for the owner.
This would be useful, if we want to support traditional /etc/fstab
based user mounts.

In this case mount(8) would still have to be suid-root, to check the
mountpoint against the user/users flag in /etc/fstab, but /etc/mtab
would no longer be mandatory for storing the actual owner of the
mount.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namespace.c
===
--- linux.orig/fs/namespace.c	2007-04-27 12:57:11.0 +0200
+++ linux/fs/namespace.c	2007-04-27 12:57:14.0 +0200
@@ -449,6 +449,7 @@ static int show_vfsmnt(struct seq_file *
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
+		{ MNT_NOMNT, ",nomnt" },
 		{ 0, NULL }
 	};
 	struct proc_fs_info *fs_infop;
@@ -806,6 +807,9 @@ static bool permit_mount(struct nameidat
 	if (S_ISLNK(inode->i_mode))
 		return false;
 
+	if (nd->mnt->mnt_flags & MNT_NOMNT)
+		return false;
+
 	if (!is_mount_owner(nd->mnt, current->fsuid))
 		return false;
 
@@ -1575,9 +1579,11 @@ long do_mount(char *dev_name, char *dir_
 		mnt_flags |= MNT_NODIRATIME;
 	if (flags & MS_RELATIME)
 		mnt_flags |= MNT_RELATIME;
+	if (flags & MS_NOMNT)
+		mnt_flags |= MNT_NOMNT;
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
-		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME);
+		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_NOMNT);
 
 	/* ... and get the mountpoint */
 	retval = path_lookup(dir_name, LOOKUP_FOLLOW, &nd);
Index: linux/include/linux/fs.h
===
--- linux.orig/include/linux/fs.h	2007-04-27 12:57:11.0 +0200
+++ linux/include/linux/fs.h	2007-04-27 12:57:14.0 +0200
@@ -128,6 +128,7 @@ extern int dir_notify_enable;
 #define MS_SHARED	(1<<20)	/* change to shared */
 #define MS_RELATIME	(1<<21)	/* Update atime relative to mtime/ctime. */
 #define MS_SETUSER	(1<<22)	/* set mnt_uid to current user */
+#define MS_NOMNT	(1<<23)	/* don't allow unprivileged submounts */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
 
Index: linux/include/linux/mount.h
===
--- linux.orig/include/linux/mount.h	2007-04-27 12:57:01.0 +0200
+++ linux/include/linux/mount.h	2007-04-27 12:57:14.0 +0200
@@ -28,6 +28,7 @@ struct mnt_namespace;
 #define MNT_NOATIME	0x08
 #define MNT_NODIRATIME	0x10
 #define MNT_RELATIME	0x20
+#define MNT_NOMNT	0x40
 
 #define MNT_SHRINKABLE	0x100
 #define MNT_USER	0x200

--
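The do_mount() hunk follows the usual pattern of translating per-mount MS_* request bits into MNT_* bits and then clearing them from the remaining flags. A minimal standalone sketch of that pattern, using only the two flag pairs whose values appear in this patch; the function name split_mount_flags() is hypothetical, and the real do_mount() handles more flags than shown here.

```c
#include <assert.h>

/* Values as they appear in the patch above. */
#define MS_RELATIME	(1UL << 21)
#define MS_NOMNT	(1UL << 23)
#define MNT_RELATIME	0x20
#define MNT_NOMNT	0x40

/*
 * Translate the per-mount request bits in *flags into MNT_* bits and
 * strip them from *flags, so that only the remaining bits are passed on.
 */
static int split_mount_flags(unsigned long *flags)
{
	int mnt_flags = 0;

	assert(flags);
	if (*flags & MS_RELATIME)
		mnt_flags |= MNT_RELATIME;
	if (*flags & MS_NOMNT)
		mnt_flags |= MNT_NOMNT;

	*flags &= ~(MS_RELATIME | MS_NOMNT);
	return mnt_flags;
}
```

Forgetting to add a new bit to the final mask would leave it set in the flags handed to later code, which is why the patch extends the mask in the same hunk that adds the translation.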
[patch 02/10] unprivileged mounts: allow unprivileged umount
From: Miklos Szeredi <[EMAIL PROTECTED]>

The owner doesn't need sysadmin capabilities to call umount().

Similar behavior as umount(8) on mounts having "user=UID" option in
/etc/mtab. The difference is that umount also checks /etc/fstab,
presumably to exclude another mount on the same mountpoint.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namespace.c
===
--- linux.orig/fs/namespace.c	2007-04-26 13:10:48.0 +0200
+++ linux/fs/namespace.c	2007-04-26 13:16:21.0 +0200
@@ -658,6 +658,27 @@ static int do_umount(struct vfsmount *mn
 	return retval;
 }
 
+static bool is_mount_owner(struct vfsmount *mnt, uid_t uid)
+{
+	return (mnt->mnt_flags & MNT_USER) && mnt->mnt_uid == uid;
+}
+
+/*
+ * umount is permitted for
+ *  - sysadmin
+ *  - mount owner, if not forced umount
+ */
+static bool permit_umount(struct vfsmount *mnt, int flags)
+{
+	if (capable(CAP_SYS_ADMIN))
+		return true;
+
+	if (flags & MNT_FORCE)
+		return false;
+
+	return is_mount_owner(mnt, current->fsuid);
+}
+
 /*
  * Now umount can handle mount points as well as block devices.
  * This is important for filesystems which use unnamed block devices.
@@ -681,7 +702,7 @@ asmlinkage long sys_umount(char __user *
 		goto dput_and_out;
 
 	retval = -EPERM;
-	if (!capable(CAP_SYS_ADMIN))
+	if (!permit_umount(nd.mnt, flags))
 		goto dput_and_out;
 
 	retval = do_umount(nd.mnt, flags);

--
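The decision made by permit_umount() can be written as a pure function and checked in userspace. This is a sketch under stated assumptions: struct mount_model, MODEL_MNT_FORCE and the *_model function names are invented for illustration, the capability check is a boolean parameter, and the flag value is arbitrary rather than the kernel's MNT_FORCE.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MNT_FORCE	0x1	/* stands in for MNT_FORCE; value is arbitrary */

/* Models the two fields the ownership check looks at. */
struct mount_model {
	bool user_owned;	/* models the MNT_USER flag */
	unsigned int owner_uid;	/* models mnt_uid */
};

/* A mount has an owner only if it is marked as a user mount. */
static bool is_mount_owner_model(const struct mount_model *m, unsigned int uid)
{
	assert(m);
	return m->user_owned && m->owner_uid == uid;
}

/*
 * umount is permitted for an administrator in all cases, and for the
 * mount owner as long as the umount is not forced.
 */
static bool permit_umount_model(const struct mount_model *m, bool is_admin,
				unsigned int fsuid, int flags)
{
	if (is_admin)
		return true;
	if (flags & MODEL_MNT_FORCE)
		return false;
	return is_mount_owner_model(m, fsuid);
}
```

Note that a mount without the user marker is never "owned", even when its recorded uid happens to equal the caller's fsuid; in the model, as in the patch, both conditions must hold.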
Re: [PATCH 0/5] fallocate system call
On Thu, Apr 26, 2007 at 11:20:56PM +0530, Amit K. Arora wrote:
> Based on the discussion, this new patchset uses following as the
> interface for fallocate() system call:
>
>  asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)
>
> It seems that only s390 architecture has a problem with such a layout of
> arguments in fallocate(). Thus for s390, we plan to have a wrapper
> (say, sys_s390_fallocate()) for the sys_fallocate(), which will get
> called by glibc when an application issues a fallocate() system call
> on s390. The s390 arch specific changes will be part of a separate
> patch (PATCH 2/5). It will be great if some s390 expert can verify the
> patch, since I have not been able to test it on s390 so far.

After long discussions where at least two possible implementations
were suggested that would work on _all_ architectures you chose one
which doesn't and causes extra effort.

> It was also noted that minor changes might be required to strace code
> to take care of "different arguments on s390" issue.

This is not limited to strace...

Besides that the s390 backend looks ok.
[patch 01/10] unprivileged mounts: add user mounts to the kernel
From: Miklos Szeredi <[EMAIL PROTECTED]>

This patchset adds support for keeping mount ownership information in
the kernel, and allows unprivileged mount(2) and umount(2) in certain
cases.

The mount owner has the following privileges:

  - unmount the owned mount
  - create a submount under the owned mount

The sysadmin can set the owner explicitly on mount and remount. When
an unprivileged user creates a mount, the owner is automatically set
to the user.

The following use cases are envisioned:

1) Private namespace, with selected mounts owned by user. E.g.
   /home/$USER is a good candidate for allowing unpriv mounts and
   unmounts within.

2) Private namespace, with all mounts owned by user and having the
   "nosuid" flag. User can mount and umount anywhere within the
   namespace, but suid programs will not work.

3) Global namespace, with a designated directory, which is a mount
   owned by the user. E.g. /mnt/users/$USER is set up so that it is
   bind mounted onto itself, and set to be owned by $USER. The user
   can add/remove mounts only under this directory.

The following extra security measures are taken for unprivileged
mounts:

  - usermounts are limited by a sysctl tunable
  - force "nosuid,nodev" mount options on the created mount

For testing unprivileged mounts (and for other purposes) simple
mount/umount utilities are available from:

  http://www.kernel.org/pub/linux/kernel/people/mszeredi/mmount/

I'll also submit a patch to util-linux, to add the same functionality
to mount(8) and umount(8).

This patch:

A new mount flag, MS_SETUSER, is used to make a mount owned by a user.
If this flag is specified, then the owner will be set to the current
fsuid and the mount will be marked with the MNT_USER flag. On remount
don't preserve previous owner, and treat MS_SETUSER as for a new
mount. The MS_SETUSER flag is ignored on mount move.

The MNT_USER flag is not copied on any kind of mount cloning:
namespace creation, binding or propagation. For bind mounts the
cloned mount(s) are set to MNT_USER depending on the MS_SETUSER mount
flag. In all the other cases MNT_USER is always cleared.

For MNT_USER mounts a "user=UID" option is added to /proc/PID/mounts.
This is compatible with how mount ownership is stored in /etc/mtab.

The rationale for using MS_SETUSER and MNT_USER to distinguish "user"
mounts from "non-user" or "legacy" mounts is as follows:

 a) mount(2) and umount(2) on legacy mounts always need the
    CAP_SYS_ADMIN capability, as opposed to user mounts, which only
    require that the mount owner matches the current fsuid. So a
    process with fsuid=0 should not be able to mount/umount legacy
    mounts without the CAP_SYS_ADMIN capability.

 b) Legacy userspace programs may set fsuid to nonzero before calling
    mount(2). In such an unlikely case, this patchset would cause
    an unintended side effect of making the mount owned by the fsuid.

 c) For legacy mounts, no "user=UID" option should be shown in
    /proc/mounts for backwards compatibility.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namespace.c
===
--- linux.orig/fs/namespace.c	2007-04-26 13:08:35.0 +0200
+++ linux/fs/namespace.c	2007-04-26 13:10:48.0 +0200
@@ -227,6 +227,13 @@ static struct vfsmount *skip_mnt_tree(st
 	return p;
 }
 
+static void set_mnt_user(struct vfsmount *mnt)
+{
+	BUG_ON(mnt->mnt_flags & MNT_USER);
+	mnt->mnt_uid = current->fsuid;
+	mnt->mnt_flags |= MNT_USER;
+}
+
 static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
 					int flag)
 {
@@ -241,6 +248,11 @@ static struct vfsmount *clone_mnt(struct
 		mnt->mnt_mountpoint = mnt->mnt_root;
 		mnt->mnt_parent = mnt;
 
+		/* don't copy the MNT_USER flag */
+		mnt->mnt_flags &= ~MNT_USER;
+		if (flag & CL_SETUSER)
+			set_mnt_user(mnt);
+
 		if (flag & CL_SLAVE) {
 			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
 			mnt->mnt_master = old;
@@ -403,6 +415,8 @@ static int show_vfsmnt(struct seq_file *
 		if (mnt->mnt_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}
+	if (mnt->mnt_flags & MNT_USER)
+		seq_printf(m, ",user=%i", mnt->mnt_uid);
 	if (mnt->mnt_sb->s_op->show_options)
 		err = mnt->mnt_sb->s_op->show_options(m, mnt);
 	seq_puts(m, " 0 0\n");
@@ -923,8 +937,9 @@ static int do_change_type(struct nameida
 /*
  * do loopback mount.
  */
-static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
+static int do_loopback(struct nameidata *nd, char *old_name, int flags)
 {
+	int clone_flags;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
 	int err
[patch 09/10] unprivileged mounts: propagation: inherit owner from parent
From: Miklos Szeredi <[EMAIL PROTECTED]>

On mount propagation, let the owner of the clone be inherited from the
parent into which it has been propagated. Also if the parent has the
"nosuid" flag, set this flag for the child as well.

This makes sense for example, when propagation is set up from the
initial namespace into a per-user namespace, where some or all of the
mounts may be owned by the user.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namespace.c
===
--- linux.orig/fs/namespace.c	2007-04-27 12:57:01.0 +0200
+++ linux/fs/namespace.c	2007-04-27 12:57:11.0 +0200
@@ -250,10 +250,10 @@ static int reserve_user_mount(void)
 	return err;
 }
 
-static void __set_mnt_user(struct vfsmount *mnt)
+static void __set_mnt_user(struct vfsmount *mnt, uid_t owner)
 {
 	BUG_ON(mnt->mnt_flags & MNT_USER);
-	mnt->mnt_uid = current->fsuid;
+	mnt->mnt_uid = owner;
 	mnt->mnt_flags |= MNT_USER;
 
 	if (!capable(CAP_SETUID))
@@ -264,7 +264,7 @@ static void __set_mnt_user(struct vfsmou
 
 static void set_mnt_user(struct vfsmount *mnt)
 {
-	__set_mnt_user(mnt);
+	__set_mnt_user(mnt, current->fsuid);
 	spin_lock(&vfsmount_lock);
 	nr_user_mounts++;
 	spin_unlock(&vfsmount_lock);
@@ -280,7 +280,7 @@ static void clear_mnt_user(struct vfsmou
 }
 
 static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
-					int flag)
+					int flag, uid_t owner)
 {
 	struct super_block *sb = old->mnt_sb;
 	struct vfsmount *mnt;
@@ -304,7 +304,10 @@ static struct vfsmount *clone_mnt(struct
 	/* don't copy the MNT_USER flag */
 	mnt->mnt_flags &= ~MNT_USER;
 	if (flag & CL_SETUSER)
-		__set_mnt_user(mnt);
+		__set_mnt_user(mnt, owner);
+
+	if (flag & CL_NOSUID)
+		mnt->mnt_flags |= MNT_NOSUID;
 
 	if (flag & CL_SLAVE) {
 		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
@@ -822,7 +825,7 @@ static int lives_below_in_same_fs(struct
 }
 
 struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
-					int flag)
+					int flag, uid_t owner)
 {
 	struct vfsmount *res, *p, *q, *r, *s;
 	struct nameidata nd;
@@ -830,7 +833,7 @@ struct vfsmount *copy_tree(struct vfsmou
 	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
 		return ERR_PTR(-EPERM);
 
-	res = q = clone_mnt(mnt, dentry, flag);
+	res = q = clone_mnt(mnt, dentry, flag, owner);
 	if (IS_ERR(q))
 		goto error;
 	q->mnt_mountpoint = mnt->mnt_mountpoint;
@@ -852,7 +855,7 @@ struct vfsmount *copy_tree(struct vfsmou
 			p = s;
 			nd.mnt = q;
 			nd.dentry = p->mnt_mountpoint;
-			q = clone_mnt(p, p->mnt_root, flag);
+			q = clone_mnt(p, p->mnt_root, flag, owner);
 			if (IS_ERR(q))
 				goto error;
 			spin_lock(&vfsmount_lock);
@@ -1028,7 +1031,8 @@ static int do_change_type(struct nameida
  */
 static int do_loopback(struct nameidata *nd, char *old_name, int flags)
 {
-	int clone_flags;
+	int clone_flags = 0;
+	uid_t owner = 0;
 	struct nameidata old_nd;
 	struct vfsmount *mnt = NULL;
 	int err;
@@ -1049,11 +1053,15 @@ static int do_loopback(struct nameidata
 	if (!check_mnt(nd->mnt) || !check_mnt(old_nd.mnt))
 		goto out;
 
-	clone_flags = (flags & MS_SETUSER) ? CL_SETUSER : 0;
+	if (flags & MS_SETUSER) {
+		clone_flags |= CL_SETUSER;
+		owner = current->fsuid;
+	}
+
 	if (flags & MS_REC)
-		mnt = copy_tree(old_nd.mnt, old_nd.dentry, clone_flags);
+		mnt = copy_tree(old_nd.mnt, old_nd.dentry, clone_flags, owner);
 	else
-		mnt = clone_mnt(old_nd.mnt, old_nd.dentry, clone_flags);
+		mnt = clone_mnt(old_nd.mnt, old_nd.dentry, clone_flags, owner);
 
 	err = PTR_ERR(mnt);
 	if (IS_ERR(mnt))
@@ -1227,7 +1235,7 @@ static int do_new_mount(struct nameidata
 	}
 
 	if (flags & MS_SETUSER)
-		__set_mnt_user(mnt);
+		__set_mnt_user(mnt, current->fsuid);
 
 	return do_add_mount(mnt, nd, mnt_flags, NULL);
 
@@ -1620,7 +1628,7 @@ static struct mnt_namespace *dup_mnt_ns(
 	down_write(&namespace_sem);
 	/* First pass: copy the tree topology */
 	new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
-					CL_COPY_ALL | CL_EXPIRE);
+					CL_COPY_ALL | CL_EXPIRE, 0);
 	if (IS_ERR(new_ns->root)) {
 		up_write(&na
[patch 04/10] unprivileged mounts: propagate error values from clone_mnt
From: Miklos Szeredi <[EMAIL PROTECTED]>

Allow clone_mnt() to return errors other than ENOMEM.  This will be
used for returning a different error value when the number of user
mounts goes over the limit.

Fix copy_tree() to return EPERM for unbindable mounts.

Don't propagate further from dup_mnt_ns() as that copy_tree() can only
fail with -ENOMEM.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namespace.c
===
--- linux.orig/fs/namespace.c	2007-04-26 13:17:13.0 +0200
+++ linux/fs/namespace.c	2007-04-26 13:18:46.0 +0200
@@ -262,41 +262,42 @@ static struct vfsmount *clone_mnt(struct
 	struct super_block *sb = old->mnt_sb;
 	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
 
-	if (mnt) {
-		mnt->mnt_flags = old->mnt_flags;
-		atomic_inc(&sb->s_active);
-		mnt->mnt_sb = sb;
-		mnt->mnt_root = dget(root);
-		mnt->mnt_mountpoint = mnt->mnt_root;
-		mnt->mnt_parent = mnt;
-
-		/* don't copy the MNT_USER flag */
-		mnt->mnt_flags &= ~MNT_USER;
-		if (flag & CL_SETUSER)
-			set_mnt_user(mnt);
-
-		if (flag & CL_SLAVE) {
-			list_add(&mnt->mnt_slave, &old->mnt_slave_list);
-			mnt->mnt_master = old;
-			CLEAR_MNT_SHARED(mnt);
-		} else {
-			if ((flag & CL_PROPAGATION) || IS_MNT_SHARED(old))
-				list_add(&mnt->mnt_share, &old->mnt_share);
-			if (IS_MNT_SLAVE(old))
-				list_add(&mnt->mnt_slave, &old->mnt_slave);
-			mnt->mnt_master = old->mnt_master;
-		}
-		if (flag & CL_MAKE_SHARED)
-			set_mnt_shared(mnt);
+	if (!mnt)
+		return ERR_PTR(-ENOMEM);
 
-		/* stick the duplicate mount on the same expiry list
-		 * as the original if that was on one */
-		if (flag & CL_EXPIRE) {
-			spin_lock(&vfsmount_lock);
-			if (!list_empty(&old->mnt_expire))
-				list_add(&mnt->mnt_expire, &old->mnt_expire);
-			spin_unlock(&vfsmount_lock);
-		}
+	mnt->mnt_flags = old->mnt_flags;
+	atomic_inc(&sb->s_active);
+	mnt->mnt_sb = sb;
+	mnt->mnt_root = dget(root);
+	mnt->mnt_mountpoint = mnt->mnt_root;
+	mnt->mnt_parent = mnt;
+
+	/* don't copy the MNT_USER flag */
+	mnt->mnt_flags &= ~MNT_USER;
+	if (flag & CL_SETUSER)
+		set_mnt_user(mnt);
+
+	if (flag & CL_SLAVE) {
+		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
+		mnt->mnt_master = old;
+		CLEAR_MNT_SHARED(mnt);
+	} else {
+		if ((flag & CL_PROPAGATION) || IS_MNT_SHARED(old))
+			list_add(&mnt->mnt_share, &old->mnt_share);
+		if (IS_MNT_SLAVE(old))
+			list_add(&mnt->mnt_slave, &old->mnt_slave);
+		mnt->mnt_master = old->mnt_master;
+	}
+	if (flag & CL_MAKE_SHARED)
+		set_mnt_shared(mnt);
+
+	/* stick the duplicate mount on the same expiry list
+	 * as the original if that was on one */
+	if (flag & CL_EXPIRE) {
+		spin_lock(&vfsmount_lock);
+		if (!list_empty(&old->mnt_expire))
+			list_add(&mnt->mnt_expire, &old->mnt_expire);
+		spin_unlock(&vfsmount_lock);
 	}
 	return mnt;
 }
@@ -783,11 +784,11 @@ struct vfsmount *copy_tree(struct vfsmou
 	struct nameidata nd;
 
 	if (!(flag & CL_COPY_ALL) && IS_MNT_UNBINDABLE(mnt))
-		return NULL;
+		return ERR_PTR(-EPERM);
 
 	res = q = clone_mnt(mnt, dentry, flag);
-	if (!q)
-		goto Enomem;
+	if (IS_ERR(q))
+		goto error;
 	q->mnt_mountpoint = mnt->mnt_mountpoint;
 
 	p = mnt;
@@ -808,8 +809,8 @@ struct vfsmount *copy_tree(struct vfsmou
 			nd.mnt = q;
 			nd.dentry = p->mnt_mountpoint;
 			q = clone_mnt(p, p->mnt_root, flag);
-			if (!q)
-				goto Enomem;
+			if (IS_ERR(q))
+				goto error;
 			spin_lock(&vfsmount_lock);
 			list_add_tail(&q->mnt_list, &res->mnt_list);
 			attach_mnt(q, &nd);
@@ -817,7 +818,7 @@ struct vfsmount *copy_tree(struct vfsmou
 		}
 	}
 	return res;
-Enomem:
+ error:
 	if (res) {
 		LIST_HEAD(umount_list);
 		spin_lock(&vfsmount_lock);
@@ -825,7 +826,7 @@ Enomem:
 sp
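For readers unfamiliar with the ERR_PTR()/IS_ERR() convention this patch switches to, here is a minimal userspace sketch of the idea. It is not the kernel's implementation: err_ptr(), ptr_err(), is_err(), struct mount, clone_mount() and copy_mount() are all hypothetical stand-ins written for illustration. The point is that a function which used to return NULL on failure can return a small negative errno encoded in the pointer instead, so the caller can tell ENOMEM from EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: like the kernel, reserve the top 4095 addresses for errnos. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }

/* Recover the errno value from an error pointer. */
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }

/* True if the pointer lies in the reserved error range. */
static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct vfsmount. */
struct mount {
	int unbindable;
};

/* Mirrors the patched clone_mnt(): a valid pointer or an encoded errno,
 * never NULL. */
static struct mount *clone_mount(const struct mount *old)
{
	struct mount *m;

	assert(old != NULL);
	m = malloc(sizeof(*m));
	if (!m)
		return err_ptr(-ENOMEM);
	*m = *old;
	return m;
}

/* Mirrors the patched copy_tree(): refuse unbindable mounts with -EPERM
 * and pass through whatever error clone_mount() reports. */
static struct mount *copy_mount(const struct mount *old)
{
	if (old->unbindable)
		return err_ptr(-EPERM);
	return clone_mount(old);
}
```

A caller checks the result with is_err() and, on failure, converts it back with ptr_err(), just as the patched copy_tree() callers are expected to use IS_ERR() and PTR_ERR().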
[patch 03/10] unprivileged mounts: account user mounts
From: Miklos Szeredi <[EMAIL PROTECTED]>

Add sysctl variables for accounting and limiting the number of user
mounts.

The maximum number of user mounts is set to 1024 by default.  This
won't in itself enable user mounts, setting a mount to be owned by a
user is first needed

[akpm]
 - don't use enumerated sysctls

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/Documentation/filesystems/proc.txt
===
--- linux.orig/Documentation/filesystems/proc.txt	2007-04-26 13:08:35.0 +0200
+++ linux/Documentation/filesystems/proc.txt	2007-04-26 13:17:13.0 +0200
@@ -923,6 +923,15 @@ reaches aio-max-nr then io_setup will fa
 raising aio-max-nr does not result in the pre-allocation or re-sizing
 of any kernel data structures.
 
+nr_user_mounts and max_user_mounts
+--
+
+These represent the number of "user" mounts and the maximum number of
+"user" mounts respectively.  User mounts may be created by
+unprivileged users.  User mounts may also be created with sysadmin
+privileges on behalf of a user, in which case nr_user_mounts may
+exceed max_user_mounts.
+
 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
 ---
 
Index: linux/fs/namespace.c
===
--- linux.orig/fs/namespace.c	2007-04-26 13:16:21.0 +0200
+++ linux/fs/namespace.c	2007-04-26 13:17:13.0 +0200
@@ -39,6 +39,9 @@ static int hash_mask __read_mostly, hash
 static struct kmem_cache *mnt_cache __read_mostly;
 static struct rw_semaphore namespace_sem;
 
+int nr_user_mounts;
+int max_user_mounts = 1024;
+
 /* /sys/fs */
 decl_subsys(fs, NULL, NULL);
 EXPORT_SYMBOL_GPL(fs_subsys);
@@ -227,11 +230,30 @@ static struct vfsmount *skip_mnt_tree(st
 	return p;
 }
 
+static void dec_nr_user_mounts(void)
+{
+	spin_lock(&vfsmount_lock);
+	nr_user_mounts--;
+	spin_unlock(&vfsmount_lock);
+}
+
 static void set_mnt_user(struct vfsmount *mnt)
 {
 	BUG_ON(mnt->mnt_flags & MNT_USER);
 	mnt->mnt_uid = current->fsuid;
 	mnt->mnt_flags |= MNT_USER;
+	spin_lock(&vfsmount_lock);
+	nr_user_mounts++;
+	spin_unlock(&vfsmount_lock);
+}
+
+static void clear_mnt_user(struct vfsmount *mnt)
+{
+	if (mnt->mnt_flags & MNT_USER) {
+		mnt->mnt_uid = 0;
+		mnt->mnt_flags &= ~MNT_USER;
+		dec_nr_user_mounts();
+	}
 }
 
 static struct vfsmount *clone_mnt(struct vfsmount *old, struct dentry *root,
@@ -283,6 +305,7 @@ static inline void __mntput(struct vfsmo
 {
 	struct super_block *sb = mnt->mnt_sb;
 	dput(mnt->mnt_root);
+	clear_mnt_user(mnt);
 	free_vfsmnt(mnt);
 	deactivate_super(sb);
 }
@@ -1028,6 +1051,7 @@ static int do_remount(struct nameidata *
 	down_write(&sb->s_umount);
 	err = do_remount_sb(sb, flags, data, 0);
 	if (!err) {
+		clear_mnt_user(nd->mnt);
 		nd->mnt->mnt_flags = mnt_flags;
 		if (flags & MS_SETUSER)
 			set_mnt_user(nd->mnt);
Index: linux/include/linux/fs.h
===
--- linux.orig/include/linux/fs.h	2007-04-26 13:08:36.0 +0200
+++ linux/include/linux/fs.h	2007-04-26 13:17:13.0 +0200
@@ -50,6 +50,9 @@ extern struct inodes_stat_t inodes_stat;
 
 extern int leases_enable, lease_break_time;
 
+extern int nr_user_mounts;
+extern int max_user_mounts;
+
 #ifdef CONFIG_DNOTIFY
 extern int dir_notify_enable;
 #endif
Index: linux/kernel/sysctl.c
===
--- linux.orig/kernel/sysctl.c	2007-04-26 13:08:35.0 +0200
+++ linux/kernel/sysctl.c	2007-04-26 13:17:13.0 +0200
@@ -1064,6 +1064,22 @@ static ctl_table fs_table[] = {
 #endif
 #endif
 	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "nr_user_mounts",
+		.data		= &nr_user_mounts,
+		.maxlen		= sizeof(int),
+		.mode		= 0444,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "max_user_mounts",
+		.data		= &max_user_mounts,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
 		.ctl_name	= KERN_SETUID_DUMPABLE,
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
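The accounting rule described in the proc.txt hunk above (unprivileged users are capped at max_user_mounts, while mounts created with sysadmin privileges on behalf of a user may push nr_user_mounts past the cap) can be sketched in plain C roughly as follows. This is only an illustration under stated assumptions: reserve_user_mount() and release_user_mount() are hypothetical helpers with a hypothetical signature, the -EMFILE return value is a guess (this patch does not say which errno is used), and the locking the kernel does with vfsmount_lock is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int nr_user_mounts;         /* current number of user mounts */
static int max_user_mounts = 1024; /* default limit from the patch */

/* Hypothetical helper: account one new user mount.
 * Unprivileged callers are refused once the limit is reached; a caller
 * with sysadmin privileges is always counted, so the counter may end up
 * above the limit.  The errno value is an assumption. */
static int reserve_user_mount(bool sysadmin)
{
	if (!sysadmin && nr_user_mounts >= max_user_mounts)
		return -EMFILE;
	nr_user_mounts++;
	return 0;
}

/* Hypothetical helper: drop the accounting for one user mount, as
 * dec_nr_user_mounts() does in the patch (minus the spinlock). */
static void release_user_mount(void)
{
	assert(nr_user_mounts > 0);
	nr_user_mounts--;
}
```

In the kernel the counter updates happen under vfsmount_lock; a real implementation would need the check and the increment to sit inside the same critical section.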
[patch 07/10] unprivileged mounts: allow unprivileged mounts
From: Miklos Szeredi <[EMAIL PROTECTED]>

Define a new fs flag FS_SAFE, which denotes, that unprivileged
mounting of this filesystem may not constitute a security problem.

Since most filesystems haven't been designed with unprivileged
mounting in mind, a thorough audit is needed before setting this flag.

For "safe" filesystems also allow unprivileged forced unmounting.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namespace.c
===
--- linux.orig/fs/namespace.c	2007-04-26 13:30:04.0 +0200
+++ linux/fs/namespace.c	2007-04-26 13:51:29.0 +0200
@@ -724,14 +724,16 @@ static bool is_mount_owner(struct vfsmou
 /*
  * umount is permitted for
  *  - sysadmin
- *  - mount owner, if not forced umount
+ *  - mount owner
+ *    o if not forced umount,
+ *    o if forced umount, and filesystem is "safe"
  */
 static bool permit_umount(struct vfsmount *mnt, int flags)
 {
 	if (capable(CAP_SYS_ADMIN))
 		return true;
 
-	if (flags & MNT_FORCE)
+	if ((flags & MNT_FORCE) && !(mnt->mnt_sb->s_type->fs_flags & FS_SAFE))
 		return false;
 
 	return is_mount_owner(mnt, current->fsuid);
@@ -787,13 +789,17 @@ asmlinkage long sys_oldumount(char __use
  *  - mountpoint is not a symlink
  *  - mountpoint is in a mount owned by the user
  */
-static bool permit_mount(struct nameidata *nd, int *flags)
+static bool permit_mount(struct nameidata *nd, struct file_system_type *type,
+			 int *flags)
 {
 	struct inode *inode = nd->dentry->d_inode;
 
 	if (capable(CAP_SYS_ADMIN))
 		return true;
 
+	if (type && !(type->fs_flags & FS_SAFE))
+		return false;
+
 	if (S_ISLNK(inode->i_mode))
 		return false;
 
@@ -1027,7 +1033,7 @@ static int do_loopback(struct nameidata
 	struct vfsmount *mnt = NULL;
 	int err;
 
-	if (!permit_mount(nd, &flags))
+	if (!permit_mount(nd, NULL, &flags))
 		return -EPERM;
 	if (!old_name || !*old_name)
 		return -EINVAL;
@@ -1188,26 +1194,46 @@ out:
  * create a new mount for userspace and request it to be added into the
  * namespace's tree
  */
-static int do_new_mount(struct nameidata *nd, char *type, int flags,
+static int do_new_mount(struct nameidata *nd, char *fstype, int flags,
 			int mnt_flags, char *name, void *data)
 {
+	int err;
 	struct vfsmount *mnt;
+	struct file_system_type *type;
 
-	if (!type || !memchr(type, 0, PAGE_SIZE))
+	if (!fstype || !memchr(fstype, 0, PAGE_SIZE))
 		return -EINVAL;
 
-	/* we need capabilities... */
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	mnt = do_kern_mount(type, flags & ~MS_SETUSER, name, data);
-	if (IS_ERR(mnt))
+	type = get_fs_type(fstype);
+	if (!type)
+		return -ENODEV;
+
+	err = -EPERM;
+	if (!permit_mount(nd, type, &flags))
+		goto out_put_filesystem;
+
+	if (flags & MS_SETUSER) {
+		err = reserve_user_mount();
+		if (err)
+			goto out_put_filesystem;
+	}
+
+	mnt = vfs_kern_mount(type, flags & ~MS_SETUSER, name, data);
+	put_filesystem(type);
+	if (IS_ERR(mnt)) {
+		if (flags & MS_SETUSER)
+			dec_nr_user_mounts();
 		return PTR_ERR(mnt);
+	}
 
 	if (flags & MS_SETUSER)
-		set_mnt_user(mnt);
+		__set_mnt_user(mnt);
 
 	return do_add_mount(mnt, nd, mnt_flags, NULL);
+
+ out_put_filesystem:
+	put_filesystem(type);
+	return err;
 }
 
 /*
@@ -1237,7 +1263,7 @@ int do_add_mount(struct vfsmount *newmnt
 	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
-	/* MNT_USER was set earlier */
+	/* some flags may have been set earlier */
 	newmnt->mnt_flags |= mnt_flags;
 	if ((err = graft_tree(newmnt, nd)))
 		goto unlock;
Index: linux/include/linux/fs.h
===
--- linux.orig/include/linux/fs.h	2007-04-26 13:46:26.0 +0200
+++ linux/include/linux/fs.h	2007-04-26 13:48:14.0 +0200
@@ -96,6 +96,7 @@ extern int dir_notify_enable;
 #define FS_REQUIRES_DEV 1
 #define FS_BINARY_MOUNTDATA 2
 #define FS_HAS_SUBTYPE 4
+#define FS_SAFE 8	/* Safe to mount by unprivileged users */
 #define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move()
 					 * during rename() internally.
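The umount permission rule from the patch above boils down to a three-step decision, which the following standalone sketch restates outside the kernel. struct umount_request and its boolean fields are hypothetical: they stand in for capable(CAP_SYS_ADMIN) and is_mount_owner(mnt, current->fsuid), which cannot be called from userspace. FS_SAFE uses the value from the patch and MNT_FORCE the value from the Linux headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FS_SAFE   8 /* value defined by the patch */
#define MNT_FORCE 1 /* forced-umount flag, same value as in Linux */

/* Hypothetical bundle of the inputs the kernel function looks at. */
struct umount_request {
	bool sysadmin; /* stands in for capable(CAP_SYS_ADMIN) */
	bool is_owner; /* stands in for is_mount_owner(mnt, current->fsuid) */
	int fs_flags;  /* fs_flags of the mounted filesystem type */
	int flags;     /* umount flags passed by the caller */
};

/* Same decision order as the patched permit_umount():
 *  1. sysadmin may always unmount;
 *  2. a forced umount of a filesystem not marked FS_SAFE is refused;
 *  3. otherwise only the mount owner may unmount. */
static bool permit_umount(const struct umount_request *r)
{
	assert(r != NULL);
	if (r->sysadmin)
		return true;
	if ((r->flags & MNT_FORCE) && !(r->fs_flags & FS_SAFE))
		return false;
	return r->is_owner;
}
```

Note that being the owner is still required in the forced case; FS_SAFE only lifts the blanket refusal of MNT_FORCE for unprivileged callers.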
[patch 06/10] unprivileged mounts: put declaration of put_filesystem() in fs.h
From: Miklos Szeredi <[EMAIL PROTECTED]>

Declarations go into headers.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/super.c
===
--- linux.orig/fs/super.c	2007-04-26 13:08:34.0 +0200
+++ linux/fs/super.c	2007-04-26 13:46:26.0 +0200
@@ -40,10 +40,6 @@
 #include
 
-void get_filesystem(struct file_system_type *fs);
-void put_filesystem(struct file_system_type *fs);
-struct file_system_type *get_fs_type(const char *name);
-
 LIST_HEAD(super_blocks);
 DEFINE_SPINLOCK(sb_lock);
 
Index: linux/include/linux/fs.h
===
--- linux.orig/include/linux/fs.h	2007-04-26 13:17:13.0 +0200
+++ linux/include/linux/fs.h	2007-04-26 13:46:26.0 +0200
@@ -1919,6 +1919,8 @@ extern int vfs_fstat(unsigned int, struc
 extern int vfs_ioctl(struct file *, unsigned int, unsigned int,
 		     unsigned long);
 
+extern void get_filesystem(struct file_system_type *fs);
+extern void put_filesystem(struct file_system_type *fs);
 extern struct file_system_type *get_fs_type(const char *name);
 extern struct super_block *get_super(struct block_device *);
 extern struct super_block *user_get_super(dev_t);
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Thu, 26 April 2007 22:07:43 -0700, Valerie Henson wrote:
>
> What's important is that each continuation inode have a
> back pointer to the parent and that there is some structure for
> quickly looking up the continuation inode for a given file offset.
> Suggestions for data structures that work well in this situation are
> welcome. :)

All this would get easier if continuation inodes were known to be rare.
You can ditch the doubly-linked list in favor of a pointer to the main
inode then - traversing the list again is cheap, after all.  And you can
just try to read the same block once for every continuation inode.

If those lists can get long and you need a mapping from offset to
continuation inode on the medium, you are basically fscked.  Storing the
mapping requires space.  You need the mapping only when space (in some
chunk) gets tight and you allocate continuation inodes.  So either you
don't need the mapping or you don't have a good place to put it.

Having a mapping in memory is also questionable.  Either you scan the
whole file on first access and spend a long time for large files.  Or
you create the mapping on the fly.  In that case the page cache will
already give you a 90% solution for free.

You should spend a lot of effort trying to minimize cnodes. ;)

Jörn

-- 
To recognize individual spam features you have to try to get into the
mind of the spammer, and frankly I want to spend as little time inside
the minds of spammers as possible.
-- Paul Graham
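As a rough illustration of the "no fast lookup structure" option discussed in this message, the sketch below walks a short singly-linked list of continuation inodes to find the one covering a file offset. Nothing here comes from the ChunkFS code: struct cnode, its fields and cnode_for_offset() are hypothetical, and the sketch assumes each continuation inode covers one contiguous byte range, which the thread does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one continuation inode. */
struct cnode {
	uint64_t start;     /* first file offset covered (assumption) */
	uint64_t len;       /* number of bytes covered (assumption) */
	struct cnode *next; /* next continuation inode of the same file */
};

/* Linear walk: O(n) in the number of continuation inodes, which is
 * acceptable only if they stay rare, as the message argues they should. */
static const struct cnode *cnode_for_offset(const struct cnode *head,
					    uint64_t off)
{
	for (const struct cnode *c = head; c; c = c->next) {
		/* guard against a range that wraps around 2^64 */
		assert(c->start + c->len >= c->start);
		if (off >= c->start && off - c->start < c->len)
			return c;
	}
	return NULL; /* offset not covered by any continuation inode */
}
```

If the lists did turn out to be long, this is exactly the spot where an offset-to-cnode mapping would be needed, with the space and construction-cost problems described above.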
Re: [patch] unprivileged mounts update
On Apr 26 2007 22:27, Miklos Szeredi wrote:
>> On Apr 25 2007 11:21, Eric W. Biederman wrote:
>> >>
>> >> Why did we want to use fsuid, exactly?
>> >
>> >- Because ruid is completely the wrong thing we want mounts owned
>> >  by whomever's permissions we are using to perform the mount.
>>
>> Think nfs. I access some nfs file as an unprivileged user. knfsd, by
>> nature, would run as euid=0, uid=0, but it needs fsuid=jengelh for
>> most permission logic to work as expected.
>
>I don't think knfsd will ever want to call mount(2).

I was actually out at something different...

	/* Make sure a caller can chown. */
	if ((ia_valid & ATTR_UID) &&
	    (current->fsuid != inode->i_uid ||
	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
		goto error;

for example. Using current->[e]uid would not make sense here.

>But yeah, I've been convinced, that using fsuid is the right thing to
>do.

Jan