[PATCH 0/7] fix setuid/setgid clearing in networked filesystems (take 5)
When an unprivileged process attempts to modify a file that has the setuid or setgid bits set, the VFS will attempt to clear these bits. The VFS will set the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid mask, and then call notify_change to clear these bits and set the mode accordingly. With a networked filesystem (NFS and CIFS in particular but likely others), the client machine or process may not have credentials that allow for setting the mode. In some situations, this can lead to file corruption, an operation failing outright because the setattr fails, or to races that lead to a mode change being reverted. In this situation, we'd like to just leave the handling of this to the server and ignore these bits. The problem is that by the time the setattr op is called, the VFS has already reinterpreted the ATTR_KILL_* bits into a mode change and the setattr operation has no way to know its intent. The following patchset fixes this by making notify_change no longer clear the ATTR_KILL_SUID and ATTR_KILL_SGID bits in the ia_valid before handing it off to the setattr inode op. setattr can then check for the presence of these bits, and if they're set it can assume that the mode change was only for the purposes of clearing these bits. This means that we now have an implicit assumption that notify_change is never called with ATTR_MODE and either ATTR_KILL_S*ID bit set. Nothing currently enforces that, so the first patch also adds a BUG_ON() if that occurs. The next two patches fix NFS and CIFS to take advantage of this new scheme to ignore doing the mode change when these flags are set. The last four patches fix up callers of notify_change to make sure that they don't trip the new BUG() call. This patchset should apply cleanly to 2.6.23-rc4-mm1. This is basically the same patchset as take 4 with a few extra patches to fix up the callers of notify_change, and some minor parenthetical cleanups. Signed-off-by: Jeff Layton [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations
Make notify_change not clear the ATTR_KILL_S*ID bits in the ia_vaid that gets passed to the setattr inode operation. This allows the filesystems to reinterpret whether this mode change is simply intended to clear the setuid/setgid bits. This means that notify_change should never be called with both ATTR_MODE and either of the ATTR_KILL_S*ID bits set, since the filesystem would have no way to know what part of the mode change was intentional. If it is called this way, consider it a BUG(). Signed-off-by: Jeff Layton [EMAIL PROTECTED] --- fs/attr.c | 23 +-- 1 files changed, 13 insertions(+), 10 deletions(-) diff --git a/fs/attr.c b/fs/attr.c index ae58bd3..6b3b07e 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -103,12 +103,11 @@ EXPORT_SYMBOL(inode_setattr); int notify_change(struct dentry * dentry, struct iattr * attr) { struct inode *inode = dentry-d_inode; - mode_t mode; + mode_t mode = inode-i_mode; int error; struct timespec now; unsigned int ia_valid = attr-ia_valid; - mode = inode-i_mode; now = current_fs_time(inode-i_sb); attr-ia_ctime = now; @@ -125,18 +124,22 @@ int notify_change(struct dentry * dentry, struct iattr * attr) if (error) return error; } + + /* +* It's not valid to pass an iattr with both ATTR_MODE and +* ATTR_KILL_S*ID set. +*/ + if ((ia_valid (ATTR_KILL_SUID|ATTR_KILL_SGID)) + (ia_valid ATTR_MODE)) + BUG(); + if (ia_valid ATTR_KILL_SUID) { - attr-ia_valid = ~ATTR_KILL_SUID; if (mode S_ISUID) { - if (!(ia_valid ATTR_MODE)) { - ia_valid = attr-ia_valid |= ATTR_MODE; - attr-ia_mode = inode-i_mode; - } - attr-ia_mode = ~S_ISUID; + ia_valid = attr-ia_valid |= ATTR_MODE; + attr-ia_mode = (inode-i_mode ~S_ISUID); } } if (ia_valid ATTR_KILL_SGID) { - attr-ia_valid = ~ ATTR_KILL_SGID; if ((mode (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) { if (!(ia_valid ATTR_MODE)) { ia_valid = attr-ia_valid |= ATTR_MODE; @@ -145,7 +148,7 @@ int notify_change(struct dentry * dentry, struct iattr * attr) attr-ia_mode = ~S_ISGID; } } - if (!attr-ia_valid) + if (!(attr-ia_valid ~(ATTR_KILL_SUID | ATTR_KILL_SGID))) return 0; if (ia_valid ATTR_SIZE) -- 1.5.2.1 - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] NFS: if ATTR_KILL_S*ID bits are set, then skip mode change
If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing the setuid/setgid bits. For NFS skip the mode change and let the server handle it. Signed-off-by: Jeff Layton [EMAIL PROTECTED] --- fs/nfs/inode.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 45633f9..441bd8b 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -327,6 +327,10 @@ nfs_setattr(struct dentry *dentry, struct iattr *attr) nfs_inc_stats(inode, NFSIOS_VFSSETATTR); + /* skip mode change if it's just for clearing setuid/setgid */ + if (attr-ia_valid (ATTR_KILL_SUID | ATTR_KILL_SGID)) + attr-ia_valid = ~ATTR_MODE; + if (attr-ia_valid ATTR_SIZE) { if (!S_ISREG(inode-i_mode) || attr-ia_size == i_size_read(inode)) attr-ia_valid = ~ATTR_SIZE; -- 1.5.2.1 - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] CIFS: ignore mode change if it's just for clearing setuid/setgid bits
If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing the setuid/setgid bits. For NFS, skip the mode change and let the server handle it. Signed-off-by: Jeff Layton [EMAIL PROTECTED] --- fs/cifs/inode.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index 66436f5..8fa3d63 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -1547,6 +1547,11 @@ int cifs_setattr(struct dentry *direntry, struct iattr *attrs) } time_buf.Attributes = 0; + + /* skip mode change if it's just for clearing setuid/setgid */ + if (attrs-ia_valid (ATTR_KILL_SUID|ATTR_KILL_SGID)) + attrs-ia_valid = ~ATTR_MODE; + if (attrs-ia_valid ATTR_MODE) { cFYI(1, (Mode changed to 0x%x, attrs-ia_mode)); mode = attrs-ia_mode; -- 1.5.2.1 - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] ecryptfs: allow lower fs to interpret ATTR_KILL_S*ID
Make sure ecryptfs doesn't trip the BUG() in notify_change. This also allows the lower filesystem to interpret these bits in their own way. Signed-off-by: Jeff Layton [EMAIL PROTECTED] --- fs/ecryptfs/inode.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index 131954b..dac4199 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -959,6 +959,14 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia) if (rc 0) goto out; } + + /* +* mode change is for clearing setuid/setgid bits. Allow lower fs +* to interpret this in its own way. +*/ + if (ia-ia_valid (ATTR_KILL_SUID | ATTR_KILL_SGID)) + ia-ia_valid = ~ATTR_MODE; + rc = notify_change(lower_dentry, ia); out: fsstack_copy_attr_all(inode, lower_inode, NULL); -- 1.5.2.1 - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/7] knfsd: only set ATTR_KILL_S*ID if ATTR_MODE isn't being explicitly set
It's theoretically possible for a single SETATTR call to come in that sets the mode and the uid/gid. In that case, assume the mode is correct and don't set the ATTR_KILL_S*ID bits. Doing so would trip the BUG() in notify_change. Signed-off-by: Jeff Layton [EMAIL PROTECTED] --- fs/nfsd/vfs.c | 17 +++-- 1 files changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 70f2c86..3b5b8cf 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -369,14 +369,19 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap, if (iap-ia_valid ATTR_MODE) { iap-ia_mode = S_IALLUGO; imode = iap-ia_mode |= (imode ~S_IALLUGO); + if ((iap-ia_valid ATTR_UID) iap-ia_uid != inode-i_uid) + iap-ia_valid |= ATTR_KILL_PRIV; + } else { + /* +* Revoke setuid/setgid bit on chown/chgrp, unless the mode +* is being explicitly set +*/ + if ((iap-ia_valid ATTR_UID) iap-ia_uid != inode-i_uid) + iap-ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV; + if ((iap-ia_valid ATTR_GID) iap-ia_gid != inode-i_gid) + iap-ia_valid |= ATTR_KILL_SGID; } - /* Revoke setuid/setgid bit on chown/chgrp */ - if ((iap-ia_valid ATTR_UID) iap-ia_uid != inode-i_uid) - iap-ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV; - if ((iap-ia_valid ATTR_GID) iap-ia_gid != inode-i_gid) - iap-ia_valid |= ATTR_KILL_SGID; - /* Change the attributes. */ iap-ia_valid |= ATTR_CTIME; -- 1.5.2.1 - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/7] reiserfs: turn off ATTR_KILL_S*ID at beginning of reiserfs_setattr
reiserfs_setattr can call notify_change recursively using the same iattr struct. This could cause it to trip the BUG() in notify_change. Fix reiserfs to clear those bits near the beginning of the function. Signed-off-by: Jeff Layton [EMAIL PROTECTED] --- fs/reiserfs/inode.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c index 9ea1200..0804289 100644 --- a/fs/reiserfs/inode.c +++ b/fs/reiserfs/inode.c @@ -3061,7 +3061,11 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr *attr) { struct inode *inode = dentry-d_inode; int error; - unsigned int ia_valid = attr-ia_valid; + unsigned int ia_valid; + + /* must be turned off for recursive notify_change calls */ + ia_valid = attr-ia_valid = ~(ATTR_KILL_SUID|ATTR_KILL_SGID); + reiserfs_write_lock(inode-i_sb); if (attr-ia_valid ATTR_SIZE) { /* version 2 items will be caught by the s_maxbytes check -- 1.5.2.1 - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] unionfs: fix unionfs_create and unionfs_setattr to handle ATTR_KILL_S*ID
Don't allow either function to trip the BUG() in notify_change. For unionfs_setattr, clear ATTR_MODE if the either ATTR_KILL_S*ID is set. unionfs_create is setting the mode explicitly already. Don't set ATTR_KILL_S*ID. Just fix up the mode to have the same effect. Also, move locking the i_mutex to lower in the function. It's not needed until it checks the i_size. (Jeff Sipek indicated that he was planning to change some of this code, so this patch may need changes if it goes in after his patchset) Signed-off-by: Jeff Layton [EMAIL PROTECTED] --- fs/unionfs/inode.c | 10 ++ 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c index 59bb418..f2e7f25 100644 --- a/fs/unionfs/inode.c +++ b/fs/unionfs/inode.c @@ -78,15 +78,14 @@ static int unionfs_create(struct inode *parent, struct dentry *dentry, struct dentry *lower_dir_dentry; struct iattr newattrs; - mutex_lock(wh_dentry-d_inode-i_mutex); newattrs.ia_valid = ATTR_CTIME | ATTR_MODE | ATTR_ATIME - | ATTR_MTIME | ATTR_UID | ATTR_GID | ATTR_FORCE - | ATTR_KILL_SUID | ATTR_KILL_SGID; + | ATTR_MTIME | ATTR_UID | ATTR_GID | ATTR_FORCE; - newattrs.ia_mode = mode ~current-fs-umask; + newattrs.ia_mode = mode ~(current-fs-umask|S_ISUID|S_ISGID); newattrs.ia_uid = current-fsuid; newattrs.ia_gid = current-fsgid; + mutex_lock(wh_dentry-d_inode-i_mutex); if (wh_dentry-d_inode-i_size != 0) { newattrs.ia_valid |= ATTR_SIZE; newattrs.ia_size = 0; @@ -1032,6 +1031,9 @@ static int unionfs_setattr(struct dentry *dentry, struct iattr *ia) bend = dbend(dentry); inode = dentry-d_inode; + if (ia-ia_valid (ATTR_KILL_SUID | ATTR_KILL_SGID)) + ia-ia_valid = ~ATTR_MODE; + for (bindex = bstart; (bindex = bend) || (bindex == bstart); bindex++) { lower_dentry = unionfs_lower_dentry_idx(dentry, bindex); -- 1.5.2.1 - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] System calls for online defrag
On Sep 03, 2007 20:03 +0200, Jan Kara wrote: I've finally got to writing up some proposal how could look system calls allowing for online filesystem defragmentation and generally moving file blocks around for improving performance. Comments are welcome. int sys_movedata(int datafd, int spacefd, loff_t from, size_t len) The call takes blocks used to carry data starting at offset @from of length @len in @spacefd and places them instead of corresponding blocks in @datafd. Calling these @spacefd and @datafd is a bit confusing. How about @srcfd and @tgtfd instead? For defragmentation, are you planning to have @datafd be the real inode and @spacefd be the temporary inode with defragged data, or the reverse? It isn't really clear. Data is copied from @datafd to newly spliced data blocks. If @spacefd contains a hole in the specified interval, a hole is created also in @datafd in the corresponding place. A data block from @spacefd and also replace a hole in @datafd - zeros are copied to such data block. @from and @len should be multiples of filesystem block size (otherwise EINVAL is returned). Data blocks from @datafd in the interval are released, a hole is created in @spacefd. This is mostly clear except the last sentence. I would think that the data blocks in @datafd are kept, getting a copy of the data, while those in @spacefd are released? Another possibility would be to just replace data blocks without any copying of data (that would have to be done by the caller to before calling sys_movedata()). The problem here is how to avoid data loss if someone writes to the file after userspace has copied the data and before sys_movedata() is called. Isn't that true in any case? ssize_t sys_allocate(int fd, int mode, loff_t goal, ssize_t len) Allocate new space to file @fd at offset defined by file position. Both file offset and @len should be a multiple of filesystem block size. The whole interval must not contain any allocated blocks. If the interval extends past EOF, the file size is changed accordingly. @mode defines a way the filesystem will search for blocks. @mode is a bitwise OR of the following flags: ALLOC_FIXED_START - allocation must start at @goal; if not specified, @goal is just a hint where to start an allocation ALLOC_FIXED_LEN - allocate exactly space for @len; if not specified, upto @len bytes may be allocated. ALLOC_CONTINGUOUS - allocation must be one continguous run of blocks How is this much different than sys_fallocate()? int sys_get_free_blocks(const char *fs, loff_t start, loff_t end, int count, struct alloc_extent *space) One alternate possibility is to call the proposed FIEMAP on the block device, to return lists of free/used extents? We have a version of that patch for ext4 and integration into filefrag, so it would be nice to avoid making up yet another API/tool if that one is sufficient. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html