Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc
On Thu, May 10, 2007 at 10:59:26AM +1000, David Chinner wrote: On Wed, May 09, 2007 at 09:31:02PM +0530, Amit K. Arora wrote: I have the updated patches ready which take care of Andrew's comments. Will run some tests and post them soon. But, before submitting these patches, I think it will be better to finalize on certain things which might be worth some discussion here: 1) Should the file size change when preallocation is done beyond EOF ? - Andreas and Chris Wedgwood are in favor of not changing the file size in this case. I also tend to agree with them. Does anyone has an argument in favor of changing the filesize ? If not, I will remove the code which changes the filesize, before I resubmit the concerned ext4 patch. I think there needs to be both. If we don't have a mechanism to atomically change the file size with the preallocation, then applications that use stat() to work out if they need to preallocate more space will end up racing. By both above, do you mean we should give user the flexibility if it wants the filesize changed or not ? It can be done by having *two* modes for preallocation in the system call - say FA_PREALLOCATE and FA_ALLOCATE. If we use FA_PREALLOCATE mode, fallocate() will allocate blocks, but will not change the filesize and [cm]time. If FA_ALLOCATE mode is used, fallocate() will change the filesize if required (i.e. when allocation is beyond EOF) and also update [cm]time. This way, the application can decide what it wants. This will be helpfull for the partial allocation scenario also. Think of the case when we do not change the filesize in fallocate() and expect applications/posix_fallocate() to do ftruncate() after fallocate() for this. Now if fallocate() results in a partial allocation with -ENOSPC error returned, applications/posix_fallocate() will not know for what length ftruncate() has to be called. :( Hence it may be a good idea to give user the flexibility if it wants to atomically change the file size with preallocation or not. But, with more flexibility there comes inconsistency in behavior, which is worth considering. 2) For FA_UNALLOCATE mode, should the file system allow unallocation of normal (non-preallocated) blocks (blocks allocated via regular write/truncate operations) also (i.e. work as punch()) ? Yes. That is the current XFS implementation for XFS_IOC_UNRESVSP, and what i did for FA_UNALLOCATE as well. Ok. But, some people may not expect/like this. I think, we can keep it on the backburner for a while, till other issues are sorted out. - Though FA_UNALLOCATE mode is yet to be implemented on ext4, still we need to finalize on the convention here as a general guideline to all the filesystems that implement fallocate. 3) If above is true, the file size will need to be changed for unallocation when block holding the EOF gets unallocated. No - we punch a hole. If you want the filesize to change, then you use ftruncate() to remove the blocks at EOF and change the file size atomically. Ok. 4) Should we update mtime ctime on a successfull allocation/ unallocation ? - David Chinner raised this question in following post: http://lkml.org/lkml/2007/4/29/407 I think it makes sense to update the [mc]time for a successfull preallocation/unallocation. Does anyone feel otherwise ? It will be interesting to know how XFS behaves currently. Does XFS update [mc]time for preallocation ? No, XFS does *not* update a/m/ctime on prealloc/punch unless the file size changes. If the filesize changes, it behaves exactly the same way that ftruncate() behaves. Having additional mode (of FA_PREALLOCATE) might help here too. Please see above. -- Regards, Amit Arora - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] AFS: Fix interminable loop in afs_write_back_from_locked_page()
Following bug was uncovered by compiling with '-W' flag: CC [M] fs/afs/write.o fs/afs/write.c: In function ‘afs_write_back_from_locked_page’: fs/afs/write.c:398: warning: comparison of unsigned expression = 0 is always true Loop variable 'n' is unsigned, so wraps around happily as far as I can see. Trival fix attached (compile tested only). Signed-Off-By: Mika Kukkonen [EMAIL PROTECTED] Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/afs/write.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/afs/write.c b/fs/afs/write.c index 67ae4db..28f3751 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -395,8 +395,9 @@ static int afs_write_back_from_locked_page(struct afs_writeback *wb, if (n == 0) goto no_more; if (pages[0]-index != start) { - for (n--; n = 0; n--) - put_page(pages[n]); + do { + put_page(pages[--n]); + } while (n 0); goto no_more; } - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] AFS: Fix a couple of problems with unlinking AFS files
Fix a couple of problems with unlinking AFS files. (1) The parent directory wasn't being updated properly between unlink() and the following lookup(). It seems that, for some reason, invalidate_remote_inode() wasn't discarding the directory contents correctly, so this patch calls invalidate_inode_pages2() instead on non-regular files. (2) afs_vnode_deleted_remotely() should handle vnodes that don't have a source server recorded without oopsing. Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/afs/file.c |2 +- fs/afs/inode.c | 10 +++--- fs/afs/super.c |3 ++- fs/afs/vnode.c | 33 + 4 files changed, 31 insertions(+), 17 deletions(-) diff --git a/fs/afs/file.c b/fs/afs/file.c index 3e25795..9c0e721 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -236,7 +236,7 @@ static void afs_invalidatepage(struct page *page, unsigned long offset) { int ret = 1; - kenter({%lu},%lu, page-index, offset); + _enter({%lu},%lu, page-index, offset); BUG_ON(!PageLocked(page)); diff --git a/fs/afs/inode.c b/fs/afs/inode.c index 515a5d1..47f5fed 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -209,11 +209,15 @@ bad_inode: */ void afs_zap_data(struct afs_vnode *vnode) { - _enter(zap data {%x:%u}, vnode-fid.vid, vnode-fid.vnode); + _enter({%x:%u}, vnode-fid.vid, vnode-fid.vnode); /* nuke all the non-dirty pages that aren't locked, mapped or being -* written back */ - invalidate_remote_inode(vnode-vfs_inode); +* written back in a regular file and completely discard the pages in a +* directory or symlink */ + if (S_ISREG(vnode-vfs_inode.i_mode)) + invalidate_remote_inode(vnode-vfs_inode); + else + invalidate_inode_pages2(vnode-vfs_inode.i_mapping); } /* diff --git a/fs/afs/super.c b/fs/afs/super.c index d24be33..422f532 100644 --- a/fs/afs/super.c +++ b/fs/afs/super.c @@ -488,6 +488,7 @@ static struct inode *afs_alloc_inode(struct super_block *sb) vnode-flags= 1 AFS_VNODE_UNSET; vnode-cb_promised = false; + _leave( = %p, vnode-vfs_inode); return vnode-vfs_inode; } @@ -498,7 +499,7 @@ static void afs_destroy_inode(struct inode *inode) { struct afs_vnode *vnode = AFS_FS_I(inode); - _enter({%lu}, inode-i_ino); + _enter(%p{%x:%u}, inode, vnode-fid.vid, vnode-fid.vnode); _debug(DESTROY INODE %p, inode); diff --git a/fs/afs/vnode.c b/fs/afs/vnode.c index ec81466..bea8bd9 100644 --- a/fs/afs/vnode.c +++ b/fs/afs/vnode.c @@ -175,24 +175,33 @@ static void afs_vnode_deleted_remotely(struct afs_vnode *vnode) { struct afs_server *server; + _enter({%p}, vnode-server); + set_bit(AFS_VNODE_DELETED, vnode-flags); server = vnode-server; - if (vnode-cb_promised) { - spin_lock(server-cb_lock); + if (server) { if (vnode-cb_promised) { - rb_erase(vnode-cb_promise, server-cb_promises); - vnode-cb_promised = false; + spin_lock(server-cb_lock); + if (vnode-cb_promised) { + rb_erase(vnode-cb_promise, +server-cb_promises); + vnode-cb_promised = false; + } + spin_unlock(server-cb_lock); } - spin_unlock(server-cb_lock); - } - spin_lock(vnode-server-fs_lock); - rb_erase(vnode-server_rb, vnode-server-fs_vnodes); - spin_unlock(vnode-server-fs_lock); + spin_lock(server-fs_lock); + rb_erase(vnode-server_rb, server-fs_vnodes); + spin_unlock(server-fs_lock); - vnode-server = NULL; - afs_put_server(server); + vnode-server = NULL; + afs_put_server(server); + } else { + ASSERT(!vnode-cb_promised); + } + + _leave(); } /* @@ -225,7 +234,7 @@ void afs_vnode_finalise_status_update(struct afs_vnode *vnode, */ static void afs_vnode_status_update_failed(struct afs_vnode *vnode, int ret) { - _enter(%p,%d, vnode, ret); + _enter({%x:%u},%d, vnode-fid.vid, vnode-fid.vnode, ret); spin_lock(vnode-lock); - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] file capabilities: accomodate 32 bit capabilities
On May 08, 2007 16:49 -0500, Serge E. Hallyn wrote: Quoting Andreas Dilger ([EMAIL PROTECTED]): One of the important use cases I can see today is the ability to split the heavily-overloaded e.g. CAP_SYS_ADMIN into much more fine grained attributes. Sounds plausible, though it suffers from both making capabilities far more cumbersome (i.e. finding the right capability for what you wanted to do) and backward compatibility. Perhaps at that point we should introduce security.capabilityv2 xattrs. A binary can then carry security.capability=CAP_SYS_ADMIN=p, and security.capabilityv2=cap_may_clone_mntns=p. Well, the overhead of each EA is non-trivial (16 bytes/EA) for storing 12 bytes worth of data, so it is probably just better to keep extending the original capability fields as was in the proposal. What we definitely do NOT want to happen is an application that needs priviledged access (e.g. e2fsck, mount) to stop running because the new capabilities _would_ have been granted by the new kernel and are not by the old kernel and STRICTXATTR is used. To me it would seem that having extra capabilities on an old kernel is relatively harmless if the old kernel doesn't know what they are. It's like having a key to a door that you don't know where it is. If we ditch the STRICTXATTR option do the semantics seem sane to you? Seems reasonable. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html