Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

2007-05-10 Thread Amit K. Arora
On Thu, May 10, 2007 at 10:59:26AM +1000, David Chinner wrote:
 On Wed, May 09, 2007 at 09:31:02PM +0530, Amit K. Arora wrote:
  I have the updated patches ready which take care of Andrew's comments.
  Will run some tests and post them soon.
  
  But, before submitting these patches, I think it will be better to finalize
  on certain things which might be worth some discussion here:
  
  1) Should the file size change when preallocation is done beyond EOF ?
 - Andreas and Chris Wedgwood are in favor of not changing the
   file size in this case. I also tend to agree with them. Does anyone
   has an argument in favor of changing the filesize ?
   If not, I will remove the code which changes the filesize, before I
   resubmit the concerned ext4 patch.
 
 I think there needs to be both. If we don't have a mechanism to
 atomically change the file size with the preallocation, then
 applications that use stat() to work out if they need to preallocate
 more space will end up racing.

By both above, do you mean we should give user the flexibility if it
wants the filesize changed or not ? It can be done by having *two* modes
for preallocation in the system call - say FA_PREALLOCATE and
FA_ALLOCATE. If we use FA_PREALLOCATE mode, fallocate() will allocate
blocks, but will not change the filesize and [cm]time. If FA_ALLOCATE
mode is used, fallocate() will change the filesize if required (i.e.
when allocation is beyond EOF) and also update [cm]time.
This way, the application can decide what it wants.

This will be helpfull for the partial allocation scenario also. Think of
the case when we do not change the filesize in fallocate() and expect
applications/posix_fallocate() to do ftruncate() after fallocate() for
this. Now if fallocate() results in a partial allocation with -ENOSPC
error returned, applications/posix_fallocate() will not know for what
length ftruncate() has to be called.  :(

Hence it may be a good idea to give user the flexibility if it wants to
atomically change the file size with preallocation or not. But, with
more flexibility there comes inconsistency in behavior, which is worth
considering.

 
  2) For FA_UNALLOCATE mode, should the file system allow unallocation
 of normal (non-preallocated) blocks (blocks allocated via
 regular write/truncate operations) also (i.e. work as punch()) ?
 
 Yes. That is the current XFS implementation for XFS_IOC_UNRESVSP, and
 what i did for FA_UNALLOCATE as well.

Ok. But, some people may not expect/like this. I think, we can keep it
on the backburner for a while, till other issues are sorted out.
 
 - Though FA_UNALLOCATE mode is yet to be implemented on ext4, still
   we need to finalize on the convention here as a general guideline
   to all the filesystems that implement fallocate.
  
  3) If above is true, the file size will need to be changed
 for unallocation when block holding the EOF gets unallocated.
 
 No - we punch a hole. If you want the filesize to change, then
 you use ftruncate() to remove the blocks at EOF and change the
 file size atomically.

Ok.
 
  4) Should we update mtime  ctime on a successfull allocation/
 unallocation ?
 - David Chinner raised this question in following post:
   http://lkml.org/lkml/2007/4/29/407
   I think it makes sense to update the [mc]time for a successfull
   preallocation/unallocation. Does anyone feel otherwise ?
   It will be interesting to know how XFS behaves currently. Does XFS
   update [mc]time for preallocation ?
 
 No, XFS does *not* update a/m/ctime on prealloc/punch unless the file size
 changes. If the filesize changes, it behaves exactly the same way that
 ftruncate() behaves.

Having additional mode (of FA_PREALLOCATE) might help here too. Please
see above.

--
Regards,
Amit Arora
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] AFS: Fix interminable loop in afs_write_back_from_locked_page()

2007-05-10 Thread David Howells
Following bug was uncovered by compiling with '-W' flag:

  CC [M]  fs/afs/write.o
fs/afs/write.c: In function ‘afs_write_back_from_locked_page’:
fs/afs/write.c:398: warning: comparison of unsigned expression = 0 is always 
true

Loop variable 'n' is unsigned, so wraps around happily as far as I can
see. Trival fix attached (compile tested only).

Signed-Off-By: Mika Kukkonen [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/afs/write.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index 67ae4db..28f3751 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -395,8 +395,9 @@ static int afs_write_back_from_locked_page(struct 
afs_writeback *wb,
if (n == 0)
goto no_more;
if (pages[0]-index != start) {
-   for (n--; n = 0; n--)
-   put_page(pages[n]);
+   do {
+   put_page(pages[--n]);
+   } while (n  0);
goto no_more;
}
 

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] AFS: Fix a couple of problems with unlinking AFS files

2007-05-10 Thread David Howells
Fix a couple of problems with unlinking AFS files.

 (1) The parent directory wasn't being updated properly between unlink() and
 the following lookup().

 It seems that, for some reason, invalidate_remote_inode() wasn't
 discarding the directory contents correctly, so this patch calls
 invalidate_inode_pages2() instead on non-regular files.

 (2) afs_vnode_deleted_remotely() should handle vnodes that don't have a
 source server recorded without oopsing.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/afs/file.c  |2 +-
 fs/afs/inode.c |   10 +++---
 fs/afs/super.c |3 ++-
 fs/afs/vnode.c |   33 +
 4 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 3e25795..9c0e721 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -236,7 +236,7 @@ static void afs_invalidatepage(struct page *page, unsigned 
long offset)
 {
int ret = 1;
 
-   kenter({%lu},%lu, page-index, offset);
+   _enter({%lu},%lu, page-index, offset);
 
BUG_ON(!PageLocked(page));
 
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 515a5d1..47f5fed 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -209,11 +209,15 @@ bad_inode:
  */
 void afs_zap_data(struct afs_vnode *vnode)
 {
-   _enter(zap data {%x:%u}, vnode-fid.vid, vnode-fid.vnode);
+   _enter({%x:%u}, vnode-fid.vid, vnode-fid.vnode);
 
/* nuke all the non-dirty pages that aren't locked, mapped or being
-* written back */
-   invalidate_remote_inode(vnode-vfs_inode);
+* written back in a regular file and completely discard the pages in a
+* directory or symlink */
+   if (S_ISREG(vnode-vfs_inode.i_mode))
+   invalidate_remote_inode(vnode-vfs_inode);
+   else
+   invalidate_inode_pages2(vnode-vfs_inode.i_mapping);
 }
 
 /*
diff --git a/fs/afs/super.c b/fs/afs/super.c
index d24be33..422f532 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -488,6 +488,7 @@ static struct inode *afs_alloc_inode(struct super_block *sb)
vnode-flags= 1  AFS_VNODE_UNSET;
vnode-cb_promised  = false;
 
+   _leave( = %p, vnode-vfs_inode);
return vnode-vfs_inode;
 }
 
@@ -498,7 +499,7 @@ static void afs_destroy_inode(struct inode *inode)
 {
struct afs_vnode *vnode = AFS_FS_I(inode);
 
-   _enter({%lu}, inode-i_ino);
+   _enter(%p{%x:%u}, inode, vnode-fid.vid, vnode-fid.vnode);
 
_debug(DESTROY INODE %p, inode);
 
diff --git a/fs/afs/vnode.c b/fs/afs/vnode.c
index ec81466..bea8bd9 100644
--- a/fs/afs/vnode.c
+++ b/fs/afs/vnode.c
@@ -175,24 +175,33 @@ static void afs_vnode_deleted_remotely(struct afs_vnode 
*vnode)
 {
struct afs_server *server;
 
+   _enter({%p}, vnode-server);
+
set_bit(AFS_VNODE_DELETED, vnode-flags);
 
server = vnode-server;
-   if (vnode-cb_promised) {
-   spin_lock(server-cb_lock);
+   if (server) {
if (vnode-cb_promised) {
-   rb_erase(vnode-cb_promise, server-cb_promises);
-   vnode-cb_promised = false;
+   spin_lock(server-cb_lock);
+   if (vnode-cb_promised) {
+   rb_erase(vnode-cb_promise,
+server-cb_promises);
+   vnode-cb_promised = false;
+   }
+   spin_unlock(server-cb_lock);
}
-   spin_unlock(server-cb_lock);
-   }
 
-   spin_lock(vnode-server-fs_lock);
-   rb_erase(vnode-server_rb, vnode-server-fs_vnodes);
-   spin_unlock(vnode-server-fs_lock);
+   spin_lock(server-fs_lock);
+   rb_erase(vnode-server_rb, server-fs_vnodes);
+   spin_unlock(server-fs_lock);
 
-   vnode-server = NULL;
-   afs_put_server(server);
+   vnode-server = NULL;
+   afs_put_server(server);
+   } else {
+   ASSERT(!vnode-cb_promised);
+   }
+
+   _leave();
 }
 
 /*
@@ -225,7 +234,7 @@ void afs_vnode_finalise_status_update(struct afs_vnode 
*vnode,
  */
 static void afs_vnode_status_update_failed(struct afs_vnode *vnode, int ret)
 {
-   _enter(%p,%d, vnode, ret);
+   _enter({%x:%u},%d, vnode-fid.vid, vnode-fid.vnode, ret);
 
spin_lock(vnode-lock);
 

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] file capabilities: accomodate 32 bit capabilities

2007-05-10 Thread Andreas Dilger
On May 08, 2007  16:49 -0500, Serge E. Hallyn wrote:
 Quoting Andreas Dilger ([EMAIL PROTECTED]):
  One of the important use cases I can see today is the ability to
  split the heavily-overloaded e.g. CAP_SYS_ADMIN into much more fine
  grained attributes.
 
 Sounds plausible, though it suffers from both making capabilities far
 more cumbersome (i.e. finding the right capability for what you wanted
 to do) and backward compatibility.  Perhaps at that point we should
 introduce security.capabilityv2 xattrs.  A binary can then carry
 security.capability=CAP_SYS_ADMIN=p, and
 security.capabilityv2=cap_may_clone_mntns=p.

Well, the overhead of each EA is non-trivial (16 bytes/EA) for storing
12 bytes worth of data, so it is probably just better to keep extending
the original capability fields as was in the proposal.

  What we definitely do NOT want to happen is an application that needs
  priviledged access (e.g. e2fsck, mount) to stop running because the
  new capabilities _would_ have been granted by the new kernel and are
  not by the old kernel and STRICTXATTR is used.
  
  To me it would seem that having extra capabilities on an old kernel
  is relatively harmless if the old kernel doesn't know what they are.
  It's like having a key to a door that you don't know where it is.
 
 If we ditch the STRICTXATTR option do the semantics seem sane to you?

Seems reasonable.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html