Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x86_64 and powerpc

2007-05-17 Thread Amit K. Arora
On Thu, May 17, 2007 at 09:40:36AM +1000, David Chinner wrote:
> On Wed, May 16, 2007 at 07:21:16AM -0500, Dave Kleikamp wrote:
> > On Wed, 2007-05-16 at 13:16 +1000, David Chinner wrote:
> > > On Wed, May 16, 2007 at 01:33:59AM +0530, Amit K. Arora wrote:
> > 
> > > > Following changes were made to the previous version:
> > > >  1) Added description before sys_fallocate() definition.
> > > >  2) Return EINVAL for len<=0 (With new draft that Ulrich pointed to,
> > > > posix_fallocate should return EINVAL for len <= 0.
> > > >  3) Return EOPNOTSUPP if mode is not one of FA_ALLOCATE or FA_DEALLOCATE
> > > >  4) Do not return ENODEV for dirs (let individual file systems decide if
> > > > they want to support preallocation to directories or not.
> > > >  5) Check for wrap through zero.
> > > >  6) Update c/mtime if fallocate() succeeds.
> > > 
> > > Please don't make this always happen. c/mtime updates should be dependent
> > > on the mode being used and whether there is visible change to the file. 
> > > If no
> > > userspace visible changes to the file occurred, then timestamps should not
> > > be changed.
> > 
> > i_blocks will be updated, so it seems reasonable to update ctime.  mtime
> > shouldn't be changed, though, since the contents of the file will be
> > unchanged.
> 
> That's assuming blocks were actually allocated - if the prealloc range already
> has underlying blocks there is no change and so we should not be changing
> mtime either. Only the filesystem will know if it has changed the file, so I
> think that timestamp updates need to be driven down to that level, not done
> blindy at the highest layer

Ok. Will make this change in the next post.

--
Regards,
Amit Arora
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x86_64 and powerpc

2007-05-17 Thread Dave Kleikamp
On Thu, 2007-05-17 at 09:40 +1000, David Chinner wrote:
> On Wed, May 16, 2007 at 07:21:16AM -0500, Dave Kleikamp wrote:
> > On Wed, 2007-05-16 at 13:16 +1000, David Chinner wrote:

> > > Please don't make this always happen. c/mtime updates should be dependent
> > > on the mode being used and whether there is visible change to the file. 
> > > If no
> > > userspace visible changes to the file occurred, then timestamps should not
> > > be changed.
> > 
> > i_blocks will be updated, so it seems reasonable to update ctime.  mtime
> > shouldn't be changed, though, since the contents of the file will be
> > unchanged.
> 
> That's assuming blocks were actually allocated - if the prealloc range already
> has underlying blocks there is no change and so we should not be changing
> mtime either. Only the filesystem will know if it has changed the file, so I
> think that timestamp updates need to be driven down to that level, not done
> blindy at the highest layer

Yes, I agree.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x86_64 and powerpc

2007-05-16 Thread David Chinner
On Wed, May 16, 2007 at 07:21:16AM -0500, Dave Kleikamp wrote:
> On Wed, 2007-05-16 at 13:16 +1000, David Chinner wrote:
> > On Wed, May 16, 2007 at 01:33:59AM +0530, Amit K. Arora wrote:
> 
> > > Following changes were made to the previous version:
> > >  1) Added description before sys_fallocate() definition.
> > >  2) Return EINVAL for len<=0 (With new draft that Ulrich pointed to,
> > > posix_fallocate should return EINVAL for len <= 0.
> > >  3) Return EOPNOTSUPP if mode is not one of FA_ALLOCATE or FA_DEALLOCATE
> > >  4) Do not return ENODEV for dirs (let individual file systems decide if
> > > they want to support preallocation to directories or not.
> > >  5) Check for wrap through zero.
> > >  6) Update c/mtime if fallocate() succeeds.
> > 
> > Please don't make this always happen. c/mtime updates should be dependent
> > on the mode being used and whether there is visible change to the file. If 
> > no
> > userspace visible changes to the file occurred, then timestamps should not
> > be changed.
> 
> i_blocks will be updated, so it seems reasonable to update ctime.  mtime
> shouldn't be changed, though, since the contents of the file will be
> unchanged.

That's assuming blocks were actually allocated - if the prealloc range already
has underlying blocks there is no change and so we should not be changing
mtime either. Only the filesystem will know if it has changed the file, so I
think that timestamp updates need to be driven down to that level, not done
blindy at the highest layer

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x86_64 and powerpc

2007-05-16 Thread Amit K. Arora
On Wed, May 16, 2007 at 07:21:16AM -0500, Dave Kleikamp wrote:
> On Wed, 2007-05-16 at 13:16 +1000, David Chinner wrote:
> > On Wed, May 16, 2007 at 01:33:59AM +0530, Amit K. Arora wrote:
> 
> > > Following changes were made to the previous version:
> > >  1) Added description before sys_fallocate() definition.
> > >  2) Return EINVAL for len<=0 (With new draft that Ulrich pointed to,
> > > posix_fallocate should return EINVAL for len <= 0.
> > >  3) Return EOPNOTSUPP if mode is not one of FA_ALLOCATE or FA_DEALLOCATE
> > >  4) Do not return ENODEV for dirs (let individual file systems decide if
> > > they want to support preallocation to directories or not.
> > >  5) Check for wrap through zero.
> > >  6) Update c/mtime if fallocate() succeeds.
> > 
> > Please don't make this always happen. c/mtime updates should be dependent
> > on the mode being used and whether there is visible change to the file. If 
> > no
> > userspace visible changes to the file occurred, then timestamps should not
> > be changed.
> 
> i_blocks will be updated, so it seems reasonable to update ctime.  mtime
> shouldn't be changed, though, since the contents of the file will be
> unchanged.

I agree. Thus the ctime should change for FA_PREALLOCATE mode also
(which does not change the file size) - if we end up having this
additional mode in near future.

--
Regards,
Amit Arora
 
> > e.g. FA_ALLOCATE that changes file size requires same semantics of 
> > ftruncate()
> > extending the file, otherwise no change in timestamps should occur.
> > 
> > Cheers,
> > 
> > Dave.
> -- 
> David Kleikamp
> IBM Linux Technology Center
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x86_64 and powerpc

2007-05-16 Thread Amit K. Arora
On Tue, May 15, 2007 at 05:42:46PM -0700, Mingming Cao wrote:
> On Wed, 2007-05-16 at 01:33 +0530, Amit K. Arora wrote:
> > This patch implements sys_fallocate() and adds support on i386, x86_64
> > and powerpc platforms.
> 
> > @@ -1137,6 +1148,8 @@ struct inode_operations {
> > ssize_t (*listxattr) (struct dentry *, char *, size_t);
> > int (*removexattr) (struct dentry *, const char *);
> > void (*truncate_range)(struct inode *, loff_t, loff_t);
> > +   long (*fallocate)(struct inode *inode, int mode, loff_t offset,
> > + loff_t len);
> >  };
> 
> Does the return value from fallocate inode operation has to be *long*?
> It's not consistent with the ext4_fallocate() define in patch 4/5, 

I think ->fallocate() should return a "long", since sys_fallocate() has
to return what ->fallocate() returns and hence their return type should
ideally match.
 
> +int ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t
> len)

I will change the ext4_fallocate() to return a "long" (in patch 4/5)
in the next post.

Agree ?

Thanks!
--
Regards,
Amit Arora

> 
> thus cause compile warnings.
> 
> 
> 
> Mingming
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x86_64 and powerpc

2007-05-16 Thread Dave Kleikamp
On Wed, 2007-05-16 at 13:16 +1000, David Chinner wrote:
> On Wed, May 16, 2007 at 01:33:59AM +0530, Amit K. Arora wrote:

> > Following changes were made to the previous version:
> >  1) Added description before sys_fallocate() definition.
> >  2) Return EINVAL for len<=0 (With new draft that Ulrich pointed to,
> > posix_fallocate should return EINVAL for len <= 0.
> >  3) Return EOPNOTSUPP if mode is not one of FA_ALLOCATE or FA_DEALLOCATE
> >  4) Do not return ENODEV for dirs (let individual file systems decide if
> > they want to support preallocation to directories or not.
> >  5) Check for wrap through zero.
> >  6) Update c/mtime if fallocate() succeeds.
> 
> Please don't make this always happen. c/mtime updates should be dependent
> on the mode being used and whether there is visible change to the file. If no
> userspace visible changes to the file occurred, then timestamps should not
> be changed.

i_blocks will be updated, so it seems reasonable to update ctime.  mtime
shouldn't be changed, though, since the contents of the file will be
unchanged.

> e.g. FA_ALLOCATE that changes file size requires same semantics of ftruncate()
> extending the file, otherwise no change in timestamps should occur.
> 
> Cheers,
> 
> Dave.
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x86_64 and powerpc

2007-05-15 Thread David Chinner
On Wed, May 16, 2007 at 01:33:59AM +0530, Amit K. Arora wrote:
> This patch implements sys_fallocate() and adds support on i386, x86_64
> and powerpc platforms.

Can you please pick up the ia64 support patch I posted as well?

> Changelog:
> -
> Note: The changes below are from the initial post (dated 26th April,
> 2007) and _not_ from TAKE2. The only difference from TAKE2 is the kernel
> version on which this patch is based. TAKE2 was based on 2.6.21 and this
> is based on 2.6.22-rc1.
> 
> Following changes were made to the previous version:
>  1) Added description before sys_fallocate() definition.
>  2) Return EINVAL for len<=0 (With new draft that Ulrich pointed to,
> posix_fallocate should return EINVAL for len <= 0.
>  3) Return EOPNOTSUPP if mode is not one of FA_ALLOCATE or FA_DEALLOCATE
>  4) Do not return ENODEV for dirs (let individual file systems decide if
> they want to support preallocation to directories or not.
>  5) Check for wrap through zero.
>  6) Update c/mtime if fallocate() succeeds.

Please don't make this always happen. c/mtime updates should be dependent
on the mode being used and whether there is visible change to the file. If no
userspace visible changes to the file occurred, then timestamps should not
be changed.

e.g. FA_ALLOCATE that changes file size requires same semantics of ftruncate()
extending the file, otherwise no change in timestamps should occur.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x86_64 and powerpc

2007-05-15 Thread Mingming Cao
On Wed, 2007-05-16 at 01:33 +0530, Amit K. Arora wrote:
> This patch implements sys_fallocate() and adds support on i386, x86_64
> and powerpc platforms.

> @@ -1137,6 +1148,8 @@ struct inode_operations {
>   ssize_t (*listxattr) (struct dentry *, char *, size_t);
>   int (*removexattr) (struct dentry *, const char *);
>   void (*truncate_range)(struct inode *, loff_t, loff_t);
> + long (*fallocate)(struct inode *inode, int mode, loff_t offset,
> +   loff_t len);
>  };

Does the return value from fallocate inode operation has to be *long*?
It's not consistent with the ext4_fallocate() define in patch 4/5, 

+int ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t
len)

thus cause compile warnings.



Mingming

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5][TAKE3] fallocate() implementation on i86, x86_64 and powerpc

2007-05-15 Thread Amit K. Arora
This patch implements sys_fallocate() and adds support on i386, x86_64
and powerpc platforms.

Changelog:
-
Note: The changes below are from the initial post (dated 26th April,
2007) and _not_ from TAKE2. The only difference from TAKE2 is the kernel
version on which this patch is based. TAKE2 was based on 2.6.21 and this
is based on 2.6.22-rc1.

Following changes were made to the previous version:
 1) Added description before sys_fallocate() definition.
 2) Return EINVAL for len<=0 (With new draft that Ulrich pointed to,
posix_fallocate should return EINVAL for len <= 0.
 3) Return EOPNOTSUPP if mode is not one of FA_ALLOCATE or FA_DEALLOCATE
 4) Do not return ENODEV for dirs (let individual file systems decide if
they want to support preallocation to directories or not.
 5) Check for wrap through zero.
 6) Update c/mtime if fallocate() succeeds.
 7) Added mode descriptions in fs.h
 8) Added variable names to function definition (fallocate inode op)

Here is the new patch:

Signed-off-by: Amit Arora <[EMAIL PROTECTED]>
---
 arch/i386/kernel/syscall_table.S |1 
 arch/powerpc/kernel/sys_ppc32.c  |7 +++
 arch/x86_64/ia32/ia32entry.S |1 
 fs/open.c|   89 +++
 include/asm-i386/unistd.h|3 -
 include/asm-powerpc/systbl.h |1 
 include/asm-powerpc/unistd.h |3 -
 include/asm-x86_64/unistd.h  |2 
 include/linux/fs.h   |   13 +
 include/linux/syscalls.h |1 
 10 files changed, 119 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.22-rc1.orig/arch/i386/kernel/syscall_table.S
+++ linux-2.6.22-rc1/arch/i386/kernel/syscall_table.S
@@ -323,3 +323,4 @@ ENTRY(sys_call_table)
.long sys_signalfd
.long sys_timerfd
.long sys_eventfd
+   .long sys_fallocate
Index: linux-2.6.22-rc1/arch/powerpc/kernel/sys_ppc32.c
===
--- linux-2.6.22-rc1.orig/arch/powerpc/kernel/sys_ppc32.c
+++ linux-2.6.22-rc1/arch/powerpc/kernel/sys_ppc32.c
@@ -773,6 +773,13 @@ asmlinkage int compat_sys_truncate64(con
return sys_truncate(path, (high << 32) | low);
 }
 
+asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offhi, u32 offlo,
+u32 lenhi, u32 lenlo)
+{
+   return sys_fallocate(fd, mode, ((loff_t)offhi << 32) | offlo,
+((loff_t)lenhi << 32) | lenlo);
+}
+
 asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long 
high,
 unsigned long low)
 {
Index: linux-2.6.22-rc1/fs/open.c
===
--- linux-2.6.22-rc1.orig/fs/open.c
+++ linux-2.6.22-rc1/fs/open.c
@@ -353,6 +353,95 @@ asmlinkage long sys_ftruncate64(unsigned
 #endif
 
 /*
+ * sys_fallocate - preallocate blocks or free preallocated blocks
+ * @fd: the file descriptor
+ * @mode: mode specifies if fallocate should preallocate blocks OR free
+ *   (unallocate) preallocated blocks. Currently only FA_ALLOCATE and
+ *   FA_DEALLOCATE modes are supported.
+ * @offset: The offset within file, from where (un)allocation is being
+ * requested. It should not have a negative value.
+ * @len: The amount (in bytes) of space to be (un)allocated, from the offset.
+ *
+ * This system call, depending on the mode, preallocates or unallocates blocks
+ * for a file. The range of blocks depends on the value of offset and len
+ * arguments provided by the user/application. For FA_ALLOCATE mode, if this
+ * system call succeeds, subsequent writes to the file in the given range
+ * (specified by offset & len) should not fail - even if the file system
+ * later becomes full. Hence the preallocation done is persistent (valid
+ * even after reopen of the file and remount/reboot).
+ *
+ * Note: Incase the file system does not support preallocation,
+ * posix_fallocate() should fall back to the library implementation (i.e.
+ * allocating zero-filled new blocks to the file).
+ *
+ * Return Values
+ * 0   : On SUCCESS a value of zero is returned.
+ * error   : On Failure, an error code will be returned.
+ * An error code of -ENOSYS or -EOPNOTSUPP should make posix_fallocate()
+ * fall back on library implementation of fallocate.
+ *
+ *  Generic fallocate to be added for file systems that do not
+ *  support fallocate it.
+ */
+asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)
+{
+   struct file *file;
+   struct inode *inode;
+   long ret = -EINVAL;
+
+   if (offset < 0 || len <= 0)
+   goto out;
+
+   /* Return error if mode is not supported */
+   ret = -EOPNOTSUPP;
+   if (mode != FA_ALLOCATE && mode !=FA_DEALLOCATE)
+   goto out;
+
+   ret = -EBADF;
+   file =