Re: [PATCH 2/6][TAKE7] fallocate() implementation in i386, x86_64 and powerpc

2007-07-13 Thread Christoph Hellwig
On Fri, Jul 13, 2007 at 07:48:58PM +0530, Amit K. Arora wrote:
> Ok. Since we have only one flag (FALLOC_FL_KEEP_SIZE) and we do not want
> to declare the default mode (FALLOC_ALLOCATE), we can _just_ have this
> flag and remove the other mode too (FALLOC_RESV_SPACE).
> Is this what you are suggesting ?

Yes.

> Do we need a header file just to declare one flag - i.e.
> FALLOC_FL_KEEP_SIZE (since now there is no point in declaring the two
> modes)? If "linux/fs.h" is not a good place, will "asm-generic/fcntl.h"
> be a sane place for this flag?

It might sound a little silly, but it is the cleanest thing we could do by
far.  And I suspect there will be more flags soon.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6][TAKE7] fallocate() implementation in i386, x86_64 and powerpc

2007-07-13 Thread Amit K. Arora
On Fri, Jul 13, 2007 at 02:21:19PM +0100, Christoph Hellwig wrote:
> On Fri, Jul 13, 2007 at 06:17:55PM +0530, Amit K. Arora wrote:
> >  /*
> > + * sys_fallocate - preallocate blocks or free preallocated blocks
> > + * @fd: the file descriptor
> > + * @mode: mode specifies the behavior of allocation.
> > + * @offset: The offset within file, from where allocation is being
> > + * requested. It should not have a negative value.
> > + * @len: The amount of space in bytes to be allocated, from the offset.
> > + *  This can not be zero or a negative value.
> 
> kerneldoc comments are for in-kernel APIs, which syscalls aren't.  I'd say
> just remove this comment; the manpage is much better documentation anyway.

Ok. I will remove this entire comment.
 
> > + *  Generic fallocate to be added for file systems that do not
> > + *  support fallocate.
> 
> Please remove the comment; adding a generic fallback in kernelspace is a
> very dumb idea, as we already discussed a long time ago.
>
> > --- linux-2.6.22.orig/include/linux/fs.h
> > +++ linux-2.6.22/include/linux/fs.h
> > @@ -266,6 +266,21 @@ extern int dir_notify_enable;
> >  #define SYNC_FILE_RANGE_WRITE  2
> >  #define SYNC_FILE_RANGE_WAIT_AFTER 4
> >  
> > +/*
> > + * sys_fallocate modes
> > + * Currently sys_fallocate supports two modes:
> > + * FALLOC_ALLOCATE :   This is the preallocate mode, using which an application
> > + * may request reservation of space for a particular file.
> > + * The file size will be changed if the allocation is
> > + * beyond EOF.
> > + * FALLOC_RESV_SPACE : This is same as the above mode, with only one difference
> > + * that the file size will not be modified.
> > + */
> > +#define FALLOC_FL_KEEP_SIZE  0x01 /* default is extend/shrink size */
> > +
> > +#define FALLOC_ALLOCATE  0
> > +#define FALLOC_RESV_SPACE  FALLOC_FL_KEEP_SIZE
> 
> Just remove FALLOC_ALLOCATE; 0 flags should be the default.  I'm also
> not sure there is any point in having two namespaces now that we have a
> flags-based ABI.

Ok. Since we have only one flag (FALLOC_FL_KEEP_SIZE) and we do not want
to declare the default mode (FALLOC_ALLOCATE), we can _just_ have this
flag and remove the other mode too (FALLOC_RESV_SPACE).
Is this what you are suggesting ?

> Also please don't add this to fs.h.  fs.h is a complete mess and the
> falloc flags are a new user ABI.  Add a linux/falloc.h instead which can
> be added to headers-y so the ABI constant can be exported to userspace.

Do we need a header file just to declare one flag - i.e.
FALLOC_FL_KEEP_SIZE (since now there is no point in declaring the two
modes)? If "linux/fs.h" is not a good place, will "asm-generic/fcntl.h"
be a sane place for this flag?

Thanks!
--
Regards,
Amit Arora


Re: [PATCH 2/6][TAKE7] fallocate() implementation in i386, x86_64 and powerpc

2007-07-13 Thread Christoph Hellwig
On Fri, Jul 13, 2007 at 06:17:55PM +0530, Amit K. Arora wrote:
>  /*
> + * sys_fallocate - preallocate blocks or free preallocated blocks
> + * @fd: the file descriptor
> + * @mode: mode specifies the behavior of allocation.
> + * @offset: The offset within file, from where allocation is being
> + *   requested. It should not have a negative value.
> + * @len: The amount of space in bytes to be allocated, from the offset.
> + *   This can not be zero or a negative value.

kerneldoc comments are for in-kernel APIs, which syscalls aren't.  I'd say
just remove this comment; the manpage is much better documentation anyway.

> + *  Generic fallocate to be added for file systems that do not
> + *  support fallocate.

Please remove the comment; adding a generic fallback in kernelspace is a
very dumb idea, as we already discussed a long time ago.

> --- linux-2.6.22.orig/include/linux/fs.h
> +++ linux-2.6.22/include/linux/fs.h
> @@ -266,6 +266,21 @@ extern int dir_notify_enable;
>  #define SYNC_FILE_RANGE_WRITE  2
>  #define SYNC_FILE_RANGE_WAIT_AFTER 4
>  
> +/*
> + * sys_fallocate modes
> + * Currently sys_fallocate supports two modes:
> + * FALLOC_ALLOCATE : This is the preallocate mode, using which an application
> + *   may request reservation of space for a particular file.
> + *   The file size will be changed if the allocation is
> + *   beyond EOF.
> + * FALLOC_RESV_SPACE :   This is same as the above mode, with only one difference
> + *   that the file size will not be modified.
> + */
> +#define FALLOC_FL_KEEP_SIZE  0x01 /* default is extend/shrink size */
> +
> +#define FALLOC_ALLOCATE  0
> +#define FALLOC_RESV_SPACE  FALLOC_FL_KEEP_SIZE

Just remove FALLOC_ALLOCATE; 0 flags should be the default.  I'm also
not sure there is any point in having two namespaces now that we have a
flags-based ABI.

Also please don't add this to fs.h.  fs.h is a complete mess and the
falloc flags are a new user ABI.  Add a linux/falloc.h instead which can
be added to headers-y so the ABI constant can be exported to userspace.
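
To illustrate the flags-based ABI discussed above, here is a rough sketch of
how a single flag with 0 as the default could be interpreted. It is not taken
from the patch: the helper falloc_size_after() and the FALLOC_SKETCH_KNOWN_FLAGS
mask are hypothetical names, and rejecting unknown bits with -1 is an
assumption about how future flags might be kept safe, not something decided
in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* The one flag from the quoted patch; value copied from the diff. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE 0x01
#endif

/* Hypothetical mask of all flags this sketch understands. */
#define FALLOC_SKETCH_KNOWN_FLAGS (FALLOC_FL_KEEP_SIZE)

/*
 * Hypothetical helper: return the file size that would result from a
 * successful preallocation of [offset, offset + len), or -1 if the
 * arguments are invalid or an unknown flag bit is set (assumed policy).
 */
int64_t falloc_size_after(int mode, int64_t old_size,
                          int64_t offset, int64_t len)
{
    int64_t end;

    assert(old_size >= 0);
    if (mode & ~FALLOC_SKETCH_KNOWN_FLAGS)
        return -1;              /* unknown flag: refuse rather than guess */
    if (offset < 0 || len <= 0)
        return -1;              /* same limits as the patch comment states */

    end = offset + len;
    if (mode & FALLOC_FL_KEEP_SIZE)
        return old_size;        /* space reserved, size left untouched */
    /* mode == 0, the default: size grows if the range goes beyond EOF */
    return end > old_size ? end : old_size;
}
```

With no flags set, a 100-byte range at offset 50 of a 100-byte file would give
a size of 150; with FALLOC_FL_KEEP_SIZE the size would stay at 100.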



[PATCH 2/6][TAKE7] fallocate() implementation in i386, x86_64 and powerpc

2007-07-13 Thread Amit K. Arora
From: Amit Arora <[EMAIL PROTECTED]>

sys_fallocate() implementation on i386, x86_64 and powerpc

fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to a certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if the
system later becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for a similar purpose. It has the advantage of working
on all file systems, but it is quite slow (since it writes zeroes to
each block that has to be preallocated). File systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and, in case the kernel/filesystem does not implement it, fall
back to the current implementation of writing zeroes to the new blocks.
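
The fallback order described above could look roughly like the sketch below.
It is only an illustration, not glibc code: native_falloc_fn, native_unsupported(),
prealloc_with_fallback() and scratch_size_after_prealloc() are hypothetical
names, the native call is passed in as a function pointer so the sketch does
not depend on the new syscall existing, and treating ENOSYS and EOPNOTSUPP as
"not implemented" is an assumption. The slow path is simplified: it only
writes zeroes beyond the current EOF so that existing data is never
overwritten.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook standing in for the proposed fallocate() system call. */
typedef int (*native_falloc_fn)(int fd, int mode, off_t offset, off_t len);

/* Stub that behaves like a kernel without the new call. */
static int native_unsupported(int fd, int mode, off_t offset, off_t len)
{
    (void)fd; (void)mode; (void)offset; (void)len;
    errno = ENOSYS;
    return -1;
}

/*
 * Slow path: write zeroes to the part of the range beyond the current EOF.
 * Simplification: bytes below EOF are left alone, so holes inside a sparse
 * file are not filled by this sketch.
 */
static int zero_fill_fallback(int fd, off_t offset, off_t len)
{
    static const char zeroes[4096];
    struct stat st;
    off_t pos, end;

    assert(offset >= 0 && len > 0);
    end = offset + len;
    if (fstat(fd, &st) != 0)
        return -1;
    pos = offset > st.st_size ? offset : st.st_size;
    while (pos < end) {
        off_t left = end - pos;
        size_t chunk = left < (off_t)sizeof(zeroes) ? (size_t)left
                                                    : sizeof(zeroes);
        ssize_t n = pwrite(fd, zeroes, chunk, pos);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return -1;
        }
        if (n == 0) {
            errno = EIO;        /* no progress, give up instead of looping */
            return -1;
        }
        pos += n;
    }
    return 0;
}

/* Try the native call first; fall back only if it is not implemented. */
int prealloc_with_fallback(native_falloc_fn native, int fd,
                           off_t offset, off_t len)
{
    if (offset < 0 || len <= 0) {
        errno = EINVAL;
        return -1;
    }
    if (native != NULL) {
        if (native(fd, 0, offset, len) == 0)
            return 0;
        if (errno != ENOSYS && errno != EOPNOTSUPP)
            return -1;          /* real failure such as ENOSPC: report it */
    }
    return zero_fill_fallback(fd, offset, len);
}

/* Usage sketch: preallocate len bytes in a scratch file, return its size. */
long scratch_size_after_prealloc(const char *path, off_t len)
{
    struct stat st;
    long size = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (prealloc_with_fallback(native_unsupported, fd, 0, len) == 0 &&
        fstat(fd, &st) == 0)
        size = (long)st.st_size;
    close(fd);
    unlink(path);
    return size;
}
```

Errors other than "not implemented" are returned to the caller unchanged, so
a genuine out-of-space condition is not hidden by the slow path.
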
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
   and ppc). Patches for s390(x) and ia64 are already available from
   previous posts, but it was decided that they should be added later
   once fallocate is in the mainline. Hence not including those patches
   in this take.
2. A generic file system operation to handle fallocate
   (generic_fallocate), for filesystems that do _not_ have the fallocate
   inode operation implemented.
3. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()


Signed-off-by: Amit Arora <[EMAIL PROTECTED]>

Index: linux-2.6.22/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.22.orig/arch/i386/kernel/syscall_table.S
+++ linux-2.6.22/arch/i386/kernel/syscall_table.S
@@ -323,3 +323,4 @@ ENTRY(sys_call_table)
 	.long sys_signalfd
 	.long sys_timerfd
 	.long sys_eventfd
+	.long sys_fallocate
Index: linux-2.6.22/arch/powerpc/kernel/sys_ppc32.c
===
--- linux-2.6.22.orig/arch/powerpc/kernel/sys_ppc32.c
+++ linux-2.6.22/arch/powerpc/kernel/sys_ppc32.c
@@ -773,6 +773,13 @@ asmlinkage int compat_sys_truncate64(con
 	return sys_truncate(path, (high << 32) | low);
 }
 
+asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offhi, u32 offlo,
+				     u32 lenhi, u32 lenlo)
+{
+	return sys_fallocate(fd, mode, ((loff_t)offhi << 32) | offlo,
+			     ((loff_t)lenhi << 32) | lenlo);
+}
+
 asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long high,
 unsigned long low)
 {
Index: linux-2.6.22/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.22.orig/arch/x86_64/ia32/ia32entry.S
+++ linux-2.6.22/arch/x86_64/ia32/ia32entry.S
@@ -719,4 +719,5 @@ ia32_sys_call_table:
 	.quad compat_sys_signalfd
 	.quad compat_sys_timerfd
 	.quad sys_eventfd
+	.quad sys32_fallocate
 ia32_syscall_end:
Index: linux-2.6.22/fs/open.c
===
--- linux-2.6.22.orig/fs/open.c
+++ linux-2.6.22/fs/open.c
@@ -353,6 +353,92 @@ asmlinkage long sys_ftruncate64(unsigned
 #endif
 
 /*
+ * sys_fallocate - preallocate blocks or free preallocated blocks
+ * @fd: the file descriptor
+ * @mode: mode specifies the behavior of allocation.
+ * @offset: The offset within file, from where allocation is being
+ * requested. It should not have a negative value.
+ * @len: The amount of space in bytes to be allocated, from the offset.
+ *  This can not be zero or a negative value.
+ *
+ * This system call preallocates space for a file. The range of blocks
+ * allocated depends on the value of offset and len arguments provided
+ * by the user/application. With FALLOC_ALLOCATE or FALLOC_RESV_SPACE
+ * modes, if the system call succeeds, subsequent writes to the file in
+ * the given range (specified by offset & len) should not fail - even if
+ * the file system later becomes full. Hence the preallocation done is
+ * persistent (valid even after reopen of the file and remount/reboot).
+ *
+ * It is expected that the ->fallocate() inode operation implemented by
+ * the individual file systems will update the file size and/or
+ * ctime/mtime depending on the mode and also on the success of the
+ * operation.
+ *
+ * Note: In case the file system does not support preallocation,
+ * posix_fallocate() should fall back to the library implementation (i.e.
+ * allocating zero-filled new blocks to the file).
+ *
+ * Return Values
+ * 0