Re: [PATCH 0/5] fallocate system call

2007-05-07 Thread David Chinner
On Thu, May 03, 2007 at 01:22:48PM +0200, Miquel van Smoorenburg wrote:
> In article <[EMAIL PROTECTED]> you write:
> >On May 02, 2007  18:23 +0530, Amit K. Arora wrote:
> >> On Sun, Apr 29, 2007 at 10:25:59PM -0700, Chris Wedgwood wrote:
> >> > On Mon, Apr 30, 2007 at 10:47:02AM +1000, David Chinner wrote:
> >> > 
> >> > > For FA_ALLOCATE, it's supposed to change the file size if we
> >> > > allocate past EOF, right?
> >> > 
> >> > I would argue no.  Use truncate for that.
> >> 
> >> The patch I posted for ext4 *does* change the filesize after
> >> preallocation, if required (i.e. when preallocation is after EOF).
> >> I may have to change that, if we decide on not doing this.
> >
> >I think I'd agree - it may be useful to allow preallocation beyond EOF
> >for some kinds of applications (e.g. PVR preallocating live TV in 10
> >minute segments or something, but not knowing in advance how long the
> >show will actually be recorded or the final encoded size).
> 
> I have an application (diablo dreader) where the header-info database
> basically consists of ~40.000 files, one for each group (it's more
> complicated that that, but never mind that now).
> 
> If you grow those files randomly by a few hundred bytes every update,
> the filesystem gets hopelessly fragmented.
> 
> I'm using XFS with preallocation turned on, and biosize=18 (which
> makes it preallocate in blocks of 256KB), and a homebrew patch that
> leaves the preallocated space on disk preallocated even if the
> file is closed .. and it helps enormously.

XFS always has speculative preallocation turned on - this is
different to explicit preallocation which we are talking about
here ;)

FWIW, the reason you need your homebrew patch is that specualtive
allocation does not set the PREALLOC bit on the inode, and so when
you close the file the speculative prealloc gets truncated away.
If you use a real preallocation (XFS_IOC_RESVSP64) or the upcoming
fallocate() syscall, XFS also sets the PREALLOC bit in the inode so
it doesn't get truncated away on file close.

If you don't want to use XFS_IOC_RESVSP64, you could just use
XFS_IOC_FSSETXATTR to set the prealloc bit on the files you care
about so you don't need a hack in XFS to prevent truncation of
speculative allocation on file close.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-05-03 Thread Miquel van Smoorenburg
In article <[EMAIL PROTECTED]> you write:
>On May 02, 2007  18:23 +0530, Amit K. Arora wrote:
>> On Sun, Apr 29, 2007 at 10:25:59PM -0700, Chris Wedgwood wrote:
>> > On Mon, Apr 30, 2007 at 10:47:02AM +1000, David Chinner wrote:
>> > 
>> > > For FA_ALLOCATE, it's supposed to change the file size if we
>> > > allocate past EOF, right?
>> > 
>> > I would argue no.  Use truncate for that.
>> 
>> The patch I posted for ext4 *does* change the filesize after
>> preallocation, if required (i.e. when preallocation is after EOF).
>> I may have to change that, if we decide on not doing this.
>
>I think I'd agree - it may be useful to allow preallocation beyond EOF
>for some kinds of applications (e.g. PVR preallocating live TV in 10
>minute segments or something, but not knowing in advance how long the
>show will actually be recorded or the final encoded size).

I have an application (diablo dreader) where the header-info database
basically consists of ~40.000 files, one for each group (it's more
complicated that that, but never mind that now).

If you grow those files randomly by a few hundred bytes every update,
the filesystem gets hopelessly fragmented.

I'm using XFS with preallocation turned on, and biosize=18 (which
makes it preallocate in blocks of 256KB), and a homebrew patch that
leaves the preallocated space on disk preallocated even if the
file is closed .. and it helps enormously.

Mike.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-05-03 Thread Andreas Dilger
On May 02, 2007  18:23 +0530, Amit K. Arora wrote:
> On Sun, Apr 29, 2007 at 10:25:59PM -0700, Chris Wedgwood wrote:
> > On Mon, Apr 30, 2007 at 10:47:02AM +1000, David Chinner wrote:
> > 
> > > For FA_ALLOCATE, it's supposed to change the file size if we
> > > allocate past EOF, right?
> > 
> > I would argue no.  Use truncate for that.
> 
> The patch I posted for ext4 *does* change the filesize after
> preallocation, if required (i.e. when preallocation is after EOF).
> I may have to change that, if we decide on not doing this.

I think I'd agree - it may be useful to allow preallocation beyond EOF
for some kinds of applications (e.g. PVR preallocating live TV in 10
minute segments or something, but not knowing in advance how long the
show will actually be recorded or the final encoded size).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-05-02 Thread Amit K. Arora
On Sun, Apr 29, 2007 at 10:25:59PM -0700, Chris Wedgwood wrote:
> On Mon, Apr 30, 2007 at 10:47:02AM +1000, David Chinner wrote:
> 
> > For FA_ALLOCATE, it's supposed to change the file size if we
> > allocate past EOF, right?
> 
> I would argue no.  Use truncate for that.

The patch I posted for ext4 *does* change the filesize after
preallocation, if required (i.e. when preallocation is after EOF).
I may have to change that, if we decide on not doing this.

--
Regards,
Amit Arora 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-04-29 Thread Chris Wedgwood
On Mon, Apr 30, 2007 at 03:56:32PM +1000, David Chinner wrote:
> On Sun, Apr 29, 2007 at 10:25:59PM -0700, Chris Wedgwood wrote:

> IIRC, the argument for FA_ALLOCATE changing file size is that
> posix_fallocate() is supposed to change the file size.

But it's not posix_fallocate; it's something more generic. glibc can
do posix_fallocate using truncate + fallocate.

> Note that the way XFS implements growing the file size after the
> allocation is via a truncate

What's wrong with that?  That seems very reasonable.

> That's would what I did because otherwise you'd use ftruncate64().
> Without documented behaviour or an ext4 implementation, I have to
> ask what it's supposed to do, though ;)

How many *real* users are there for ext4?  Why does 'what ext4 does'
define 'the semantics'?

Surely semantics should be decided either by precedent (if there is an
existing relevant userbase) or sensible thought and some debate?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-04-29 Thread David Chinner
On Sun, Apr 29, 2007 at 10:25:59PM -0700, Chris Wedgwood wrote:
> On Mon, Apr 30, 2007 at 10:47:02AM +1000, David Chinner wrote:
> 
> > For FA_ALLOCATE, it's supposed to change the file size if we
> > allocate past EOF, right?
> 
> I would argue no.  Use truncate for that.

I'm going from the ext4 implementation because the semantics
have not been documented yet.

IIRC, the argument for FA_ALLOCATE changing file size is that
posix_fallocate() is supposed to change the file size. I think
that having a mode for real preallocation and another for
posix_fallocate is a valid thing to do...

Note that the way XFS implements growing the file size after the
allocation is via a truncate

> > For FA_DEALLOCATE, does it change the filesize at all?
> 
> Same as above.
> 
> > Or does
> > it just punch a hole in the file?
> 
> Yes.

That's would what I did because otherwise you'd use ftruncate64().
Without documented behaviour or an ext4 implementation, I have to
ask what it's supposed to do, though ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-04-29 Thread Chris Wedgwood
On Mon, Apr 30, 2007 at 10:47:02AM +1000, David Chinner wrote:

> For FA_ALLOCATE, it's supposed to change the file size if we
> allocate past EOF, right?

I would argue no.  Use truncate for that.

> For FA_DEALLOCATE, does it change the filesize at all?

Same as above.

> Or does
> it just punch a hole in the file?

Yes.

> FWIW, we definitely need a FA_PREALLOCATE mode (FA_ALLOCATE but does
> not change file size) so we can preallocate beyond EOF for apps
> which use O_APPEND (i.e. changing file size would cause problems for
> them).

FA_ALLOCATE should be able to allocate past-EOF I would argue.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-04-29 Thread David Chinner
On Thu, Apr 26, 2007 at 11:20:56PM +0530, Amit K. Arora wrote:
> Based on the discussion, this new patchset uses following as the
> interface for fallocate() system call:
> 
>  asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)

Ok, so now for the hard questions - what are the semantics of
FA_ALLOCATE and FA_DEALLOCATE?

For FA_ALLOCATE, it's supposed to change the file size if we
allocate past EOF, right? What's the return value supposed to
be? Zero for success, error otherwise? Does this update a/m/ctime
at all? How persistent is this preallocation? Should it be
there "forever" or for the lifetime of the currently open fd
that it was preallocated on?

For FA_DEALLOCATE, does it change the filesize at all? Or does
it just punch a hole in the file? If it does change file size,
what happens when you punch out preallocation beyond EOF?
What's the return value supposed to be?

> Currently we have two modes FA_ALLOCATE and FA_DEALLOCATE, for
> preallocation and deallocation of preallocated blocks respectively. More
> modes can be added, when required.

FWIW, we definitely need a FA_PREALLOCATE mode (FA_ALLOCATE but does
not change file size) so we can preallocate beyond EOF for apps which
use O_APPEND (i.e. changing file size would cause problems for them).

> ToDos:
> =
> 1>   Implementation on other architectures (other than i386, x86_64, 
> ppc64 and s390(x)) 

I'll have ia64 soon.

> 2>   A generic file system operation to handle fallocate
> (generic_fallocate), for filesystems that do _not_ have the fallocate
> inode operation implemented.
> 3>   Changes to glibc,
>   a) to support fallocate() system call
>   b) so that posix_fallocate() and posix_fallocate64() call
>  fallocate() system call
> 4>   Changes to XFS to implement the fallocate inode operation

And that's what I'm doing now, hence all the questions ;)

BTW, do you have a test program for this, or will I need to write
one myself?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-04-27 Thread Chris Wedgwood
On Fri, Apr 27, 2007 at 07:46:13PM +0200, Heiko Carstens wrote:

> If one insists to have fd at first argument, what is wrong with
> having u32 arguments only?

Well, I was one of those who objected as it seems *UGLY* to me.

> It's not that this syscall comes even close to what can be
> considered performance critical...

Right.

> It adds userspace overhead for one architecture. Every *trace and
> *libc needs special handling on s390 for this syscall. I would
> prefer to avoid this.

I'm not that bothered about it.  I would prefer it did use clean
64-bit arguments, but given it's a non-critical syscall I'm don't
think the aesthetics are worth impossing crud on s390 for.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-04-27 Thread Heiko Carstens
On Fri, Apr 27, 2007 at 04:43:28PM +0200, Jörn Engel wrote:
> On Fri, 27 April 2007 14:10:03 +0200, Heiko Carstens wrote:
> > 
> > After long discussions where at least two possible implementations
> > were suggested that would work on _all_ architectures you chose one
> > which doesn't and causes extra effort.
> 
> I believe the long discussion also showed that every possible
> implementation has drawbacks.  To me this one appeared to be the best of
> many bad choices.

If one insists to have fd at first argument, what is wrong with having
u32 arguments only? It's not that this syscall comes even close to
what can be considered performance critical...

> Is this implementation worse than we thought?

It adds userspace overhead for one architecture. Every *trace and
*libc needs special handling on s390 for this syscall. I would
prefer to avoid this.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-04-27 Thread Jörn Engel
On Fri, 27 April 2007 14:10:03 +0200, Heiko Carstens wrote:
> 
> After long discussions where at least two possible implementations
> were suggested that would work on _all_ architectures you chose one
> which doesn't and causes extra effort.

I believe the long discussion also showed that every possible
implementation has drawbacks.  To me this one appeared to be the best of
many bad choices.

Is this implementation worse than we thought?

Jörn

-- 
The grand essentials of happiness are: something to do, something to
love, and something to hope for.
-- Allan K. Chalmers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] fallocate system call

2007-04-27 Thread Heiko Carstens
On Thu, Apr 26, 2007 at 11:20:56PM +0530, Amit K. Arora wrote:
> Based on the discussion, this new patchset uses following as the
> interface for fallocate() system call:
> 
>  asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)
> 
> It seems that only s390 architecture has a problem with such a layout of
> arguments in fallocate(). Thus for s390, we plan to have a wrapper
> (say, sys_s390_fallocate()) for the sys_fallocate(), which will get
> called by glibc when an application issues a fallocate() system call
> on s390. The s390 arch specific changes will be part of a separate
> patch (PATCH 2/5). It will be great if some s390 expert can verify the
> patch, since I have not been able to test it on s390 so far.

After long discussions where at least two possible implementations
were suggested that would work on _all_ architectures you chose one
which doesn't and causes extra effort.

> It was also noted that minor changes might be required to strace code
> to take care of "different arguments on s390" issue.

This is not limited to strace...

Besides that the s390 backend looks ok.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/5] fallocate system call

2007-04-26 Thread Amit K. Arora
Based on the discussion, this new patchset uses following as the
interface for fallocate() system call:

 asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)

It seems that only s390 architecture has a problem with such a layout of
arguments in fallocate(). Thus for s390, we plan to have a wrapper
(say, sys_s390_fallocate()) for the sys_fallocate(), which will get
called by glibc when an application issues a fallocate() system call
on s390. The s390 arch specific changes will be part of a separate
patch (PATCH 2/5). It will be great if some s390 expert can verify the
patch, since I have not been able to test it on s390 so far.

It was also noted that minor changes might be required to strace code
to take care of "different arguments on s390" issue.

Currently we have two modes FA_ALLOCATE and FA_DEALLOCATE, for
preallocation and deallocation of preallocated blocks respectively. More
modes can be added, when required.

ToDos:
=
1>   Implementation on other architectures (other than i386, x86_64, 
ppc64 and s390(x)) 
2>   A generic file system operation to handle fallocate
(generic_fallocate), for filesystems that do _not_ have the fallocate
inode operation implemented.
3>   Changes to glibc,
a) to support fallocate() system call
b) so that posix_fallocate() and posix_fallocate64() call
   fallocate() system call
4>   Changes to XFS to implement the fallocate inode operation


Following patches follow:

Patch 1/5 : fallocate() implementation in i86, x86_64 and powerpc
Patch 2/5 : fallocate() on s390
Patch 3/5 : ext4: Extent overlap bugfix
Patch 4/5 : ext4: fallocate support in ext4
Patch 5/5 : ext4: write support for preallocated blocks

--
Regards,
Amit Arora

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/