Re: [RFC] basic delayed allocation in VFS

2007-07-30 Thread Andrew Morton
On Mon, 30 Jul 2007 10:49:14 -0700
Mingming Cao <[EMAIL PROTECTED]> wrote:

> On Sun, 2007-07-29 at 20:24 +0100, Christoph Hellwig wrote:
> > On Sun, Jul 29, 2007 at 11:30:36AM -0600, Andreas Dilger wrote:
> > > Sigh, we HAVE a patch that was only adding delalloc to ext4, but it
> > > was rejected because "that functionality should go into the VFS".
> > > Since the performance improvement of delalloc is quite large, we'd
> > > like to get this into the kernel one way or another.  Can we make a
> > > decision if the ext4-specific delalloc is acceptable?
> > 
> > I'm a big proponent of having proper common delalloc code, but the
> > one proposed here is not generic for the existing filesystem using
> > delalloc.  
> 
> To be fair, what Alex has so far is probably good enough for ext2/3
> delayed allocation.
> 
> > It's still on my todo list to revamp the xfs code to get
> > rid of some of the existing mess and make it usable generically.  If
> > the ext4 users are fine with the end result we could move to generic
> > code.
> > 
> 
> Are you okay with having an ext4 delayed allocation implementation (i.e.
> moving the code proposed in this thread to fs/ext4) first?  Then later
> when you come up with a generic delayed allocation for both ext4 and xfs
> we could make use of that generic implementation. Is that an acceptable
> approach? 
> 
> Andrew, what do you think?
> 

There's a decent risk that the generic implementation would never happen. 

I'd have thought that it'd be pretty tricky to make anything which is in
XFS suitable for general use, because after years of tuning and tweaking
it'll be full of xfs-specific things, but I haven't looked.

And a similar thing will happen if an ext4-specific version is merged.

The sad fact is that if we have a generic version, it turns out to be a
least-common-denominator thing which never fully meets the requirements of
any of its users.  We end up filling the generic code up with
caller-selectable optional functionality for each filesystem.  (See
fs/direct-io.c).

The whole approach of making the pagecache/data handling be part of the VFS
hasn't been a great success, IMO.  It was fine for ext2 and similar (jfs,
minix, etc).  But for filesystems which do fancier things with data it
hasn't worked out well.  otoh, moving it all into the fs would have been a
bad decision too, so we just muddle through, making compromises.

So, umm, yes, on balance I do agree that we should explore doing some of
this in the VFS, and I believe that we should do it on the initial merge
rather than promising to ourselves that we'll fix it up later.  This will
devolve into the ext4 and xfs people working out which bits can and should
be moved into the VFS, and working out what they should look like.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] basic delayed allocation in VFS

2007-07-30 Thread Mingming Cao
On Sun, 2007-07-29 at 20:24 +0100, Christoph Hellwig wrote:
> On Sun, Jul 29, 2007 at 11:30:36AM -0600, Andreas Dilger wrote:
> > Sigh, we HAVE a patch that was only adding delalloc to ext4, but it
> > was rejected because "that functionality should go into the VFS".
> > Since the performance improvement of delalloc is quite large, we'd
> > like to get this into the kernel one way or another.  Can we make a
> > decision if the ext4-specific delalloc is acceptable?
> 
> I'm a big proponent of having proper common delalloc code, but the
> one proposed here is not generic for the existing filesystem using
> delalloc.  

To be fair, what Alex has so far is probably good enough for ext2/3
delayed allocation.

> It's still on my todo list to revamp the xfs code to get
> rid of some of the existing mess and make it usable generically.  If
> the ext4 users are fine with the end result we could move to generic
> code.
> 

Are you okay with having an ext4 delayed allocation implementation (i.e.
moving the code proposed in this thread to fs/ext4) first?  Then later
when you come up with a generic delayed allocation for both ext4 and xfs
we could make use of that generic implementation. Is that an acceptable
approach? 

Andrew, what do you think?


Regards,
Mingming



Re: [RFC] basic delayed allocation in VFS

2007-07-29 Thread David Chinner
On Sun, Jul 29, 2007 at 04:09:20PM +0400, Alex Tomas wrote:
> David Chinner wrote:
> >On Fri, Jul 27, 2007 at 11:51:56AM +0400, Alex Tomas wrote:
> >But this is really irrelevant - the issue at hand is what we want
> >for VFS level delalloc support. IMO, that mechanism needs to support
> >both XFS and ext4, and I'd prefer if it doesn't perpetuate the
> >bufferhead abuses of the past (i.e. define an iomap structure
> >instead of overloading bufferheads yet again).
> 
> I'm not sure I understand very well.

->get_blocks abuses bufferheads to provide an offset/length/state
mapping. That's all it needs. That's what the iomap structure is used
for. It's smaller than a bufferhead, it's descriptive of its use
and you don't get it confused with the other 10 ways bufferheads
are used and abused.
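
To make the offset/length/state idea concrete, here is a minimal user-space
sketch of what such a mapping record might look like. It is not the XFS iomap
structure or any kernel API; every name in it (io_mapping, map_state,
mapping_to_block) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping states; the names are illustrative only. */
enum map_state {
    MAP_HOLE,       /* nothing backs this range */
    MAP_DELALLOC,   /* space reserved, blocks not yet allocated */
    MAP_UNWRITTEN,  /* blocks allocated, contents not yet valid */
    MAP_MAPPED      /* blocks allocated and valid */
};

/* One extent-sized answer to "what backs this file range?" */
struct io_mapping {
    uint64_t file_offset;   /* byte offset in the file */
    uint64_t length;        /* bytes covered by this mapping */
    uint64_t disk_block;    /* first disk block, if any are allocated */
    enum map_state state;
};

/* Return 1 if the byte at pos falls inside the mapping. */
static int mapping_contains(const struct io_mapping *m, uint64_t pos)
{
    return pos >= m->file_offset && pos - m->file_offset < m->length;
}

/*
 * Translate a file offset into a disk block number.
 * Returns -1 if pos is outside the mapping or no blocks are allocated yet.
 */
static int64_t mapping_to_block(const struct io_mapping *m, uint64_t pos,
                                uint32_t block_size)
{
    if (!mapping_contains(m, pos))
        return -1;
    if (m->state == MAP_HOLE || m->state == MAP_DELALLOC)
        return -1;
    return (int64_t)(m->disk_block + (pos - m->file_offset) / block_size);
}
```

The point of the sketch is only that one small record can describe an
arbitrarily long range, whereas per-block state needs one object per block.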

> where would you track uptodate, dirty and other states then?
> do you propose to separate block states from block mapping?

No. They still get tracked in the bufferheads attached to the page.
That's what bufferheads were originally intended for(*).

Cheers,

Dave.

(*) I recently proposed a separate block map tree for this rather
than using buffer heads for this because of the memory footprint of
N bufferheads per page on contiguous mappings. That's future work,
not something we really need to consider here. Chris Mason's extent
map tree patches are a start on this concept.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC] basic delayed allocation in VFS

2007-07-29 Thread Theodore Tso
On Sun, Jul 29, 2007 at 08:24:37PM +0100, Christoph Hellwig wrote:
> I'm a big proponent of having proper common delalloc code, but the
> one proposed here is not generic for the existing filesystem using
> delalloc.  It's still on my todo list to revamp the xfs code to get
> rid of some of the existing mess and make it usable generically.  If
> the ext4 users are fine with the end result we could move to generic
> code.

Do you think it would be faster for you to revamp the code or to give
instructions about how you'd like to clean up the code and what has to
be preserved in order to keep XFS happy, so someone else could give it
a try?  Or do you think the code is too grotty and/or tricky for
someone else to attempt this?

> Note that moving to VFS is bullshit either way, writeback code is
> nowhere near the VFS nor should it.

Agreed.  I would think that something like mm/delayed_alloc.c would be
preferable.  Ideally it would be like the filemap.c code, where it
would be relatively easy for most standard filesystems to hook into it
and get the advantages of delayed allocation.  (Although granted it
will probably require more effort on the part of a filesystem author
than filemap!)

- Ted


Re: [RFC] basic delayed allocation in VFS

2007-07-29 Thread Alex Tomas

I'm a bit worried about one thing ... it looks like XFS and ext4
use different techniques to order data and metadata referencing
them. Now I'm not that optimistic that we can separate ordering
from delalloc itself in a clean and reasonable way. In general, I'd
prefer common code in fs/ (mm/?) of course, for a number of reasons.

thanks, Alex


Christoph Hellwig wrote:

I'm a big proponent of having proper common delalloc code, but the
one proposed here is not generic for the existing filesystem using
delalloc.  It's still on my todo list to revamp the xfs code to get
rid of some of the existing mess and make it usable generically.  If
the ext4 users are fine with the end result we could move to generic
code.

Note that moving to VFS is bullshit either way, writeback code is
nowhere near the VFS nor should it.





Re: [RFC] basic delayed allocation in VFS

2007-07-29 Thread Christoph Hellwig
On Sun, Jul 29, 2007 at 11:30:36AM -0600, Andreas Dilger wrote:
> Sigh, we HAVE a patch that was only adding delalloc to ext4, but it
> was rejected because "that functionality should go into the VFS".
> Since the performance improvement of delalloc is quite large, we'd
> like to get this into the kernel one way or another.  Can we make a
> decision if the ext4-specific delalloc is acceptable?

I'm a big proponent of having proper common delalloc code, but the
one proposed here is not generic for the existing filesystem using
delalloc.  It's still on my todo list to revamp the xfs code to get
rid of some of the existing mess and make it usable generically.  If
the ext4 users are fine with the end result we could move to generic
code.

Note that moving to VFS is bullshit either way, writeback code is
nowhere near the VFS nor should it.


Re: [RFC] basic delayed allocation in VFS

2007-07-29 Thread Christoph Hellwig
On Sun, Jul 29, 2007 at 09:48:10PM +0400, Alex Tomas wrote:
> I think the latter one is better because it supports bs < pagesize
> (though I'm not sure about data=ordered yet). I'm not against putting
> most of the patch into fs/ext4/, but at least a few bits need to be changed
> in fs/ - exports in  fs/mpage.c and one "if" in __block_write_full_page().

The change to __block_write_full_page() is obviously fine, and exporting
mpage.c bits sounds fine to me as well, although I'd like to take a look
at the final patch.


Re: [RFC] basic delayed allocation in VFS

2007-07-29 Thread Alex Tomas

Andreas Dilger wrote:

Sigh, we HAVE a patch that was only adding delalloc to ext4, but it
was rejected because "that functionality should go into the VFS".
Since the performance improvement of delalloc is quite large, we'd
like to get this into the kernel one way or another.  Can we make a
decision if the ext4-specific delalloc is acceptable?


I think the latter one is better because it supports bs < pagesize
(though I'm not sure about data=ordered yet). I'm not against putting
most of the patch into fs/ext4/, but at least a few bits need to be changed
in fs/ - exports in  fs/mpage.c and one "if" in __block_write_full_page().

thanks, Alex



Re: [RFC] basic delayed allocation in VFS

2007-07-29 Thread Andreas Dilger
On Jul 28, 2007  20:51 +0100, Christoph Hellwig wrote:
> That doesn't mean I want to argue against Alex's code although I'd of
> course be happier if we could actually share code between multiple
> filesystems.
> 
> Of course the code in its current form should not go into mpage.c but
> rather into ext4 so that it doesn't bloat the kernel for everyone.

Sigh, we HAVE a patch that was only adding delalloc to ext4, but it
was rejected because "that functionality should go into the VFS".
Since the performance improvement of delalloc is quite large, we'd
like to get this into the kernel one way or another.  Can we make a
decision if the ext4-specific delalloc is acceptable?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: [RFC] basic delayed allocation in VFS

2007-07-29 Thread Alex Tomas

David Chinner wrote:

On Fri, Jul 27, 2007 at 11:51:56AM +0400, Alex Tomas wrote:
But this is really irrelevant - the issue at hand is what we want
for VFS level delalloc support. IMO, that mechanism needs to support
both XFS and ext4, and I'd prefer if it doesn't perpetuate the
bufferhead abuses of the past (i.e. define an iomap structure
instead of overloading bufferheads yet again).


I'm not sure I understand very well. where would you track uptodate,
dirty and other states then? do you propose to separate block states
from block mapping?

thanks, Alex




Re: [RFC] basic delayed allocation in VFS

2007-07-29 Thread David Chinner
On Fri, Jul 27, 2007 at 11:51:56AM +0400, Alex Tomas wrote:
> David Chinner wrote:
> >Using a new API for new functionality is a bad thing?
> 
> if existing API can be used ...

Sure, but using the existing APIs is no good if the only filesystem
in the kernel that supports delalloc cannot use the new code.

> >Also, looking at the way mpage_da_map_blocks() is done - if we have
> >a 128MB delalloc extent - ext4 will allocate it
> >in one go, right? What happens if we then crash after only writing a
> >few megabytes of that extent? stale data exposure? XFS can allocate
> >multiple gigabytes in a single get_blocks call so even if ext4 can't
> >do this, it's a problem for XFS.
> 
> what happens if IO to 2nd MB is completed, while IO to 1st MB is not
> (probably sitting in queue) ? do you update on-disk size in this case?
> how do you track this?

We're updating the in-memory on-disk inode here, not the actual
inode on disk. That means that if we crashed right here, the file
size on disk would not be changed at all and the filesystem would
behave as if both writes did not ever occur and we simply end up
with empty "preallocated" blocks beyond EOF.

But this is really irrelevant - the issue at hand is what we want
for VFS level delalloc support. IMO, that mechanism needs to support
both XFS and ext4, and I'd prefer if it doesn't perpetuate the
bufferhead abuses of the past (i.e. define an iomap structure
instead of overloading bufferheads yet again).

> >So without the ability to attach specific I/O completions to bios
> >or support for unwritten extents directly in __mpage_writepage,
> >there is no way XFS can use this "generic" delayed allocation code.
> 
> I didn't say "generic", see Subject: :)

No, you didn't, but VFS level functionality implies that
functionality is both generic and able to be used by all
filesystems.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC] basic delayed allocation in VFS

2007-07-28 Thread Christoph Hellwig
On Fri, Jul 27, 2007 at 04:38:44PM +0400, Alex Tomas wrote:
> I just realized that you're talking about data=ordered mode in ext4,
> where care is taken to prevent on-disk references to not-yet-written
> blocks.

Any reference to non-written blocks is a bug.



Re: [RFC] basic delayed allocation in VFS

2007-07-28 Thread Alex Tomas

Christoph Hellwig wrote:

This is not based on my attempt to make the xfs writeout path generic.
Alex's variant is a lot simpler and thus missed various bits required
for high sustained writeout performance or xfs functionality.


I'd very much appreciate any details about high writeout performance.


That doesn't mean I want to argue against Alex's code although I'd of
course be happier if we could actually share code between multiple
filesystems.


I'm not against it at all, of course. but xfs writeout code looks .. hmm ..
very xfs :)

thanks, Alex




Re: [RFC] basic delayed allocation in VFS

2007-07-28 Thread Christoph Hellwig
On Fri, Jul 27, 2007 at 11:51:56AM +0400, Alex Tomas wrote:
> >Secondly, apart from delalloc, XFS cannot use the generic code paths
> >for writeback because unwritten extent conversion also requires
> >custom I/O completion handlers. Given that __mpage_writepage() only
> >calls ->writepage when it is confused, XFS simply cannot use this
> >API.
> 
> this doesn't mean fs/mpage.c should go, right?

mpage.c read side is fine for every block based filesystem I know.
mpage.c write side is fine for every simple (non-delalloc, non-unwritten
extent, etc) filesystem.  So it surely shouldn't go.

> I didn't say "generic", see Subject: :)

then it shouldn't be in generic code.



Re: [RFC] basic delayed allocation in VFS

2007-07-28 Thread Christoph Hellwig
On Fri, Jul 27, 2007 at 03:07:14PM +1000, David Chinner wrote:
> > It duplicates fs/mpage.c in bio building and introduces new generic API
> > (iomap, map_blocks_t, etc).
> 
> Using a new API for new functionality is a bad thing?

Depends on what you do.  This patch is just a quick hack to shoe-horn
delalloc support into ext4.  Introducing a new abstraction is overkill.
If we really want an overhaul of the writeback path that's extent-aware,
and efficient for delalloc and unwritten extents, introducing a proper
iomap-like data structure would make sense.  That being said I personally
hate the buffer_head abuse for bmap data that we have in various places
as it's utterly confusing and wasting stack space, but that's a different
discussion.



Re: [RFC] basic delayed allocation in VFS

2007-07-28 Thread Christoph Hellwig
On Thu, Jul 26, 2007 at 06:32:56AM -0400, Jeff Garzik wrote:
> Is this based on Christoph's work?
> 
> Christoph, or some other XFS hacker, already did generic delalloc, 
> modeled on the XFS delalloc code.

This is not based on my attempt to make the xfs writeout path generic.
Alex's variant is a lot simpler and thus missed various bits required
for high sustained writeout performance or xfs functionality.

That doesn't mean I want to argue against Alex's code although I'd of
course be happier if we could actually share code between multiple
filesystems.

Of course the code in its current form should not go into mpage.c but
rather into ext4 so that it doesn't bloat the kernel for everyone.


Re: [RFC] basic delayed allocation in VFS

2007-07-27 Thread Alex Tomas

Jeff Garzik wrote:

Alex Tomas wrote:

So without the ability to attach specific I/O completions to bios
or support for unwritten extents directly in __mpage_writepage,
there is no way XFS can use this "generic" delayed allocation code.


I didn't say "generic", see Subject: :)


Well, it shouldn't even be in the VFS layer if it's only usable by one 
filesystem.


sorry, but it seems I can say the same about iomap/ioend. I think
mpage_da_writepages() is simple enough to be adopted by other
filesystem, ext2 for example.

thanks, Alex



Re: [RFC] basic delayed allocation in VFS

2007-07-27 Thread Alex Tomas

David Chinner wrote:

Firstly, XFS attaches a different I/O completion to delalloc writes
to allow us to update the file size when the write is beyond the
current on disk EOF. This code cannot do that as all it does is
allocation and present "normal looking" buffers to the generic code
path.


how do you implement fsync(2) ? you'd have to wait for such IO to complete,
then update the inode and write it through the log?


Also, looking at the way mpage_da_map_blocks() is done - if we have
a 128MB delalloc extent - ext4 will allocate it
in one go, right? What happens if we then crash after only writing a
few megabytes of that extent? stale data exposure? XFS can allocate
multiple gigabytes in a single get_blocks call so even if ext4 can't
do this, it's a problem for XFS.


I just realized that you're talking about data=ordered mode in ext4,
where care is taken to prevent on-disk references to not-yet-written
blocks. The solution is to wait for such IO to complete before metadata
commit. And the key thing here is to allocate and attach to inode
blocks we're writing immediately. IOW, there are no unwritten blocks
attached to inode (except fallocate(2) case), but there may be blocks
preallocated for this inode in-core. same gigabytes, but different
way ;)
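
The ordering rule described above (data IO must complete before the metadata
that references it is committed) can be sketched as a tiny single-threaded
model. This is not ext4 or jbd code; the structure and function names are
hypothetical, and a real implementation would sleep on a wait queue rather
than return a "try again" result.

```c
#include <assert.h>

/* Hypothetical per-inode bookkeeping for an ordered-mode style commit. */
struct ordered_inode {
    int data_ios_in_flight;  /* data writes submitted but not yet completed */
    int metadata_committed;  /* set once block references are committed */
};

/* A data write to newly allocated blocks has been submitted. */
static void submit_data_io(struct ordered_inode *in)
{
    in->data_ios_in_flight++;
}

/* Completion handler for one data write. */
static void complete_data_io(struct ordered_inode *in)
{
    if (in->data_ios_in_flight > 0)
        in->data_ios_in_flight--;
}

/*
 * Commit metadata only when no data write is still pending, so that no
 * on-disk reference can point at a block whose data was never written.
 * Returns 1 if the commit happened, 0 if the caller must wait and retry.
 */
static int try_commit_metadata(struct ordered_inode *in)
{
    if (in->data_ios_in_flight > 0)
        return 0;
    in->metadata_committed = 1;
    return 1;
}
```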

I have no objection at all to a custom IO completion callback per
mpage_writepages().


thanks, Alex




Re: [RFC] basic delayed allocation in VFS

2007-07-27 Thread Jeff Garzik

Alex Tomas wrote:

So without the ability to attach specific I/O completions to bios
or support for unwritten extents directly in __mpage_writepage,
there is no way XFS can use this "generic" delayed allocation code.


I didn't say "generic", see Subject: :)


Well, it shouldn't even be in the VFS layer if it's only usable by one 
filesystem.


Jeff




Re: [RFC] basic delayed allocation in VFS

2007-07-27 Thread Alex Tomas

David Chinner wrote:

Using a new API for new functionality is a bad thing?


if existing API can be used ...


No, it doesn't provide the same functionality.

Firstly, XFS attaches a different I/O completion to delalloc writes
to allow us to update the file size when the write is beyond the
current on disk EOF. This code cannot do that as all it does is
allocation and present "normal looking" buffers to the generic code
path.


good point, I was going to take care of it in a separate patch
to support data=ordered.


Secondly, apart from delalloc, XFS cannot use the generic code paths
for writeback because unwritten extent conversion also requires
custom I/O completion handlers. Given that __mpage_writepage() only
calls ->writepage when it is confused, XFS simply cannot use this
API.


this doesn't mean fs/mpage.c should go, right?


Also, looking at the way mpage_da_map_blocks() is done - if we have
a 128MB delalloc extent - ext4 will allocate it
in one go, right? What happens if we then crash after only writing a
few megabytes of that extent? stale data exposure? XFS can allocate
multiple gigabytes in a single get_blocks call so even if ext4 can't
do this, it's a problem for XFS.


what happens if IO to 2nd MB is completed, while IO to 1st MB is not
(probably sitting in queue) ? do you update on-disk size in this case?
how do you track this?


So without the ability to attach specific I/O completions to bios
or support for unwritten extents directly in __mpage_writepage,
there is no way XFS can use this "generic" delayed allocation code.


I didn't say "generic", see Subject: :)

thanks, Alex



Re: [RFC] basic delayed allocation in VFS

2007-07-26 Thread David Chinner
[please don't top post!]

On Thu, Jul 26, 2007 at 05:33:08PM +0400, Alex Tomas wrote:
> Jeff Garzik wrote:
> >The XFS one is proven and the work was already completed.
> >
> >What were the specific technical issues that made it unsuitable for ext4?
> >
> >I would rather not reinvent the wheel, particularly if the reinvention 
> >is less capable than the existing work.
>
> It duplicates fs/mpage.c in bio building and introduces new generic API
> (iomap, map_blocks_t, etc).

Using a new API for new functionality is a bad thing?

> In contrast, my trivial implementation re-uses
> existing code in fs/mpage.c, doesn't introduce new API and I tend to think
> provides quite the same functionality. I can be wrong, of course ...

No, it doesn't provide the same functionality.

Firstly, XFS attaches a different I/O completion to delalloc writes
to allow us to update the file size when the write is beyond the
current on disk EOF. This code cannot do that as all it does is
allocation and present "normal looking" buffers to the generic code
path.
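
As a rough illustration of the point above, the sketch below models an I/O
request that carries its own completion callback, so that a write beyond the
current on-disk EOF can extend the recorded size when it finishes. It is a
user-space model, not XFS code and not the bio API; sim_inode, sim_io and the
end_io_* functions are hypothetical names. It also ignores out-of-order
completion, which a real filesystem has to handle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inode: only the size that would be recorded on disk. */
struct sim_inode {
    uint64_t ondisk_size;
};

struct sim_io;
typedef void (*end_io_fn)(struct sim_io *io);

/* Hypothetical I/O request carrying a per-request completion handler. */
struct sim_io {
    struct sim_inode *inode;
    uint64_t offset;    /* byte offset of the write */
    uint64_t length;    /* byte length of the write */
    end_io_fn end_io;   /* called when the write completes */
};

/* Completion for an ordinary overwrite: nothing extra to do. */
static void end_io_plain(struct sim_io *io)
{
    (void)io;
}

/* Completion for a delalloc write: extend the size if we wrote past EOF. */
static void end_io_extend_size(struct sim_io *io)
{
    uint64_t end = io->offset + io->length;

    if (end > io->inode->ondisk_size)
        io->inode->ondisk_size = end;
}

/* Called by the (simulated) block layer when the write has finished. */
static void complete_io(struct sim_io *io)
{
    if (io->end_io)
        io->end_io(io);
}
```

A generic writeback path that always installs the same completion handler
cannot express the second case, which is the limitation being described.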

Secondly, apart from delalloc, XFS cannot use the generic code paths
for writeback because unwritten extent conversion also requires
custom I/O completion handlers. Given that __mpage_writepage() only
calls ->writepage when it is confused, XFS simply cannot use this
API.

Also, looking at the way mpage_da_map_blocks() is done - if we have
a 128MB delalloc extent - ext4 will allocate it
in one go, right? What happens if we then crash after only writing a
few megabytes of that extent? stale data exposure? XFS can allocate
multiple gigabytes in a single get_blocks call so even if ext4 can't
do this, it's a problem for XFS.

So without the ability to attach specific I/O completions to bios
or support for unwritten extents directly in __mpage_writepage,
there is no way XFS can use this "generic" delayed allocation code.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC] basic delayed allocation in VFS

2007-07-26 Thread Alex Tomas

It duplicates fs/mpage.c in bio building and introduces new generic API
(iomap, map_blocks_t, etc). In contrast, my trivial implementation re-uses
existing code in fs/mpage.c, doesn't introduce new API and I tend to think
provides quite the same functionality. I can be wrong, of course ...

thanks, Alex

Jeff Garzik wrote:

The XFS one is proven and the work was already completed.

What were the specific technical issues that made it unsuitable for ext4?

I would rather not reinvent the wheel, particularly if the reinvention 
is less capable than the existing work.


Jeff







Re: [RFC] basic delayed allocation in VFS

2007-07-26 Thread Jeff Garzik

Alex Tomas wrote:

Jeff Garzik wrote:

Is this based on Christoph's work?

Christoph, or some other XFS hacker, already did generic delalloc, 
modeled on the XFS delalloc code.


nope, this one is simple (something I'd prefer for ext4).


The XFS one is proven and the work was already completed.

What were the specific technical issues that made it unsuitable for ext4?

I would rather not reinvent the wheel, particularly if the reinvention 
is less capable than the existing work.


Jeff





Re: [RFC] basic delayed allocation in VFS

2007-07-26 Thread Aneesh Kumar K.V



Alex Tomas wrote:

Good day,

please review ...

thanks, Alex


basic delayed allocation in VFS:

 * block_prepare_write() can be passed a special ->get_block() which
   doesn't allocate blocks, but reserves them and marks the bh delayed
 * a filesystem can use mpage_da_writepages() with another ->get_block()
   which doesn't defer allocation. mpage_da_writepages() finds all
   non-allocated blocks and tries to allocate them with minimal calls
   to ->get_block(), then submits IO using __mpage_writepage()
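
The two-callback scheme in the description above can be modelled in user
space as follows. This is only a sketch under stated assumptions, not the
patch itself: sim_file, get_block_reserve(), get_block_alloc() and
da_writepages() are hypothetical stand-ins, the "allocator" just hands out
consecutive block numbers, and no IO is submitted.

```c
#include <assert.h>
#include <stddef.h>

enum blk_state { BLK_HOLE, BLK_DELAYED, BLK_MAPPED };

#define NBLOCKS 16

/* Hypothetical in-memory file: per-block state plus a toy allocator. */
struct sim_file {
    enum blk_state state[NBLOCKS];
    long disk_block[NBLOCKS];
    long reserved;      /* blocks reserved but not yet allocated */
    long next_free;     /* next disk block the toy allocator hands out */
    int alloc_calls;    /* how many times the allocator was invoked */
};

/* Write-time callback: reserve space and mark the block delayed. */
static int get_block_reserve(struct sim_file *f, size_t idx)
{
    if (idx >= NBLOCKS)
        return -1;
    if (f->state[idx] == BLK_HOLE) {
        f->state[idx] = BLK_DELAYED;
        f->reserved++;
    }
    return 0;
}

/* Writeback-time callback: allocate a whole contiguous run in one call. */
static void get_block_alloc(struct sim_file *f, size_t start, size_t count)
{
    f->alloc_calls++;
    for (size_t i = 0; i < count; i++) {
        f->state[start + i] = BLK_MAPPED;
        f->disk_block[start + i] = f->next_free++;
        f->reserved--;
    }
}

/* Find each run of delayed blocks and allocate it with a single call. */
static void da_writepages(struct sim_file *f)
{
    size_t i = 0;

    while (i < NBLOCKS) {
        if (f->state[i] != BLK_DELAYED) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < NBLOCKS && f->state[i] == BLK_DELAYED)
            i++;
        get_block_alloc(f, start, i - start);
    }
}
```

In this model, dirtying blocks 2 to 5 and block 9 and then running
da_writepages() results in two allocator calls, one per contiguous run,
rather than five.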



I missed this patch when looking at the ext4 patches. Can we mark related
patches as [PATCH 1/2] so that we know that another patch is going to follow?

-aneesh


Re: [RFC] basic delayed allocation in VFS

2007-07-26 Thread Alex Tomas

Jeff Garzik wrote:

Is this based on Christoph's work?

Christoph, or some other XFS hacker, already did generic delalloc, 
modeled on the XFS delalloc code.


nope, this one is simple (something I'd prefer for ext4).

thanks, Alex




Re: [RFC] basic delayed allocation in VFS

2007-07-26 Thread Jeff Garzik

Alex Tomas wrote:

Good day,

please review ...

thanks, Alex


basic delayed allocation in VFS:

 * block_prepare_write() can be passed a special ->get_block() which
   doesn't allocate blocks, but reserves them and marks the bh delayed
 * a filesystem can use mpage_da_writepages() with another ->get_block()
   which doesn't defer allocation. mpage_da_writepages() finds all
   non-allocated blocks and tries to allocate them with minimal calls
   to ->get_block(), then submits IO using __mpage_writepage()


Signed-off-by: Alex Tomas <[EMAIL PROTECTED]>


Is this based on Christoph's work?

Christoph, or some other XFS hacker, already did generic delalloc, 
modeled on the XFS delalloc code.


Jeff

