Re: Lazy block allocation and block_prepare_write?

Badari Pulavarty Tue, 19 Apr 2005 07:58:43 -0700

On Tue, 2005-04-19 at 04:22, Nikita Danilov wrote:
> Badari Pulavarty <[EMAIL PROTECTED]> writes:
> 
> [...]
> 
> >
> > Yes. It's possible to do what you want. I am currently working on
> > adding "delayed allocation" support to ext3. As part of that, we
> 
> As you most likely already know, Alex Thomas already implemented delayed
> block allocation for ext3.


Yep. I reviewed Alex Thomas's patches for delayed allocation. He handled
all the cases in his own code and did NOT use any mpage* routines to do
the work. I was hoping to change the mpage infrastructure to handle
these cases, so that each filesystem doesn't have to do its own thing.


> 
> >
> > In order to do the correct accounting, we need to mark a page
> > to indicate whether we reserved a block for it. One way to do this
> > is to use page->private. But then, all the generic
> 
> I believe one can use the PG_mappedtodisk bit in page->flags for this
> purpose. There was an old patch by Andrew Morton that introduced a new
> bit (PG_delalloc?) for this purpose.

That would be good. But I don't feel like asking for a new bit in
page->flags if there is a way to get around it.

> 
> > routines will fail, since they assume that page->private represents
> > bufferheads. So we need a better way to do this.
> 
> They are not generic then. Some file systems store something completely
> different from a buffer head ring in page->private.

Yep. Instead of changing the whole world, I was hoping to come up with
a few common interfaces (which don't assume anything about bufferheads
etc.) that are useful for more than one filesystem.
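To make the flag-bit alternative concrete, here is a minimal userspace sketch. It assumes a toy struct page and a hypothetical PG_delalloc bit number (modelled on the bit mentioned above, not a real kernel constant); reserved_blocks, delalloc_reserve() and delalloc_release() are likewise made-up names. The point is only that the reservation state lives in page->flags, so page->private stays free for buffer heads and the generic routines that assume them keep working.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: NOT the kernel layout. */
struct page {
    unsigned long flags;
    void *private;          /* left alone: still means buffer heads */
};

/* Hypothetical bit number; the real kernel assigns its own PG_* values. */
enum { PG_delalloc = 20 };

/* Per-filesystem count of blocks reserved but not yet allocated on disk. */
static long reserved_blocks;

/* Reserve a block for a page dirtied without allocating it on disk. */
static void delalloc_reserve(struct page *page)
{
    unsigned long mask = 1UL << PG_delalloc;
    if (!(page->flags & mask)) {     /* reserve at most once per page */
        page->flags |= mask;
        reserved_blocks++;
    }
}

/* Drop the reservation once writeout has really allocated the block. */
static void delalloc_release(struct page *page)
{
    unsigned long mask = 1UL << PG_delalloc;
    if (page->flags & mask) {
        page->flags &= ~mask;
        reserved_blocks--;
    }
    assert(reserved_blocks >= 0);
}

static bool page_delalloc(const struct page *page)
{
    return page->flags & (1UL << PG_delalloc);
}
```

Reserving twice for the same page is a no-op, which is exactly the double-counting that the accounting needs to avoid.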


> >
> > 3) We need to add hooks into filesystem-specific calls from these
> > generic routines to handle "journaling mode" requirements
> > (for ext3 and maybe others).
> 
> Please don't. There is no such thing as "generic
> journalling". Traditional WAL used by ext3, phase-trees of Tux2, and
> wandering logs of reiser4 are so much different that there is no hope
> for a single API to accommodate them all. Adding such an API will only
> force more workarounds and hacks in non-ext3 file systems.
> 
> What _is_ common to all journalling file systems on the other hand, is
> the notion of transaction as the natural unit of caching and
> write-out. Currently in Linux, write-out is inode-based
> (->writepages()). Reiser4 already has a patch that replaces the
> sync_sb_inodes() function with a super-block operation. In reiser4's case,
> this operation scans the list of transactions (instead of the list of
> inodes) and writes some of them out, which is the natural thing to do for
> a journalled file system.
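A rough userspace sketch of transaction-based write-out follows. It is not the reiser4 patch. The names struct txn, struct super_block_sketch, write_txn() and sb_writeout_txns() are all hypothetical. The superblock keeps a list of transactions (oldest first) instead of a list of dirty inodes, and the write-out operation flushes whole transactions until roughly nr_to_write pages have gone out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction: the unit a journalling fs would flush. */
struct txn {
    int id;
    int nr_dirty;           /* dirty pages captured by this transaction */
    struct txn *next;       /* superblock-wide list, oldest first */
};

/* Toy superblock holding transactions instead of dirty inodes. */
struct super_block_sketch {
    struct txn *txns;
};

/* Stand-in for submitting a transaction's pages and its commit record. */
static int write_txn(struct txn *t)
{
    int written = t->nr_dirty;
    assert(written >= 0);
    t->nr_dirty = 0;
    return written;
}

/*
 * Superblock-level write-out replacing a per-inode walk: flush whole
 * transactions, oldest first, until at least nr_to_write pages have
 * been written. Returns the number of pages written.
 */
static long sb_writeout_txns(struct super_block_sketch *sb, long nr_to_write)
{
    long written = 0;
    for (struct txn *t = sb->txns; t && written < nr_to_write; t = t->next) {
        if (t->nr_dirty == 0)
            continue;
        written += write_txn(t);   /* a transaction is never split */
    }
    return written;
}
```

Because transactions are never split, a call may write more than nr_to_write pages; an inode-based ->writepages() has no such constraint.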
> 
> Similarly, a transaction is a unit of caching: it's often necessary to
> scan all pages of a given transaction, all dirty pages of a given
> transaction, or to check whether a given page belongs to a given
> transaction. That is, a transaction plays a role similar to struct
> address_space. But currently there is a 1-to-1 relation between inodes
> and address_spaces, and this forces file systems to implement additional
> data structures to duplicate functionality already present in address_space.
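The three operations listed above can be sketched in userspace, assuming (hypothetically) that each page carries a back-pointer to its transaction, much as page->mapping points at an address_space, and that each transaction links its own pages. All names here (struct txn, txn_add_page(), page_in_txn(), txn_count_dirty()) are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct txn;

/* Toy page: 'txn' plays the role page->mapping plays for address_space. */
struct page {
    bool dirty;
    struct txn *txn;        /* transaction that captured this page */
    struct page *txn_next;  /* link in that transaction's page list */
};

/* Hypothetical transaction acting as a cache of pages. */
struct txn {
    struct page *pages;
    int nr_pages;
};

/* Capture a page into a transaction (it must not be in another one). */
static void txn_add_page(struct txn *t, struct page *p)
{
    assert(p->txn == NULL);
    p->txn = t;
    p->txn_next = t->pages;
    t->pages = p;
    t->nr_pages++;
}

/* "Does this page belong to this transaction?" is a pointer compare. */
static bool page_in_txn(const struct page *p, const struct txn *t)
{
    return p->txn == t;
}

/* Scan the pages of one transaction, counting the dirty ones. */
static int txn_count_dirty(const struct txn *t)
{
    int n = 0;
    for (const struct page *p = t->pages; p; p = p->txn_next)
        if (p->dirty)
            n++;
    return n;
}
```

This is the duplicated bookkeeping the paragraph complains about: with a 1-to-1 inode/address_space relation, the file system has to maintain these links itself, next to the page cache.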
> >
> > So, what are your requirements? I am looking for a common
> > way to combine all the requirements and come up with
> > saner "generic" routines to handle these.
> >
> 
> I think that one reasonable way to add generic support for journalling
> is to split struct address_space into two objects: a lower layer that
> represents a "file" (say, struct vm_file), in which pages are linearly
> ordered, and on top of it a vm_cache (representing a transaction) that
> keeps track of pages from various vm_files. vm_file is embedded into
> the inode, and vm_cache has a pointer to (the analog of) struct
> address_space_operations.
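One possible data layout for this split is sketched below. It is only a guess at what the proposal implies: the field names, the vm_cache_operations type, the helpers, and the fixed-size page index (standing in for the radix tree) are all hypothetical. Each page records both its position in a vm_file and the vm_cache that tracks it, so one vm_cache can hold pages from several inodes.

```c
#include <assert.h>
#include <stddef.h>

#define VM_FILE_MAX_PAGES 16   /* toy fixed-size index, not a radix tree */

struct vm_cache;
struct page;

/* Lower layer: a file's pages, linearly ordered by index. */
struct vm_file {
    struct page *pages[VM_FILE_MAX_PAGES];
};

/* Hypothetical analog of address_space_operations, hung off vm_cache. */
struct vm_cache_operations {
    int (*writepage)(struct vm_cache *cache, struct page *page);
};

struct page {
    struct vm_file *file;     /* which file the page belongs to */
    unsigned long index;      /* and its position in that file */
    struct vm_cache *cache;   /* which cache (transaction) tracks it */
    struct page *cache_next;
};

/* Upper layer: groups pages from any number of vm_files. */
struct vm_cache {
    const struct vm_cache_operations *ops;
    struct page *pages;
};

/* vm_file is embedded in the inode, as proposed. */
struct inode {
    unsigned long ino;
    struct vm_file file;
};

static void vm_file_insert(struct vm_file *f, struct page *p, unsigned long index)
{
    assert(index < VM_FILE_MAX_PAGES && f->pages[index] == NULL);
    f->pages[index] = p;
    p->file = f;
    p->index = index;
}

static struct page *vm_file_lookup(const struct vm_file *f, unsigned long index)
{
    return index < VM_FILE_MAX_PAGES ? f->pages[index] : NULL;
}

static void vm_cache_add(struct vm_cache *c, struct page *p)
{
    p->cache = c;
    p->cache_next = c->pages;
    c->pages = p;
}
```

Offset lookups go through the vm_file, while caching and write-out decisions go through the vm_cache and its ops.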
> 
> vm_caches are created by the file system back-end as necessary (they can
> be embedded into the inode for non-journalled file systems). The VM scanner
> and balance_dirty_pages() call vm_cache operations to do write-out.
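The write-out side might look roughly like this sketch, again under assumed names (vm_cache_operations with a single writeout method, writeback_caches() standing in for what balance_dirty_pages() or the VM scanner would do). A non-journalled file system embeds one vm_cache in each inode and writes any subset of pages. A journalled one treats each vm_cache as a transaction and flushes it all or nothing. The generic loop does not care which it is.

```c
#include <assert.h>
#include <stddef.h>

struct vm_cache;

/* Hypothetical per-cache operations supplied by the file system. */
struct vm_cache_operations {
    /* Write out up to nr pages; return how many were written. */
    long (*writeout)(struct vm_cache *cache, long nr);
};

struct vm_cache {
    const struct vm_cache_operations *ops;
    long nr_dirty;
};

/* Non-journalled fs: one vm_cache embedded in each inode. */
struct simple_inode {
    unsigned long ino;
    struct vm_cache cache;
};

/* Plain fs: any subset of dirty pages may be written. */
static long simple_writeout(struct vm_cache *c, long nr)
{
    long n = c->nr_dirty < nr ? c->nr_dirty : nr;
    c->nr_dirty -= n;
    return n;
}

/* Journalled fs: a vm_cache is a transaction, flushed all or nothing. */
static long txn_writeout(struct vm_cache *c, long nr)
{
    (void)nr;                 /* a transaction is never split */
    long n = c->nr_dirty;
    c->nr_dirty = 0;
    return n;
}

static const struct vm_cache_operations simple_ops = { simple_writeout };
static const struct vm_cache_operations txn_ops = { txn_writeout };

/*
 * Generic caller: it knows nothing about inodes or journals and just
 * asks each vm_cache to write until the target is met.
 */
static long writeback_caches(struct vm_cache **caches, size_t n, long target)
{
    long written = 0;
    assert(target >= 0);
    for (size_t i = 0; i < n && written < target; i++)
        written += caches[i]->ops->writeout(caches[i], target - written);
    return written;
}
```

The journalling policy stays entirely inside the per-filesystem ops, which is the alternative to adding "journaling mode" hooks to the generic routines.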

Need to think some more. I guess you have thought about this more than
I have :)

Thanks,
Badari
