Badari Pulavarty <[EMAIL PROTECTED]> writes:

[...]

>
> Yes. It's possible to do what you want to. I am currently working on
> adding "delayed allocation" support to ext3. As part of that, we

As you most likely know, Alex Thomas has already implemented delayed
block allocation for ext3.

[...]

>
> In order to do the correct accounting, we need to mark a page
> to indicate whether we reserved a block or not. One way to do this
> is to use page->private. But then, all the generic

I believe one can use the PG_mappedtodisk bit in page->flags for this
purpose. There was an old patch by Andrew Morton that introduced a new
bit (PG_delalloc?) for exactly this.
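
To make the accounting concrete, here is a minimal user-space sketch of
the flag-bit approach. Everything in it is an assumption made for
illustration: struct page_sketch stands in for struct page, the bit
number is arbitrary, and the helper names are invented. It is not the
code from either patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the flags word matters. */
struct page_sketch {
	unsigned long flags;
};

/* Hypothetical bit number, chosen arbitrarily for this sketch. */
enum { PG_DELALLOC_SKETCH = 5 };

static void page_set_reserved(struct page_sketch *p)
{
	p->flags |= 1UL << PG_DELALLOC_SKETCH;
}

static void page_clear_reserved(struct page_sketch *p)
{
	p->flags &= ~(1UL << PG_DELALLOC_SKETCH);
}

static bool page_is_reserved(const struct page_sketch *p)
{
	return (p->flags & (1UL << PG_DELALLOC_SKETCH)) != 0;
}

/* Hypothetical per-filesystem block counters. */
struct fs_sketch {
	long free_blocks;
	long reserved_blocks;
};

/* Reserve one block for a page at most once; -1 means no space left. */
static int reserve_block_for_page(struct fs_sketch *fs, struct page_sketch *p)
{
	if (page_is_reserved(p))
		return 0;	/* already accounted for, do not count twice */
	if (fs->free_blocks == 0)
		return -1;
	fs->free_blocks--;
	fs->reserved_blocks++;
	page_set_reserved(p);
	return 0;
}

/* At write-out time the reservation turns into a real allocation. */
static void allocate_reserved_block(struct fs_sketch *fs, struct page_sketch *p)
{
	assert(page_is_reserved(p));
	fs->reserved_blocks--;
	page_clear_reserved(p);
}
```

The point of the sketch is that page->private stays untouched, so
whatever the file system keeps there is not disturbed.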

> routines will fail - since they assume that page->private represents
> bufferheads. So we need a better way to do this.

Then they are not generic. Some file systems store things in
page->private that are completely different from a buffer head ring.

>
> 3) We need to add hooks into filesystem specific calls from these
> generic routines to handle "journaling mode" requirements
> (for ext3 and maybe others).

Please don't. There is no such thing as "generic
journalling". The traditional WAL used by ext3, the phase trees of Tux2,
and the wandering logs of reiser4 are so different that there is no hope
for a single API to accommodate them all. Adding such an API will only
force more workarounds and hacks in non-ext3 file systems.

What _is_ common to all journalling file systems, on the other hand, is
the notion of a transaction as the natural unit of caching and
write-out. Currently in Linux, write-out is inode-based
(->writepages()). Reiser4 already has a patch that replaces the
sync_sb_inodes() function with a super-block operation. In the reiser4
case, this operation scans the list of transactions (instead of the list
of inodes) and writes some of them out, which is the natural thing to do
for a journalled file system.
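
Roughly, the idea looks like the following user-space sketch. It is not
the reiser4 patch: the operation name sync_inodes, the structures, and
the "write whole transactions until the budget is used up" policy are
all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction: a singly linked list, oldest first. */
struct txn_sketch {
	struct txn_sketch *next;
	int dirty_pages;
	int written;
};

struct sb_sketch;

/* Hypothetical super-block operation replacing a fixed inode walk. */
struct sb_ops_sketch {
	/* Write roughly nr_to_write pages; return pages actually written. */
	int (*sync_inodes)(struct sb_sketch *sb, int nr_to_write);
};

struct sb_sketch {
	const struct sb_ops_sketch *ops;
	struct txn_sketch *txns;
};

/*
 * Journalled back-end: scan transactions rather than inodes. A
 * transaction is written as a whole, so the budget may be overshot.
 */
static int txn_sync(struct sb_sketch *sb, int nr_to_write)
{
	int written = 0;
	struct txn_sketch *t;

	for (t = sb->txns; t != NULL && written < nr_to_write; t = t->next) {
		if (t->written)
			continue;
		written += t->dirty_pages;
		t->dirty_pages = 0;
		t->written = 1;
	}
	return written;
}

static const struct sb_ops_sketch txn_sb_ops = {
	.sync_inodes = txn_sync,
};

/* Generic caller: it knows nothing about inodes or transactions. */
static int writeback_sb(struct sb_sketch *sb, int nr_to_write)
{
	assert(sb->ops != NULL && sb->ops->sync_inodes != NULL);
	return sb->ops->sync_inodes(sb, nr_to_write);
}
```

A non-journalled file system would plug in an operation that walks its
dirty inode list instead, and the generic caller would not change.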

Similarly, a transaction is a unit of caching: it's often necessary to
scan all pages of a given transaction, all dirty pages of a given
transaction, or to check whether a given page belongs to a given
transaction. That is, a transaction plays a role similar to struct
address_space. But currently there is a 1-to-1 relation between inodes
and address_spaces, and this forces a file system to implement additional
data structures that duplicate functionality already present in
address_space.

>
> So, what are your requirements ?  I am looking for a common
> way to combine all the requirements and come out with a
> saner "generic" routines to handle these.
>

I think that one reasonable way to add generic support for journalling
is to split struct address_space into two objects: a lower layer that
represents a "file" (say, struct vm_file), in which pages are linearly
ordered, and on top of this a vm_cache (representing a transaction) that
keeps track of pages from various vm_files. The vm_file is embedded into
the inode, and the vm_cache has a pointer to (the analog of) struct
address_space_operations.

vm_caches are created by the file system back-end as necessary (one can
be embedded into the inode for non-journalled file systems). The VM
scanner and balance_dirty_pages() call vm_cache operations to do
write-out.
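
A rough user-space sketch of that split, just to show the shape of the
two objects. Only the names vm_file and vm_cache come from the proposal
above; every field, the fixed-size arrays, and the helper functions are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PAGES 8

struct vm_cache;

/* Hypothetical page: knows its offset and which cache owns it. */
struct vm_page_sketch {
	unsigned long index;
	int dirty;
	struct vm_cache *cache;
};

/* Lower layer: the "file", with pages linearly ordered by index. */
struct vm_file {
	struct vm_page_sketch pages[SKETCH_MAX_PAGES];
};

/* Analog of struct address_space_operations, attached to the cache. */
struct vm_cache_operations {
	int (*writepages)(struct vm_cache *cache);
};

/* Upper layer: a transaction tracking pages from various vm_files. */
struct vm_cache {
	const struct vm_cache_operations *ops;
	struct vm_page_sketch *members[SKETCH_MAX_PAGES];
	size_t nr_members;
};

/* The vm_file is embedded into the inode. */
struct inode_sketch {
	struct vm_file file;
};

/* Dirty a page of some file and attach it to a cache (transaction). */
static int vm_cache_add(struct vm_cache *c, struct vm_file *f,
			unsigned long index)
{
	struct vm_page_sketch *p;

	if (index >= SKETCH_MAX_PAGES || c->nr_members >= SKETCH_MAX_PAGES)
		return -1;
	p = &f->pages[index];
	if (p->cache != NULL)
		return -1;	/* page already belongs to a cache */
	p->index = index;
	p->dirty = 1;
	p->cache = c;
	c->members[c->nr_members++] = p;
	return 0;
}

/* Example back-end operation: clean every dirty page in the cache. */
static int sketch_writepages(struct vm_cache *c)
{
	int written = 0;
	size_t i;

	for (i = 0; i < c->nr_members; i++) {
		if (c->members[i]->dirty) {
			c->members[i]->dirty = 0;
			written++;
		}
	}
	return written;
}

static const struct vm_cache_operations sketch_ops = {
	.writepages = sketch_writepages,
};

/* What the VM scanner or balance_dirty_pages() would call. */
static int vm_cache_writeout(struct vm_cache *c)
{
	assert(c->ops != NULL);
	return c->ops->writepages(c);
}
```

With this layout, "does this page belong to this transaction" is a
pointer comparison, and scanning a transaction's pages does not require
walking inodes.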

>
> Thanks,
> Badari

Nikita.