On Tue, 2005-04-19 at 04:22, Nikita Danilov wrote:
> Badari Pulavarty <[EMAIL PROTECTED]> writes:
>
> [...]
>
> >
> > Yes. Its possible to do what you want to. I am currently working on
> > adding "delayed allocation" support to ext3. As part of that, We
>
> As you most likely already know, Alex Thomas already implemented delayed
> block allocation for ext3.

Yep. I reviewed Alex Thomas's patches for delayed allocation. He handled
all the cases in his own code and did NOT use any mpage* routines to do
the work. I was hoping to change the mpage infrastructure to handle
these, so that every filesystem doesn't have to do its own thing.

>
> >
> > In order to do the correct accounting, we need to mark a page
> > to indicate if we reserved a block or not. One way to do this,
> > to use page->private to indicate this. But then, all the generic
>
> I believe one can use PG_mappedtodisk bit in page->flags for this
> purpose. There was old Andrew Morton's patch that introduced new bit
> (PG_delalloc?) for this purpose.

That would be good. But I don't feel like asking for a bit in the page
flags if there is a way to get around it.

>
> > routines will fail - since they assume that page->private represents
> > bufferheads. So we need a better way to do this.
>
> They are not generic then. Some file systems store things completely
> different from buffer head ring in page->private.

Yep. Instead of changing the whole world, I was hoping to come up with
a few common interfaces (which don't assume anything about bufferheads
etc.) that are useful for more than one filesystem.

> >
> > 3) We need add hooks into filesystem specific calls from these
> > generic routines to handle "journaling mode" requirements
> > (for ext3 and may be others).
>
> Please don't. There is no such thing as "generic
> journalling". Traditional WAL used by ext3, phase-trees of Tux2, and
> wandering logs of reiser4 are so much different that there is no hope
> for a single API to accommodate them all. Adding such API will only
> force more workarounds and hacks in non-ext3 file systems.
>
> What _is_ common to all journalling file systems on the other hand, is
> the notion of transaction as the natural unit of caching and
> write-out. Currently in Linux, write-out is inode-based
> (->writepages()).
> Reiser4 already has a patch that replaces
> sync_sb_inodes() function with super-block operation. In reiser4 case,
> this operation scans the list of transactions (instead of the list of
> inodes) and writes some of them out, which is natural thing to do for a
> journalled file system.
>
> Similarly, transaction is a unit of caching: it's often necessary to
> scan all pages of a given transaction, all dirty pages of a given
> transaction, or to check whether given page belongs to a given
> transaction. That is, transaction plays role similar to struct
> address_space. But currently there is 1-to-1 relation between inodes and
> address_spaces, and this forces file system to implement additional data
> structures to duplicate functionality already present in address_space.
>
> >
> > So, what are your requirements ? I am looking for a common
> > way to combine all the requirements and come out with a
> > saner "generic" routines to handle these.
> >
>
> I think that one reasonable way to add generic support for journalling
> is to split struct address_space into two objects: lower layer that
> represents "file" (say, struct vm_file), in which pages are linearly
> ordered, and on top of this vm_cache (representing transaction) that
> keeps track of pages from various vm_file's. vm_file is embedded into
> inode, and vm_cache has a pointer to (the analog of) struct
> address_space_operations.
>
> vm_cache's are created by file system back-end as necessary (can be
> embedded into inode for non-journalled file systems). VM scanner and
> balance_dirty_pages() call vm_cache operations to do write-out.

Need to think some more. I guess you have thought about this more than
I have :)

Thanks,
Badari
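
For reference, the vm_file / vm_cache split proposed in the quoted text
might look roughly like the userspace sketch below. Only the names
vm_file and vm_cache come from the thread. Everything else (page_stub,
vm_cache_ops, the helper functions and all fields) is a hypothetical
illustration, not kernel code and not anyone's actual patch. It just
shows one transaction-like object tracking pages from several files and
doing write-out through its own operations table.

```c
#include <assert.h>
#include <stddef.h>

struct vm_cache;

/* Lower layer: one per file (would be embedded in the inode);
 * pages are linearly ordered by index. Hypothetical fields. */
struct vm_file {
    unsigned long nr_pages;
};

/* Stand-in for struct page, reduced to what the sketch needs. */
struct page_stub {
    struct vm_file *file;     /* file the page belongs to */
    unsigned long index;      /* offset of the page within that file */
    struct vm_cache *cache;   /* owning transaction, or NULL */
    int dirty;
    struct page_stub *next;   /* link in the cache's page list */
};

/* The analog of address_space_operations, hung off the cache. */
struct vm_cache_ops {
    int (*writeout)(struct vm_cache *cache);
};

/* Upper layer: a transaction tracking pages from various vm_files. */
struct vm_cache {
    const struct vm_cache_ops *ops;
    struct page_stub *pages;
    unsigned long nr_dirty;
};

/* Attach a page to a transaction, optionally marking it dirty. */
static void vm_cache_add(struct vm_cache *cache, struct page_stub *page,
                         int dirty)
{
    assert(cache != NULL && page != NULL && page->cache == NULL);
    page->cache = cache;
    page->dirty = dirty;
    page->next = cache->pages;
    cache->pages = page;
    if (dirty)
        cache->nr_dirty++;
}

/* "Does this page belong to this transaction?" in O(1). */
static int vm_cache_contains(const struct vm_cache *cache,
                             const struct page_stub *page)
{
    return page->cache == cache;
}

/* Trivial write-out policy: clean every dirty page and return how
 * many pages were "written". A real back-end would submit I/O here. */
static int flush_all(struct vm_cache *cache)
{
    int written = 0;

    for (struct page_stub *p = cache->pages; p != NULL; p = p->next) {
        if (p->dirty) {
            p->dirty = 0;
            written++;
        }
    }
    cache->nr_dirty = 0;
    return written;
}

static const struct vm_cache_ops flush_all_ops = { .writeout = flush_all };

/* What the VM scanner or balance_dirty_pages() would call. */
static int vm_cache_writeout(struct vm_cache *cache)
{
    return cache->ops->writeout(cache);
}
```

A non-journalled file system would embed one such vm_cache per inode,
which gives back today's 1-to-1 behaviour. A journalled one would
create a vm_cache per transaction and add pages from whichever files
the transaction touches.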