Mingming Cao <[EMAIL PROTECTED]> writes:
> On Tue, 2005-04-19 at 19:55 +0400, Nikita Danilov wrote:
>> Badari Pulavarty <[EMAIL PROTECTED]> writes:
>>
>> > On Tue, 2005-04-19 at 04:22, Nikita Danilov wrote:
>> >> Badari Pulavarty <[EMAIL PROTECTED]> writes:
>> >>
>> >> [...]
>> >>
>> >> >
>> >> > Yes. Its possible to do what you want to. I am currently working on
>> >> > adding "delayed allocation" support to ext3. As part of that, We
>> >>
>> >> As you most likely already know, Alex Thomas already implemented delayed
>> >> block allocation for ext3.
>> >
>> > Yep. I reviewed Alex Thomas patches for delayed allocation. He handled
>> > all the cases in his code and did NOT use any mpage* routines to do
>> > the work. I was hoping to change the mpage infrastructure to handle
>> > these, so that every filesystem doesn't have to do their thing.
>> >
>>
>> Just keep in mind that filesystem != ext3. :-) Generic support makes
>> sense only when it is usable by multiple file systems. This is not
>> always possible, e.g., there is no "generic block allocator" for
>> precisely the same reason: disk space allocation policies are tightly
>> intertwined with the rest of file system internals.
>>
>
> This generic support should be useful for ext2 and xfs. From delayed

But it won't work for reiser4, which allocates blocks _across_ multiple
files. E.g., if many files were created in the same directory,
allocation (performed just before write-out) will assign block numbers
so that the files are laid out on disk in readdir order (with each file
body being an interval in that ordering). This is done by arranging all
dirty blocks of a given transaction according to some "ideal" ordering
and then trying to map this ordering onto disk blocks.

As you see, in this case allocation is not done on an inode-by-inode
basis at all: instead, delayed allocation is done at the transaction
level of granularity, and I am trying to point out that this is the
natural thing for a journalled file system to do.

The same goes for write-out: in ext3 there is only one "active"
transaction at any moment, and this means that ->writepages() calls can
go in arbitrary order. But for a file system type with multiple active
transactions that can be committed separately, the order of
->writepages() calls has to follow the ordering between transactions.
Again, this means that write-out should be transaction-based rather
than inode-based.

If we want really generic support for journalling and delayed
allocation, the mpage_* functions are the wrong level. Instead, a proper
notion of transaction has to be introduced, and the file system IO and
disk space allocation interfaces adjusted appropriately.

Nikita.
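
To make the transaction-level allocation described above concrete, here
is a minimal sketch in Python. It is not reiser4 code. The names
DirtyBlock, readdir_pos and allocate_transaction are hypothetical, and
the sketch assumes a single contiguous run of free blocks starting at
first_free, whereas a real allocator can only try to approximate the
ideal ordering on disk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirtyBlock:
    # All fields are illustrative assumptions, not real kernel structures.
    readdir_pos: int  # position of the owning file in readdir order
    inode: int        # number of the owning file
    offset: int       # logical block number within that file


def allocate_transaction(dirty, first_free):
    """Assign disk blocks to every dirty block of one transaction at once.

    The blocks are first arranged in the "ideal" order (readdir order of
    the files, then offset within each file) and that order is mapped
    onto consecutive disk blocks, so each file body ends up as one
    interval on disk.
    """
    ideal = sorted(dirty, key=lambda b: (b.readdir_pos, b.offset))
    return {(b.inode, b.offset): first_free + i for i, b in enumerate(ideal)}


# Blocks of three files in one directory, dirtied in arbitrary order.
txn = [
    DirtyBlock(readdir_pos=2, inode=103, offset=0),
    DirtyBlock(readdir_pos=0, inode=101, offset=1),
    DirtyBlock(readdir_pos=1, inode=102, offset=0),
    DirtyBlock(readdir_pos=0, inode=101, offset=0),
    DirtyBlock(readdir_pos=2, inode=103, offset=1),
]

layout = allocate_transaction(txn, first_free=1000)
```

Note that the function takes the whole set of dirty blocks of a
transaction as its input. An interface that is called once per inode
never sees enough to compute this ordering.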
