Mingming Cao <[EMAIL PROTECTED]> writes:

> On Tue, 2005-04-19 at 19:55 +0400, Nikita Danilov wrote:
>> Badari Pulavarty <[EMAIL PROTECTED]> writes:
>> 
>> > On Tue, 2005-04-19 at 04:22, Nikita Danilov wrote:
>> >> Badari Pulavarty <[EMAIL PROTECTED]> writes:
>> >> 
>> >> [...]
>> >> 
>> >> >
>> >> > Yes. It's possible to do what you want to. I am currently working on
>> >> > adding "delayed allocation" support to ext3. As part of that, We
>> >> 
>> >> As you most likely already know, Alex Thomas already implemented delayed
>> >> block allocation for ext3.
>> >
>> > Yep. I reviewed Alex Thomas patches for delayed allocation. He handled
>> > all the cases in his code and did NOT use any mpage* routines to do
>> > the work. I was hoping to change the mpage infrastructure to handle
>> > these, so that every filesystem doesn't have to do their thing.
>> >
>> 
>> Just keep in mind that filesystem != ext3. :-) Generic support makes
>> sense only when it is usable by multiple file systems. This is not
>> always possible, e.g., there is no "generic block allocator" for
>> precisely the same reason: disk space allocation policies are tightly
>> intertwined with the rest of file system internals.
>> 
>
> This generic support should be useful for ext2 and xfs. From delayed

But it won't work for reiser4, which allocates blocks _across_ multiple
files. E.g., if many files were created in the same directory,
allocation (performed just before write-out) will assign block numbers
so that files are ordered according to the readdir order on the disk
(with each file body being an interval in that ordering). This is done
by arranging all dirty blocks of a given transaction according to some
"ideal" ordering and then trying to map this ordering onto disk blocks.
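
To make the idea concrete, here is a toy user-space sketch (not reiser4
code). It assumes a dirty block is just a (file name, logical block)
pair and that one contiguous free region starting at first_free is
available; the function name and all parameters are made up for
illustration. A real allocator only _tries_ to map the ideal ordering
onto whatever free space exists.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical model: a dirty block is (file name, logical block index).
DirtyBlock = Tuple[str, int]


def allocate_for_transaction(
    dirty: Iterable[DirtyBlock],
    readdir_order: List[str],
    first_free: int,
) -> Dict[DirtyBlock, int]:
    """Assign disk block numbers to all dirty blocks of one transaction.

    Blocks are sorted into an "ideal" ordering: files in readdir order,
    and blocks of each file by logical offset. The ordering is then
    mapped onto consecutive block numbers starting at first_free
    (assumption: that region is entirely free).
    """
    rank = {name: i for i, name in enumerate(readdir_order)}
    ideal = sorted(dirty, key=lambda blk: (rank[blk[0]], blk[1]))
    return {blk: first_free + i for i, blk in enumerate(ideal)}


if __name__ == "__main__":
    dirty = [("b", 0), ("a", 1), ("a", 0), ("c", 0)]
    for blk, disk in allocate_for_transaction(dirty, ["a", "b", "c"], 100).items():
        print(blk, "->", disk)
```

Note that the input is the whole transaction, not a single inode: each
file body ends up as one interval, and the intervals follow the readdir
order.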

As you see, in this case allocation is not done on an inode-by-inode
basis at all: instead, delayed allocation is done at the transaction
level of granularity, and I am trying to point out that this is a
natural thing for a journalled file system to do.

The same goes for write-out: in ext3 there is only one "active"
transaction at any moment, and this means that ->writepages() calls can
go in arbitrary order, but for a file system type with multiple active
transactions that can be committed separately, the order of
->writepages() calls has to follow the ordering between transactions.
Again, this means that write-out should be transaction-based rather than
inode-based.
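
The ordering constraint can be sketched as a topological sort over
transactions. This is again a toy model, not kernel code: the
depends_on mapping and the function name are assumptions for
illustration, and every transaction is assumed to appear as a key.

```python
from collections import deque
from typing import Dict, List, Set


def writeout_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which transactions may be written out.

    depends_on[t] is the set of transactions that must reach disk
    before t (hypothetical representation). Raises ValueError if the
    dependencies are cyclic.
    """
    remaining = {t: set(deps) for t, deps in depends_on.items()}
    # Transactions with no predecessors can be written out first.
    ready = deque(sorted(t for t, deps in remaining.items() if not deps))
    order: List[str] = []
    while ready:
        t = ready.popleft()
        order.append(t)
        # Writing t out may unblock transactions that were waiting on it.
        for u in sorted(remaining):
            if t in remaining[u]:
                remaining[u].discard(t)
                if not remaining[u]:
                    ready.append(u)
    if len(order) != len(remaining):
        raise ValueError("cyclic dependency between transactions")
    return order


if __name__ == "__main__":
    deps = {"T1": set(), "T2": {"T1"}, "T3": {"T1"}, "T4": {"T2", "T3"}}
    print(writeout_order(deps))
```

With a single active transaction the mapping has one key and any page
order works, which is the ext3 case; with several, the pages of each
transaction would be submitted in the order returned here.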

If we want really generic support for journalling and
delayed allocation, mpage_* functions are the wrong level. Instead, a
proper notion of transaction has to be introduced, and the file system
IO and disk space allocation interfaces adjusted appropriately.

Nikita.