Re: Questions about the buffer+page cache in 2.4.0

Chris Mason Mon, 31 Jul 2000 10:41:13 -0700


On 7/30/00, 7:14:16 AM, Daniel Phillips <[EMAIL PROTECTED]> wrote regarding
Re: Questions about the buffer+page cache in 2.4.0:


> Daniel Phillips wrote:
> > There are two obvious ways to do filesystem-specific special handling of the
> > tail block: (1) in the 'read actor' that does the actual copy-to-user (see? yet
> > another problem solved by an extra level of indirection!) or (2) in the
> > inode->i_mapping->a_ops->readpage function that retrieves the page: essentially
> > we would do a copy-on-write to produce a new page that has the tail fragment in
> > the correct place.
> >
> > I have a very distinct preference for (1), and I'll proceed on the assumption
> > that that's what I'll be doing unless someone shows me why it's bad.

> After digging a little deeper I can see that using the read actor won't work
> because the read actor doesn't take the inode, or anything that can be
> dereferenced to find the inode, as a parameter.  So it's not possible to do the
> tail offset check and adjustment there.

> That's ok - it's the wrong place to do it anyway because the check then has to
> be performed each time around the loop.  A much better way is to replace
> generic_file_read in the Ext2 file_operations struct by a new ext2_file_read:

> proposed_ext2_file_read:
>   - generic_file_read stopping before any tail with nonzero offset
>   - If necessary, generic_file_read of the tail with source offset
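
The two-step read proposed above can be sketched in user space. This is only an illustration of the arithmetic, not kernel code: struct tail_info, struct read_plan, and plan_read are hypothetical names, and the sketch assumes a tail_offset of 0 means the tail is not merged (or sits at the start of its block).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a file whose last partial block (the tail)
 * may live at a nonzero offset inside a shared block. */
struct tail_info {
    size_t file_size;    /* total bytes in the file */
    size_t block_size;   /* filesystem block size */
    size_t tail_offset;  /* start of the tail inside its block; 0 = not merged */
};

struct read_plan {
    size_t body_len;     /* bytes served by the ordinary read path */
    size_t tail_len;     /* bytes served by the offset-adjusted tail read */
    size_t tail_src;     /* source offset inside the shared block */
};

/* Split a read of `count` bytes at file position `pos` into the two steps. */
static struct read_plan plan_read(const struct tail_info *ti,
                                  size_t pos, size_t count)
{
    struct read_plan p = {0, 0, 0};
    size_t tail_start, end;

    assert(ti->block_size > 0);
    /* File position where the last partial block begins. */
    tail_start = ti->file_size - ti->file_size % ti->block_size;

    if (pos >= ti->file_size)
        return p;                       /* read at or past EOF */
    if (count > ti->file_size - pos)
        count = ti->file_size - pos;    /* clamp to EOF */
    end = pos + count;

    /* The single extra check: unmerged tails take the ordinary path. */
    if (ti->tail_offset == 0 || end <= tail_start) {
        p.body_len = count;
        return p;
    }
    if (pos < tail_start)
        p.body_len = tail_start - pos;  /* step 1: stop before the tail */
    p.tail_len = count - p.body_len;    /* step 2: read the tail itself */
    p.tail_src = ti->tail_offset + (pos + p.body_len - tail_start);
    return p;
}
```

With a 5000-byte file, 4096-byte blocks, and the tail stored at offset 512, a read of 1000 bytes at position 4000 splits into 96 bytes from the ordinary path and 904 bytes read from the shared block starting at offset 512.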

For reading the tail, take a look at how these functions interact:

get_block
generic_file_read
block_read_full_page (ext2's readpage func)


Putting the tail knowledge into ext2_file_read won't be enough: it won't 
cover mmaps.  You have to make sure your readpage/writepage functions 
keep the page and buffer caches in sync.  Reiserfs does most of this from 
get_block...
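
To illustrate why the readpage level covers both read() and mmap, here is a rough user-space sketch of a readpage-style routine for the last page of a file. The names toy_inode and toy_readpage_tail are made up for the example, and it assumes the page size equals the block size; it is not how ext2 or reiserfs actually implement this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096  /* assumption: page size equals block size */

/* Hypothetical inode: the tail lives in a shared block at tail_offset. */
struct toy_inode {
    const unsigned char *shared_block; /* block holding several files' tails */
    size_t tail_offset;                /* start of this file's tail in it */
    size_t tail_len;                   /* bytes of tail data */
};

/* Stand-in for a readpage routine on the file's last page: copy the tail to
 * the front of the page and zero the rest, so the cached page never exposes
 * the tails of neighbouring files. Anything that gets its data from the
 * page (a read() copy or an mmap fault) then sees the same contents. */
static void toy_readpage_tail(const struct toy_inode *inode,
                              unsigned char *page)
{
    assert(inode->tail_offset + inode->tail_len <= TOY_PAGE_SIZE);
    memcpy(page, inode->shared_block + inode->tail_offset, inode->tail_len);
    memset(page + inode->tail_len, 0, TOY_PAGE_SIZE - inode->tail_len);
}
```

The sketch only covers the read side; keeping the page and the shared buffer in sync on writes is the harder part.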

> This imposes just one extra check when the tail isn't merged or happens to be at
> the beginning of the tail block, so read overhead for tailmerging is negligible
> when the feature isn't used.

> Now I have to address the question of how tail blocks can be shared between
> files.  This does not seem to me to be an easy question at all.  I'll summarize
> the previous discussion here...

You have two real choices.  Unpack the tail before any writes to it, then 
repack later (file close, whatever).  This allows you to use all the 
generic functions for writing data, and keeps the synchronization down 
(only write to shared data on pack/unpack). 
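
A toy version of the unpack-then-repack approach, in user space. All names here (toy_file, toy_unpack, toy_write, toy_repack) are hypothetical, the block size is arbitrary, and reclaiming the old slot in the shared block is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 1024

struct toy_file {
    unsigned char *shared;   /* shared tail block, or NULL while unpacked */
    size_t tail_offset;      /* offset of the tail inside the shared block */
    size_t tail_len;
    unsigned char private_block[TOY_BLOCK_SIZE]; /* used while unpacked */
};

/* Copy the tail out of the shared block into a private block. */
static void toy_unpack(struct toy_file *f)
{
    if (!f->shared)
        return;                          /* already unpacked */
    memset(f->private_block, 0, TOY_BLOCK_SIZE);
    memcpy(f->private_block, f->shared + f->tail_offset, f->tail_len);
    f->shared = NULL;
    f->tail_offset = 0;
}

/* Writes never touch shared data: unpack first, then write privately. */
static void toy_write(struct toy_file *f, size_t pos,
                      const void *buf, size_t len)
{
    assert(pos + len <= TOY_BLOCK_SIZE);
    toy_unpack(f);
    memcpy(f->private_block + pos, buf, len);
    if (pos + len > f->tail_len)
        f->tail_len = pos + len;
}

/* Later (file close, whatever): move the tail back into a shared block.
 * Returns 0 on success, -1 if the tail does not fit at free_offset. */
static int toy_repack(struct toy_file *f, unsigned char *shared,
                      size_t free_offset)
{
    if (f->shared)
        return 0;                        /* still packed, nothing to do */
    if (free_offset + f->tail_len > TOY_BLOCK_SIZE)
        return -1;
    memcpy(shared + free_offset, f->private_block, f->tail_len);
    f->shared = shared;
    f->tail_offset = free_offset;
    return 0;
}
```

The shared block is only written in toy_repack, which is where the synchronization would be needed.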

Or, change your prepare_write, commit_write, writepage, and get_block 
routines to write directly to the shared block.  This is somewhat more 
difficult, and I suspect slower since you'll have to touch all the inodes 
in the ring as you shift data around for each write.
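
To make the cost concrete, here is a rough sketch of growing one tail in place. It assumes the tails are packed back to back from the start of the block and that the sharing inodes form a circular list; ring_inode and ring_grow_tail are invented names, not anything from ext2 or reiserfs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 1024

/* Hypothetical ring entry: one per inode sharing the block. */
struct ring_inode {
    size_t tail_offset;
    size_t tail_len;
    struct ring_inode *next;   /* circular list of sharers */
};

/* Grow `target`'s tail by `extra` bytes inside the shared block.
 * `used` is the number of bytes currently occupied in the block.
 * Every tail stored after the target is shifted up, and the whole ring is
 * walked to fix the offsets. Returns the number of other inodes whose
 * offset changed, or -1 if the block has no room. */
static int ring_grow_tail(unsigned char *block, size_t used,
                          struct ring_inode *target, size_t extra)
{
    size_t old_end = target->tail_offset + target->tail_len;
    struct ring_inode *it;
    int touched = 0;

    assert(old_end <= used);
    if (used + extra > TOY_BLOCK_SIZE)
        return -1;

    /* Shift everything stored after the target's tail. */
    memmove(block + old_end + extra, block + old_end, used - old_end);

    /* Walk the ring: each later tail now starts `extra` bytes further on. */
    for (it = target->next; it != target; it = it->next) {
        if (it->tail_offset >= old_end) {
            it->tail_offset += extra;
            touched++;
        }
    }
    target->tail_len += extra;
    return touched;
}
```

Even in this simplified form, a small append to one file means a memmove plus an update to every inode stored after it, which is the per-write overhead described above.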

-chris