Chris Mason wrote:
> 
> On 7/30/00, 7:14:16 AM, Daniel Phillips
> <[EMAIL PROTECTED]> wrote regarding Re: Questions about
> the buffer+page cache in 2.4.0:
> 
> > After digging a little deeper I can see that using the read actor
> > won't work because the read actor doesn't take the inode, or 
> > anything that can be dereferenced to find the inode, as a parameter.  
> > So it's not possible to do the tail offset check and adjustment 
> > there.
> >
> > That's ok - it's the wrong place to do it anyway because the check 
> > then has to be performed each time around the loop.  A much better 
> > way is to replace generic_file_read in the Ext2 file_operations 
> > struct by a new ext2_file_read:
> 
> > proposed_ext2_file_read:
> >   - generic_file_read stopping before any tail with nonzero offset
> >   - If necessary, generic_file_read of the tail with source offset
> 
> For reading the tail, take a look at how these functions interact:
> 
> get_block
> generic_file_read
> block_read_full_page (ext2's readpage func)
> 
> Putting the tail knowledge into ext2_file_read won't be enough, it 
> won't cover mmaps.  You have to make sure your readpage/writepage 
> functions keep the page and buffer caches in sync.  Reiserfs does 
> most of this from get_block...

Yes, exactly.  Reaching that conclusion requires a detailed understanding of
the way the file mapping mechanism works, which I puzzled out yesterday
*after* I wrote the previous post.  Tomorrow I'll try to post the relevant
details and show how they lead to the design you just suggested.  Clearly
Stephen and Alex both had the same thing in mind, and were merely arguing about
the subtle details of the implementation.

> > Now I have to address the question of how tail blocks can 
> > be shared between files...
> 
> You have two real choices.  Unpack the tail before any writes to it, then
> repack later (file close, whatever).  This allows you to use all the
> generic functions for writing data, and keeps the synchronization down
> (only write to shared data on pack/unpack).

Yes, that was the preferred design right from the beginning even before I
started thinking of it in terms of page cache operations vs buffer cache.
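
A rough user-space sketch of the unpack-before-write, repack-later idea may
make the synchronization argument concrete.  Everything here is hypothetical:
struct file_tail, tail_unpack, tail_write and tail_repack are illustrative
names, not ext2 code, and the 1K block size is an assumption.  The point it
tries to show is that the shared block is only touched in unpack and repack,
never in the write path itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 1024          /* assumed 1K blocks */

/* Hypothetical in-memory model; none of these names exist in ext2. */
struct file_tail {
    unsigned char *shared;       /* block holding the merged tail */
    size_t offset;               /* start of this tail inside the shared block */
    size_t len;                  /* tail length in bytes */
    unsigned char *priv;         /* private unpacked copy, NULL while merged */
};

/* Unpack: copy the tail out of the shared block into a private block. */
int tail_unpack(struct file_tail *t)
{
    if (t->priv)
        return 0;                /* already unpacked */
    assert(t->shared && t->offset + t->len <= BLOCK_SIZE);
    t->priv = calloc(1, BLOCK_SIZE);
    if (!t->priv)
        return -1;
    memcpy(t->priv, t->shared + t->offset, t->len);
    return 0;
}

/* Write: always lands in the private block, never in the shared one. */
int tail_write(struct file_tail *t, size_t pos, const void *buf, size_t n)
{
    if (pos > BLOCK_SIZE || n > BLOCK_SIZE - pos)
        return -1;               /* would run past the block */
    if (tail_unpack(t) < 0)
        return -1;
    memcpy(t->priv + pos, buf, n);
    if (pos + n > t->len)
        t->len = pos + n;
    return 0;
}

/* Repack (e.g. at close): copy the tail into a shared block at an offset
 * chosen by the caller, then drop the private copy. */
int tail_repack(struct file_tail *t, unsigned char *shared, size_t offset)
{
    if (!t->priv)
        return 0;                /* never unpacked, nothing to do */
    if (offset > BLOCK_SIZE || t->len > BLOCK_SIZE - offset)
        return -1;               /* does not fit, stay unpacked */
    memcpy(shared + offset, t->priv, t->len);
    free(t->priv);
    t->priv = NULL;
    t->shared = shared;
    t->offset = offset;
    return 0;
}
```

In this model a write to a merged tail leaves the shared block byte-for-byte
unchanged until tail_repack runs, which is where any locking against the
other files sharing the block would have to go.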

> Or, change your prepare, commit, writepage, and get_block routines to
> write directly to the shared block.  This is somewhat more difficult, and
> I suspect slower since you'll have to touch all the inodes in the ring as
> you shift data around for each write.

Ick - I think we agree on which is best.  Incidentally, since the ring is now
defined to be double-linked, you will only touch two inodes besides the one
you're writing to.  Furthermore, if those two inodes happen to be on the same
inode page (8 inodes/page in a 1K/block fs; 32 with 4K) then you don't even have
to hit the disk an extra time.  The tailmerging algorithm could easily be
optimized to favor this arrangement if desired.
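
To illustrate the two claims above, here is a small sketch in plain C.  The
names (struct ring_inode, ring_remove, same_inode_block) are made up for
illustration, and the dirty flag just stands in for "this inode has to be
written back".  The same-block test assumes 128-byte on-disk inodes (which is
what 8 per 1K block implies), inode numbers starting at 1, and an
inodes-per-group count that is a multiple of the inodes per block.

```c
#include <assert.h>
#include <stddef.h>

#define INODE_SIZE 128           /* assumed on-disk inode size in bytes */

/* Hypothetical in-memory stand-in for an inode on a tail-sharing ring. */
struct ring_inode {
    unsigned long ino;
    struct ring_inode *prev, *next;
    int dirty;                   /* set when the inode must be written back */
};

/* Unlink n from a double-linked ring.  Besides n itself, only its two
 * neighbours are modified, however long the ring is. */
void ring_remove(struct ring_inode *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev->dirty = 1;
    n->next->dirty = 1;
    n->prev = n->next = n;       /* n becomes a ring of one */
    n->dirty = 1;
}

/* Inodes that fit in one inode-table block. */
unsigned long inodes_per_block(unsigned long blocksize)
{
    return blocksize / INODE_SIZE;
}

/* Nonzero when inodes a and b live in the same inode-table block, so that
 * updating both costs no extra disk access. */
int same_inode_block(unsigned long a, unsigned long b, unsigned long blocksize)
{
    unsigned long per = inodes_per_block(blocksize);

    return (a - 1) / per == (b - 1) / per;
}
```

A merge policy that wanted to favor the cheap case could call something like
same_inode_block on candidate neighbours before choosing a ring to join.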

I see we're now at the point that Alex warned me about - the change-of-state
when a merged tail on the page has to be unmerged due to a write further up the
file.
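
One way to pin down when that change of state is triggered is a predicate on
the write range.  This is only a sketch under two assumptions: a "tail" is
the partial last block of the file, and a merged tail must be unmerged both
when the write touches the tail block and when the write extends the file
beyond it (at which point the old tail is no longer the last block).  The
function names are hypothetical.

```c
#include <assert.h>

/* Nonzero when a file of this size ends in a partial block. */
int has_tail(unsigned long long size, unsigned long blocksize)
{
    return size % blocksize != 0;
}

/* Nonzero when writing len bytes at pos to a file of the given size forces
 * a merged tail to be unmerged first.  Assumes the tail is currently merged
 * whenever has_tail() is true. */
int write_forces_unmerge(unsigned long long size, unsigned long long pos,
                         unsigned long long len, unsigned long blocksize)
{
    unsigned long long tail_start, end;

    if (len == 0 || !has_tail(size, blocksize))
        return 0;

    tail_start = size - size % blocksize;   /* first byte of the tail block */
    end = pos + len;                        /* one past the last byte written */

    /* Writes that end at or before the tail block leave it alone.  Anything
     * else either overlaps the tail or grows the file past it. */
    return end > tail_start;
}
```

For a 1500-byte file on 1K blocks the tail block starts at byte 1024, so a
write to bytes 0..99 leaves the merged tail alone, while a write at offset
1100 or at offset 4096 both force the unmerge.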

-- 
Daniel
