Erez Zadok <[EMAIL PROTECTED]> writes:

> In message <[EMAIL PROTECTED]>, Ion Badulescu writes:
> 
> [...]
> > The current implementation will also populate the page cache with pages
> > that are not Uptodate, but are not Locked either, which is clearly a bug.
> > It will always happen if there is a partial write to a page, e.g. if a
> > program creates a file and then writes 1.5k worth of data, on a 1k-block
> > filesystem.
> > 
> > It should be fixed either by getting all the buffers within the page
> > Uptodate, or by throwing away the page at the end of the write operation.
> > 
> > 
> > Ion
> 
> Right.  This messed up our stacking code a bit.  The VFS essentially does
> this in generic_file_read:
> 
>       read_cache_page()
>       wait_on_page()
>       if (!Page_uptodate()) {
>               report an error
>       }

Nope.   generic_file_read does essentially:

read_cache_page();
wait_on_page();
if (!Page_Uptodate()) {
        mapping->a_ops->readpage()
        wait_on_page();
        if (!Page_Uptodate()) {
                report an error;
        }
}

This is important as it greatly speeds of the case of writing
partial pages. It eliminates the read/modify write case, if possible.

For block based filesystems if we aren't writing on block boundaries
we probably shouldn't mess with it and do the read/modify/write, if
we aren't writing on a block boundary.

For network filesystems with no block alignment constraints, and much
greater latency the win is real.

> 
> Since our stacking behaves like a VFS, we have to reproduce the above code
> in our readpage().  For some file systems, such as cryptfs, there are two
> pages in memory for each normal page: one ciphertext and one cleartext.  But
> now we have a problem in the following scenario:
> 
> (1) you copy a file through the lower level file system (ext2) which has a
>     1k block size for a 4k page size (intel)
> 
> (2) the file you copy isn't an exact multiple of PAGE_CACHE_SIZE
> 
> (3) the caching at the ext2 level will put the last page in the cache, with
>     only some of the buffers being BH_Uptodate.  This is
>     fs/buffer.c:__block_commit_write() which ext2 uses.  The code in that
>     function will not set the page uptodate flag on partial pages that only
>     have a few buffers uptodate.  So now we have a page in the cache that is
>     not up-to-date.  Now see what happens when our stacking layer executes
>     the code similar to generic_file_read() in our own readpage():

O.k.  This case looks like a performance bug. 
Worth fixing but it should not prevent correct operations.

> Summary: a page should not be in the cache and not be up-to-date.  If it is
> not up-to-date, then it should also probably be locked, but only b/c it is
> probably in transit from the disk to the cache.

Nope.

> We have tested the patch included here, which tries to ensure that no pages
> are in the cache and not up-to-date; it fixes __block_prepare_write().  Can
> someone who knows the buffer.c code well comment on this?  I can't see a way
> out of this situation without one of the following:

> 
> - partial pages are put in the cache if they are the last page of a file,
>   but they are then marked up-to-date, and the rest of the code changed to
>   handle this special situation

This should be done, but just because it is a performance optimization.
This case has been done for years, and works fine.

Eric

Reply via email to