In message <[EMAIL PROTECTED]>, Ion Badulescu writes:
[...]
> The current implementation will also populate the page cache with pages
> that are not Uptodate, but are not Locked either, which is clearly a bug.
> It will always happen if there is a partial write to a page, e.g. if a
> program creates a file and then writes 1.5k worth of data, on a 1k-block
> filesystem.
> 
> It should be fixed either by getting all the buffers within the page
> Uptodate, or by throwing away the page at the end of the write operation.
> 
> 
> Ion

Right.  This messed up our stacking code a bit.  The VFS essentially does
this in generic_file_read:

        read_cache_page()
        wait_on_page()
        if (!Page_uptodate()) {
                report an error
        }

Since our stacking behaves like a VFS, we have to reproduce the above code
in our readpage().  For some file systems, such as cryptfs, there are two
pages in memory for each normal page: one ciphertext and one cleartext.  But
now we have a problem in the following scenario:

(1) you copy a file through the lower level file system (ext2) which has a
    1k block size for a 4k page size (intel)

(2) the file you copy isn't an exact multiple of PAGE_CACHE_SIZE

(3) the caching at the ext2 level will put the last page in the cache, with
    only some of the buffers being BH_Uptodate.  This is
    fs/buffer.c:__block_commit_write() which ext2 uses.  The code in that
    function will not set the page uptodate flag on partial pages that only
    have a few buffers uptodate.  So now we have a page in the cache that is
    not up-to-date.  Now see what happens when our stacking layer executes
    the code similar to generic_file_read() in our own readpage():

        read_cache_page(lower_page)     -> we find it
        wait_on_page(lower_page)        -> page not locked, no more wait
        if (!Page_uptodate(lower-page)) {       -> page is NOT uptodate
                report an error         -> we flag an error
        }

There is no way for us to fix the problem in our stacking code b/c we cannot
distinguish b/t a page that is truly not up-to-date, and a partial page such
as the last page of a file just written.

Note also that there is no problem if the file is written *through* our
stacking layer, b/c then we can force the up-to-date flag on the cached
pages.  There is also no a problem if we read a file which was not cached at
the ext2 level, and we read it through our stacked layer; in that case, the
page comes back up-to-date and we're ok.  It's only when
__block_commit_write() runs that we have a problem.

Summary: a page should not be in the cache and not be up-to-date.  If it is
not up-to-date, then it should also probably be locked, but only b/c it is
probably in transit from the disk to the cache.

We have tested the patch included here, which tries to ensure that no pages
are in the cache and not up-to-date; it fixes __block_prepare_write().  Can
someone who knows the buffer.c code well comment on this?  I can't see a way
out of this situation without one of the following:

- no partial pages are left in the cache (our patch)

- partial pages are put in the cache if they are the last page of a file,
  but they are then marked up-to-date, and the rest of the code changed to
  handle this special situation

- a new flag added to pages that indicates that the page is partial (last
  page of a file) *and* up-to-date.  That way, everyone can write code that
  depends on handles this situation.

Thanks,
Erez.

diff -ruN linux-2.3.99-pre3-vanilla/fs/buffer.c linux-2.3.99-pre3-fist/fs/buffer.c
--- linux-2.3.99-pre3-vanilla/fs/buffer.c       Tue Mar 21 14:30:08 2000
+++ linux-2.3.99-pre3-fist/fs/buffer.c  Mon Apr  3 08:59:11 2000
@@ -1448,10 +1448,6 @@
                if (!bh)
                        BUG();
                block_end = block_start+blocksize;
-               if (block_end <= from)
-                       continue;
-               if (block_start >= to)
-                       break;
                bh->b_end_io = end_buffer_io_sync;
                if (!buffer_mapped(bh)) {
                        err = get_block(inode, block, bh, 1);
@@ -1459,10 +1455,15 @@
                                goto out;
                        if (buffer_new(bh)) {
                                unmap_underlying_metadata(bh);
-                               if (block_end > to)
-                                       memset(kaddr+to, 0, block_end-to);
-                               if (block_start < from)
-                                       memset(kaddr+block_start, 0, from-block_start);
+                               if (block_end <= from || block_start >= to)
+                                       memset(kaddr+block_start, 0, block_end);
+                               else {
+                                       if (block_end > to)
+                                               memset(kaddr+to, 0, block_end-to);
+                                       if (block_start < from)
+                                               memset(kaddr+block_start, 0, 
+from-block_start);
+                               }
+                               mark_buffer_uptodate(bh, 1);
                                continue;
                        }
                }

Reply via email to