In message <[EMAIL PROTECTED]>, Ion Badulescu writes:
[...]
> The current implementation will also populate the page cache with pages
> that are not Uptodate, but are not Locked either, which is clearly a bug.
> It will always happen if there is a partial write to a page, e.g. if a
> program creates a file and then writes 1.5k worth of data, on a 1k-block
> filesystem.
>
> It should be fixed either by getting all the buffers within the page
> Uptodate, or by throwing away the page at the end of the write operation.
>
>
> Ion
Right. This messed up our stacking code a bit. The VFS essentially does
this in generic_file_read:
read_cache_page()
wait_on_page()
if (!Page_uptodate()) {
report an error
}
Since our stacking behaves like a VFS, we have to reproduce the above code
in our readpage(). For some file systems, such as cryptfs, there are two
pages in memory for each normal page: one ciphertext and one cleartext. But
now we have a problem in the following scenario:
(1) you copy a file through the lower level file system (ext2) which has a
1k block size for a 4k page size (intel)
(2) the file you copy isn't an exact multiple of PAGE_CACHE_SIZE
(3) the caching at the ext2 level will put the last page in the cache, with
only some of the buffers being BH_Uptodate. This is
fs/buffer.c:__block_commit_write() which ext2 uses. The code in that
function will not set the page uptodate flag on partial pages that only
have a few buffers uptodate. So now we have a page in the cache that is
not up-to-date. Now see what happens when our stacking layer executes
the code similar to generic_file_read() in our own readpage():
read_cache_page(lower_page) -> we find it
wait_on_page(lower_page) -> page not locked, no more wait
if (!Page_uptodate(lower-page)) { -> page is NOT uptodate
report an error -> we flag an error
}
There is no way for us to fix the problem in our stacking code b/c we cannot
distinguish b/t a page that is truly not up-to-date, and a partial page such
as the last page of a file just written.
Note also that there is no problem if the file is written *through* our
stacking layer, b/c then we can force the up-to-date flag on the cached
pages. There is also no a problem if we read a file which was not cached at
the ext2 level, and we read it through our stacked layer; in that case, the
page comes back up-to-date and we're ok. It's only when
__block_commit_write() runs that we have a problem.
Summary: a page should not be in the cache and not be up-to-date. If it is
not up-to-date, then it should also probably be locked, but only b/c it is
probably in transit from the disk to the cache.
We have tested the patch included here, which tries to ensure that no pages
are in the cache and not up-to-date; it fixes __block_prepare_write(). Can
someone who knows the buffer.c code well comment on this? I can't see a way
out of this situation without one of the following:
- no partial pages are left in the cache (our patch)
- partial pages are put in the cache if they are the last page of a file,
but they are then marked up-to-date, and the rest of the code changed to
handle this special situation
- a new flag added to pages that indicates that the page is partial (last
page of a file) *and* up-to-date. That way, everyone can write code that
depends on handles this situation.
Thanks,
Erez.
diff -ruN linux-2.3.99-pre3-vanilla/fs/buffer.c linux-2.3.99-pre3-fist/fs/buffer.c
--- linux-2.3.99-pre3-vanilla/fs/buffer.c Tue Mar 21 14:30:08 2000
+++ linux-2.3.99-pre3-fist/fs/buffer.c Mon Apr 3 08:59:11 2000
@@ -1448,10 +1448,6 @@
if (!bh)
BUG();
block_end = block_start+blocksize;
- if (block_end <= from)
- continue;
- if (block_start >= to)
- break;
bh->b_end_io = end_buffer_io_sync;
if (!buffer_mapped(bh)) {
err = get_block(inode, block, bh, 1);
@@ -1459,10 +1455,15 @@
goto out;
if (buffer_new(bh)) {
unmap_underlying_metadata(bh);
- if (block_end > to)
- memset(kaddr+to, 0, block_end-to);
- if (block_start < from)
- memset(kaddr+block_start, 0, from-block_start);
+ if (block_end <= from || block_start >= to)
+ memset(kaddr+block_start, 0, block_end);
+ else {
+ if (block_end > to)
+ memset(kaddr+to, 0, block_end-to);
+ if (block_start < from)
+ memset(kaddr+block_start, 0,
+from-block_start);
+ }
+ mark_buffer_uptodate(bh, 1);
continue;
}
}