Re: Raid resync changes buffer cache semantics --- not good for journaling!

Theodore Y. Ts'o Tue, 2 Nov 1999 16:10:20 -0800
   From: "Stephen C. Tweedie" <[EMAIL PROTECTED]>
   Date:   Tue, 2 Nov 1999 17:44:55 +0000 (GMT)

   Ask Linus, he's pushing this point much more strongly than I am!  The
   buffer cache will become less and less of a cache as time goes on in his
   grand plan: it is to become little more than an IO buffer layer.

Ultimately, I think we may be better off if we remove any hint of caching
from the I/O buffer layer.  The cache coherency issues between the page
and buffer cache make me nervous, and I'm not completely 100% convinced
we got it all right.  (I'm wondering if some of the ext2 corruption
reports in the 2.2 kernels are coming from a buffer cache/page cache
corruption.)

This means putting filesystem meta-data into the page cache.  Yes, I
know Stephen has some concerns about doing this because the big memory
patches mean pages in the page cache might not be directly accessible by
the kernel.  I see two solutions to this, both with drawbacks.  One is
to use a VM remap facility to map directories, superblocks, inode tables
etc. into the kernel address space.  The other is to have flags which
ask the kernel to map filesystem metadata into part of the page cache
that's addressable by the kernel.  The first adds a VM delay to
accessing the filesystem metadata, and the other means we need to manage
the part of the page cache that's below 2GB differently from the page
cache in high memory at least as far as freeing pages in response to
memory pressure is concerned.
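
A toy user-space model of the trade-off between the two options above,
offered only as a sketch. Nothing here is kernel code: PageCache,
LOW_LIMIT, need_kernel_addressable and the remap counter are all
hypothetical names made up for illustration. Frames below LOW_LIMIT
stand in for memory the kernel can address directly, and the rest
stand in for high memory.

```python
# Toy model only; every name below is hypothetical, not a kernel interface.
LOW_LIMIT = 4       # frames 0..3: directly addressable by the "kernel"
TOTAL_FRAMES = 8    # frames 4..7: stand-in for high memory


class PageCache:
    def __init__(self):
        self.free_low = list(range(LOW_LIMIT))
        self.free_high = list(range(LOW_LIMIT, TOTAL_FRAMES))
        self.remaps = 0  # number of simulated VM remap operations

    def alloc(self, need_kernel_addressable=False):
        """Flag-based option: a flagged page must come from low memory."""
        if need_kernel_addressable:
            if not self.free_low:
                # Low memory can run out while high memory is still free,
                # which is why the two regions need separate reclaim policy.
                raise MemoryError("low memory exhausted")
            return self.free_low.pop()
        # Unflagged pages prefer high memory, keeping low memory available.
        pool = self.free_high or self.free_low
        if not pool:
            raise MemoryError("no free frames")
        return pool.pop()

    def access(self, frame):
        """Remap option: any frame is usable, but high frames cost a remap."""
        if frame >= LOW_LIMIT:
            self.remaps += 1  # the per-access VM delay
        return frame
```

In this model, metadata allocated with the flag is always cheap to
access but competes for a small pool, while metadata placed anywhere
pays the remap cost on each access to a high frame.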

   Basically, for the raid code to poke around in higher layers is a huge
   layering violation.  We are heading towards doing things like adding
   kiobuf interfaces to ll_rw_block (in which the IO descriptor that the
   driver receives will have no reference to the buffer cache), and
   raw, unbuffered access to the drivers for raw devices and O_DIRECT.
   Raw IO is already there and bypasses the buffer cache.  So does swap.
   So does journaling.  So does page-in (in 2.2) and page-out (in 2.3).

It'll be interesting to see how this affects using dump(8) on a mounted
filesystem.  This was never particularly guaranteed to give a coherent
filesystem image, but what with increasing bypass of the buffer cache,
it may make the results of using dump(8) on a live filesystem even
worse.
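
The coherency problem can be illustrated with a deliberately simplified
model. This is a sketch under loud assumptions: ToyDisk, ToyPageCache
and raw_dump are hypothetical names, and the device-side view that a
block-level tool gets is collapsed into the disk contents themselves.
The only point is that data held above the block device is invisible
to a block-level reader until it is written back.

```python
# Toy model only; all class and function names are hypothetical.
class ToyDisk:
    def __init__(self, nblocks):
        self.blocks = [b"old"] * nblocks


class ToyPageCache:
    """Holds file data; dirty pages reach the disk only on writeback."""

    def __init__(self, disk):
        self.disk = disk
        self.dirty = {}

    def write(self, blockno, data):
        # The write is visible to file-level readers, not to the device.
        self.dirty[blockno] = data

    def writeback(self):
        for blockno, data in self.dirty.items():
            self.disk.blocks[blockno] = data
        self.dirty.clear()


def raw_dump(disk):
    """What a block-level tool sees: the device contents only."""
    return list(disk.blocks)
```

A dump taken between write() and writeback() returns the old block,
so the image mixes states from different points in time.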

One way of solving this is to add some kernel support for dump(8); for
example, the infamous iopen() call which Linus hates so much.  (Yes, it
violates the Unix permission model, which is why it needs to be
restricted to root, and yes, it won't work on all filesystems; just
those that have inodes.)  The other is to simply tell people to give up
on dump completely, and just use a file-level tool such as tar or bru.
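
For context, a sketch of what user space has to do today to get from
an inode number to a file, assuming (the message does not spell out the
interface) that iopen() would open a file directly by inode number.
The helper name find_paths_by_inode is hypothetical; the os calls are
standard Python. Without kernel support, the only portable route is a
walk over the whole tree.

```python
import os


def find_paths_by_inode(root, inode):
    """Return every path under root whose inode number equals `inode`."""
    root_dev = os.lstat(root).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            # Inode numbers are only unique within one filesystem.
            if st.st_dev == root_dev and st.st_ino == inode:
                matches.append(path)
    return matches
```

The walk costs time proportional to the size of the tree and can miss
files that are renamed while it runs, which is part of the appeal of
an inode-level interface for a tool like dump.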

                                                        - Ted