Hi,

On Tue, 02 Nov 1999 08:43:01 -0700, [EMAIL PROTECTED] said:

>> Fixing this in raid seems far, far preferable to fixing it in the
>> filesystems.  The filesystem should be allowed to use the buffer cache
>> for metadata and should be able to assume that there is a way to prevent
>> those buffers from being written to disk until it is ready.

> What about doing it in the page cache: i.e. reserve pages for journaling
> and let them hit the buffer cache only when the transaction allows it?

> This may be a naive suggestion, but it looks logical.

The issue is one of IO.  Journaling builds a list of disk block updates
which need to be applied in a given order.  IO is block-based, not
page-based.  I can cache a directory in the page cache, but when I start
doing modifications, it's on a per-block basis because journaling has
got to record modified blocks.  I could quite easily end up with two
different blocks in the same page belonging to two different
transactions if the blocksize is less than the pagesize (for example,
1K blocks with 4K pages).
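
To make the arithmetic concrete, here is a rough sketch of the
block-to-page mapping.  The sizes (4K pages, 1K blocks) and the helper
names are assumptions chosen for illustration, not anything from the
kernel or the journaling code:

```c
#include <assert.h>

/* Hypothetical sizes for illustration: 4K pages, 1K filesystem blocks. */
enum { PAGE_SIZE_BYTES = 4096, BLOCK_SIZE_BYTES = 1024 };

/* Index of the page that would hold a given block (illustrative helper). */
static unsigned long page_of_block(unsigned long block)
{
    /* The mapping below only makes sense if blocks tile a page exactly. */
    assert(PAGE_SIZE_BYTES % BLOCK_SIZE_BYTES == 0);
    return block / (PAGE_SIZE_BYTES / BLOCK_SIZE_BYTES);
}

/* Nonzero if two distinct blocks live in the same page. */
static int blocks_share_page(unsigned long a, unsigned long b)
{
    return a != b && page_of_block(a) == page_of_block(b);
}
```

With these sizes, blocks 5 and 6 both fall in page 1, so if one
transaction modifies block 5 and a later one modifies block 6, a single
page carries updates from both.  Tracking ownership per page cannot
express that; tracking it per block can.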

Doing filesystem caching in the page cache is fine, but it does not
really make sense as a data structure in which to build a transaction's
pending-write lists.
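
As a minimal sketch of what a per-block pending-write list could look
like (this is not the actual journaling code; the struct, the fixed
MAX_UPDATES limit, and the function name are all hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UPDATES 16  /* arbitrary limit, for the sketch only */

/* One transaction's pending writes, kept in the order they were logged. */
struct transaction {
    unsigned int tid;                   /* transaction identifier */
    size_t count;                       /* number of blocks logged so far */
    unsigned long blocks[MAX_UPDATES];  /* block numbers, in logging order */
};

/*
 * Record that a block was modified under this transaction.
 * Returns 1 if newly added, 0 if the block was already logged,
 * and -1 if the list is full.
 */
static int txn_add_block(struct transaction *t, unsigned long blocknr)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (t->blocks[i] == blocknr)
            return 0;
    if (t->count == MAX_UPDATES)
        return -1;
    t->blocks[t->count++] = blocknr;
    return 1;
}
```

The unit of bookkeeping here is the block number, which is also the
unit of IO.  Nothing in the list needs to know which page a block
happens to be cached in.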

There's a second issue: I talked with Ingo about the journaling API
early on, and it was designed specifically to support buffer journaling.
I want to be able to allow the raid or LVM driver code to use the same
jfs API to apply transactional updates across multiple devices at the
block device level, for doing things like reconfiguring an array to
merge a new disk or to mark errors.

--Stephen
