Hi,

On Fri, 29 Oct 1999 14:06:24 -0400 (EDT), Ingo Molnar <[EMAIL PROTECTED]>
said:

> On Fri, 29 Oct 1999, Stephen C. Tweedie wrote:

>> Fixing this in raid seems far, far preferable to fixing it in the
>> filesystems.  The filesystem should be allowed to use the buffer cache
>> for metadata and should be able to assume that there is a way to prevent
>> those buffers from being written to disk until it is ready.

> Why don't you lock the buffer during the transaction? That's the right way to
> 'pin' down a buffer and prevent it from being written out. You can keep a
> buffer locked && dirty indefinitely, and it should be easy to unlock them
> when committing a transaction. Am I missing something?

No, that's completely inappropriate: locking the buffer indefinitely
will simply cause jobs like dump() to block forever, for example.
However, you're missing a much more important issue: not all writes go
through the buffer cache.

Currently, swapping bypasses the buffer cache entirely: writes from swap
go via temporary buffer_heads to ll_rw_block.  The buffer_heads are
never part of the buffer cache and are discarded as soon as IO is
complete.  The same mechanism is used when reading into the page cache,
but that's probably safe enough, since writes do use the buffer cache in
2.2.

In 2.3 the situation is much worse, as _all_ ext2 file writes bypass the
buffer cache.  The buffer_heads do persist, but they overlay the page
cache, not the buffer cache --- they do not appear on the buffer cache
hash lists.  You _cannot_ synchronise with these writes at the buffer
cache level.  If your raid resync collides with such a write, it is
entirely possible that the filesystem write will occur between the raid
read and the raid write --- you will corrupt ext2 files.

In my own case right now, ext3 on 2.2 behaves much like ext2 does on 2.3
--- it uses temporary buffer_heads to write directly to the journal, and
so is going to be bitten by the raid resync behaviour.

Basically, device drivers cannot assume that all IOs come from the
buffer cache --- raid has to work at the level of ll_rw_block, not at
the level of the buffer cache.

--Stephen
