Eric Anopolsky wrote:
On Tue, 2008-10-21 at 18:18 -0400, Ric Wheeler wrote:
Eric Anopolsky wrote:
On Tue, 2008-10-21 at 09:59 -0400, Chris Mason wrote:
- power loss at any time must not corrupt the fs (atomic fs modification)
(new-data loss is acceptable)
Done. Btrfs already uses barriers as required for sata drives.
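For context, the application-level pattern that requirement exists to support looks roughly like this; a hedged userspace sketch (not btrfs code, and the file names are made up). With barriers in place, the fsync() is expected to push both the new data and the drive's write cache to media before the rename makes the new contents visible, so a crash leaves either the old file or the new one, never a corrupt mix:

/* Illustrative sketch of the write-temp/fsync/rename pattern.
 * File names are made up; error handling is minimal. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	const char *tmp = "config.tmp", *dst = "config";
	const char buf[] = "new contents\n";
	int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0 || write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
		perror("write");
		return 1;
	}
	if (fsync(fd) < 0) {		/* new data (and drive cache) to media first */
		perror("fsync");
		return 1;
	}
	close(fd);
	if (rename(tmp, dst) < 0) {	/* then switch names atomically */
		perror("rename");
		return 1;
	}
	return 0;
}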
Aren't there situations in which write barriers don't do what they're
supposed to do?
Cheers,
Eric
If the drive effectively "lies" to you about flushing the write cache,
you might have an issue. I have not seen that first hand with recent
disk drives (and I have seen a lot :-))
That does not match the understanding I get from reading the
notes/caveats section of Documentation/block/barrier.txt:
"Note that block drivers must not requeue preceding requests while
completing latter requests in an ordered sequence. Currently, no
error checking is done against this."
and perhaps more importantly:
"[a technical scenario involving disk writes]
The problem here is that the barrier request is *supposed* to indicate
that filesystem update requests [2] and [3] made it safely to the
physical medium and, if the machine crashes after the barrier is
written, filesystem recovery code can depend on that. Sadly, that
isn't true in this case anymore. IOW, the success of an I/O barrier
should also be dependent on the success of some of the preceding requests,
where only the upper layer (filesystem) knows what 'some' is.
This can be solved by implementing a way to tell the block layer which
requests affect the success of the following barrier request and
making lower-level drivers resume operation on error only after the
block layer tells them to do so.
As the probability of this happening is very low and the drive would
have to be faulty, implementing the fix is probably overkill. But, still,
it's there."
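To make the caveat concrete, here is a toy simulation (purely illustrative C, not kernel code and not from barrier.txt) of the scenario Tejun describes: a preceding write fails, the barrier itself completes, and recovery code that keys only off the barrier draws the wrong conclusion:

/* Toy model: a failed write ahead of a successful barrier. */
#include <stdio.h>
#include <stdbool.h>

struct req { const char *name; bool is_barrier; bool ok; };

int main(void)
{
	struct req q[] = {
		{ "fs update write [2]", false, true  },
		{ "fs update write [3]", false, false },	/* fails on the way to media */
		{ "barrier/flush",       true,  true  },	/* still completes successfully */
	};
	bool barrier_ok = false;
	unsigned i;

	for (i = 0; i < sizeof(q) / sizeof(q[0]); i++) {
		printf("%-22s %s\n", q[i].name, q[i].ok ? "completed" : "FAILED");
		if (q[i].is_barrier && q[i].ok)
			barrier_ok = true;
	}

	/* Recovery that trusts the barrier alone assumes [2] and [3] hit media. */
	if (barrier_ok)
		printf("recovery: barrier done => trusting [2] and [3] (wrong here)\n");
	return 0;
}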
Cheers,
Eric
The cache flush command for ATA devices will block and wait until all of
the device's write cache has been written back.
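For reference, that flush can be issued by hand from user space much the way hdparm does it; a rough sketch, where the device path is illustrative and the ioctl needs root:

/* Issue ATA FLUSH CACHE (0xE7) through HDIO_DRIVE_CMD, roughly what
 * hdparm does.  /dev/sda is illustrative; run as root. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>

int main(void)
{
	unsigned char args[4] = { 0xE7, 0, 0, 0 };	/* ATA FLUSH CACHE */
	int fd = open("/dev/sda", O_RDONLY | O_NONBLOCK);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Blocks until the drive reports its write cache written back. */
	if (ioctl(fd, HDIO_DRIVE_CMD, args) < 0)
		perror("HDIO_DRIVE_CMD");
	close(fd);
	return 0;
}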
What I assume Tejun was referring to here is that some IO might have
been acknowledged by the device and an error happened later, when the
drive tried to write its cache back to the media (say during normal
drive microcode cache destaging). The problem with this is that there
is no outstanding IO context between the host and the storage to report
the error to (i.e., the drive has already ack'ed the write).
If this is what is being described, there is a non-zero chance of it
happening, but it is extremely infrequent. The checksumming we have in
btrfs will catch these bad writes when the journal is replayed after a
crash (or even when data blocks are read), so I would contend that this
is about as good as we can do.
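As a miniature illustration of the detection side (just the checksum idea, not btrfs's on-disk format; btrfs keeps crc32c sums for its blocks), a block whose write was ack'ed but never destaged comes back stale and fails verification:

/* Toy checksum check: sum a block at "write" time, verify at "read" time.
 * Not btrfs code; crc32c (Castagnoli) is what btrfs actually uses. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

static uint32_t crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = ~0u;
	int k;

	while (len--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
	}
	return ~crc;
}

int main(void)
{
	char block[4096];
	uint32_t stored;

	/* "Write": fill the block and remember its checksum. */
	memset(block, 0, sizeof(block));
	strcpy(block, "committed metadata");
	stored = crc32c(block, sizeof(block));

	/* Simulate a lost write: after a crash we read back stale contents. */
	strcpy(block, "stale pre-crash contents");

	/* Journal replay / read path: verify before trusting the block. */
	if (crc32c(block, sizeof(block)) != stored)
		printf("checksum mismatch: ignoring this block\n");
	else
		printf("checksum ok\n");
	return 0;
}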
Tejun, Chris, does this match your understanding?
Thanks!
Ric