Ric Wheeler wrote:
> I think that we do handle a failure in the case that you outline above
> since the FS will be able to notice the error before it sends a commit
> down (and that commit is wrapped in the barrier flush calls). This is
> the easy case since we still have the context for the IO.

I'm no FS guy, but for that to be true the FS would have to wait for
all the outstanding IOs to finish before issuing a barrier, in which
case it doesn't actually need barriers at all - it can do the same
with flush_cache.
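
To make the ordering concrete, here is a rough userspace analogy, not
block-layer code. It assumes os.fsync() stands in for flush_cache
(whether fsync actually triggers a device cache flush depends on the
filesystem and its mount options), and the names write_all and
commit_with_flush are made up for illustration:

```python
import os


def write_all(fd, data):
    """Loop until every byte has been handed to the kernel."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def commit_with_flush(path, data_blocks, commit_record):
    """Write data, wait and flush, and only then write the commit record."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for block in data_blocks:
            write_all(fd, block)
        # Stand-in for "wait for outstanding IOs, then flush_cache":
        # an OSError here means the data writes failed while we still
        # have the context, so the commit is never issued.
        os.fsync(fd)
        write_all(fd, commit_record)
        os.fsync(fd)
    finally:
        os.close(fd)
```

If the first fsync raises, the commit record is never written, which
is the "easy case" where the error is seen with the IO context intact.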

> It is more challenging (and kind of related) if the IO done in (4) has
> been ack'ed by drive, the drive later destages (not as part of the
> flush) its write cache and then an error happens. In this case, there is
> nothing waiting on the initiator side to receive the IO error. We have
> effectively lost the context for that IO.

IIUC, that error should be detectable from FLUSH regardless of whether
the destaging occurred as part of the flush, no?

> The only way to detect this is on replay (if the journal has checksums
> enabled or the error will be flagged as a media error).

If it's not reported on FLUSH, it basically amounts to silent data
corruption and only checksums can help.
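
As a minimal sketch of what checksums buy you on replay, assuming a
made-up record layout (length + CRC32 header followed by the payload;
this is not any real journal format, and append_record and replay are
hypothetical names):

```python
import struct
import zlib

# Hypothetical record header: payload length, CRC32 of the payload.
HEADER = struct.Struct("<II")


def append_record(journal, payload):
    """Append one checksummed record to an in-memory journal (bytearray)."""
    journal += HEADER.pack(len(payload), zlib.crc32(payload))
    journal += payload


def replay(journal):
    """Return payloads up to, but not including, the first bad record."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(journal):
        length, crc = HEADER.unpack_from(journal, offset)
        start = offset + HEADER.size
        payload = bytes(journal[start:start + length])
        # A truncated or silently corrupted payload fails this check.
        if len(payload) != length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        offset = start + length
    return records
```

Replay stops at the first record whose checksum does not match, so a
write that was acked but later lost or damaged shows up as a truncated
journal instead of being replayed as good data.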

Thanks.

-- 
tejun