On Wed, 2008-10-22 at 09:38 -0400, Ric Wheeler wrote:
> Chris Mason wrote:
> > On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote:
> >
> >> Ric Wheeler wrote:
> >>
> >>> I think that we do handle a failure in the case that you outline above,
> >>> since the FS will be able to notice the error before it sends a commit
> >>> down (and that commit is wrapped in the barrier flush calls). This is
> >>> the easy case, since we still have the context for the IO.
> >>>
> >> I'm no FS guy, but for that to be true the FS should be waiting for all
> >> the outstanding IOs to finish before issuing a barrier, and then it
> >> actually doesn't need barriers at all - it can do the same with
> >> flush_cache.
> >>
> >
> > We wait and then barrier. If the barrier returned status saying that a
> > previously ack'd IO had actually failed, we could do something to make
> > sure the FS was consistent.
> >
> As I mentioned in a reply to Tejun, I am not sure that we can count on
> the barrier op giving us status for IOs that failed to destage cleanly.
>
> Waiting and then doing the FLUSH seems to give us the best coverage for
> normal failures (and your own testing shows that it is hugely effective
> in reducing some types of corruption, at least :-)).
>
> If you look at the types of common drive failures, I would break them
> into two big groups.
>
> The first group would be transient errors - i.e., this IO fails (usually
> a read), but a subsequent IO will succeed, with or without a sector
> remapping happening. Causes might be:
>
> (1) just a bad read due to dirt on the surface of the drive - the read
> will always fail, but a write might clean the surface and restore it to
> useful life.
> (2) vibrations (dropping your laptop, rolling a big machine down the
> data center, passing trains :-))
> (3) adjacent sector writes - hot spotting on drives can degrade the
> data on adjacent tracks. This causes IO errors on reads of data that
> was successfully written before, but the track itself is still perfectly
> fine.
4) Transient conditions, such as heat or other problems, made the drive
give errors.

Combine your matrix with the single drive install vs the mirrored
configuration and we get a lot of variables.

What I'd love to have is a rehab tool that works a suspect drive over
and decides if it should stay or go. It is somewhat difficult to run the
rehab on a mounted single disk install, but we can start with the
multi-device config and work our way out from there.

For barrier flush, IO errors reported back by the barrier flush would
allow us to know when corrective action was required. (Rough sketches of
both ideas are below.)

-chris
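To make the wait-then-flush ordering concrete, here is a minimal sketch
of a commit path written against the 2.6.27-era block layer API. The
helper function is hypothetical (this is not btrfs's actual commit
code), and the two-argument blkdev_issue_flush() signature is an
assumption based on kernels of that vintage:

#include <linux/fs.h>
#include <linux/blkdev.h>

/* Hypothetical helper, not the btrfs commit path. */
static int commit_with_flush(struct address_space *mapping,
			     struct block_device *bdev)
{
	int ret;

	/* Push out all dirty pages for this mapping. */
	ret = filemap_fdatawrite(mapping);
	if (ret)
		return ret;

	/*
	 * Wait for every outstanding write to be acked.  An error
	 * reported here still has its IO context, so the FS can act
	 * on it before committing - the "easy case" above.
	 */
	ret = filemap_fdatawait(mapping);
	if (ret)
		return ret;

	/*
	 * Flush the drive's write cache.  If the flush reported
	 * errors for previously ack'd writes, this is the point
	 * where we would learn that corrective action is required.
	 */
	return blkdev_issue_flush(bdev, NULL);
}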
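And a userspace sketch of the rehab idea: scan the device, and on a read
error try rewriting the sector so the drive can remap or refresh it
(group 1 failures above). This is destructive - it zeroes the unreadable
sector - so it only makes sense on a redundant/mirrored config where the
data can be rebuilt; the tool and its stay-or-go policy are entirely
hypothetical:

/* rehab.c - hypothetical drive rehab sketch, not an existing tool.
 * Destroys data in unreadable sectors; only sane when the data can
 * be rebuilt from a mirror.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SECTOR 512

int main(int argc, char **argv)
{
	unsigned char buf[SECTOR];
	off_t off = 0;
	long bad = 0;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (;;) {
		ssize_t n = pread(fd, buf, SECTOR, off);
		if (n == 0)
			break;			/* end of device */
		if (n < 0) {
			/* Read error: rewriting may clean or remap
			 * the sector. */
			memset(buf, 0, SECTOR);
			if (pwrite(fd, buf, SECTOR, off) != SECTOR)
				bad++;		/* write failed too */
		}
		off += SECTOR;
	}
	close(fd);
	/* Stay-or-go verdict: any sector that cannot even be
	 * rewritten suggests the drive should go. */
	printf("%ld unrecoverable sectors\n", bad);
	return bad ? 2 : 0;
}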