On Jan 9, 2014, at 3:42 AM, Hugo Mills <h...@carfax.org.uk> wrote:

> On Thu, Jan 09, 2014 at 11:26:26AM +0100, Clemens Eisserer wrote:
>> Hi,
>>
>> I am running write-intensive (well sort of, one write every 10s)
>> workloads on cheap flash media which proved to be horribly unreliable.
>> A 32GB microSDHC card reported bad blocks after 4 days, while a usb
>> pen drive returns bogus data without any warning at all.
>>
>> So I wonder, how would btrfs behave in raid1 on two such devices?
>> Would it simply mark bad blocks as "bad" and continue to be
>> operational, or will it bail out when some block can not be
>> read/written anymore on one of the two devices?
>
>    If a block is read and fails its checksum, then the other copy (in
> RAID-1) is checked and used if it's good. The bad copy is rewritten to
> use the good data.
>
>    If the block is bad such that writing to it won't fix it, then
> there's probably two cases: the device returns an IO error, in which
> case I suspect (but can't be sure) that the FS will go read-only. Or
> the device silently fails the write and claims success, in which case
> you're back to the situation above of the block failing its checksum.

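To make the read path Hugo describes concrete, here is a toy sketch in
Python. It is not Btrfs code and makes several simplifying assumptions:
ToyDevice, raid1_write and raid1_read are hypothetical names, the
checksum is stored next to the data (Btrfs keeps checksums in separate
metadata), and zlib.crc32 stands in for the real checksum algorithm.

```python
import zlib


class ToyDevice:
    """Hypothetical in-memory device: block number -> (data, checksum)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block, data):
        # Store the data together with a checksum of what was intended.
        self.blocks[block] = (data, zlib.crc32(data))

    def read(self, block):
        return self.blocks[block]


def raid1_write(devices, block, data):
    """Write the same data to every mirror."""
    for dev in devices:
        dev.write(block, data)


def raid1_read(devices, block):
    """Return the first copy that passes its checksum.

    Copies that failed verification before a good one was found are
    rewritten from the good copy. If no copy verifies, raise an error.
    """
    failed = []
    for dev in devices:
        data, stored_sum = dev.read(block)
        if zlib.crc32(data) == stored_sum:
            for bad in failed:
                bad.write(block, data)  # repair from the good copy
            return data
        failed.append(dev)
    raise IOError(f"block {block}: no copy passed its checksum")


a, b = ToyDevice("a"), ToyDevice("b")
raid1_write([a, b], 7, b"important data")

# Simulate silent corruption on device a: one byte changes, the
# stored checksum does not.
a.blocks[7] = (b"important dat4", a.blocks[7][1])

result = raid1_read([a, b], 7)
```

After the read, result holds the good data from device b and the copy
on device a has been rewritten. What the sketch does not model is the
case discussed in this thread: a device on which the rewrite itself
fails, or silently does not stick.
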
In a normally operating drive, when the drive firmware locates a
physical sector with persistent write failures, it remaps it: the LBA
now points to a reserve physical sector, and the original sector can no
longer be accessed by LBA. If all of the reserve sectors get used up,
the next persistent write failure will result in a write error reported
to libata. This will appear in dmesg, and should be treated as a sign
that the drive is no longer in normal operation. At that point it's a
drive useful for storage developers, but not for production usage.

>    There's no marking of bad blocks right now, and I don't know of
> anyone working on the feature, so the FS will probably keep going back
> to the bad blocks as it makes CoW copies for modification.

This is maybe relevant:
https://www.kernel.org/doc/htmldocs/libata/ataExceptions.html

"READ and WRITE commands report CHS or LBA of the first failed sector
but ATA/ATAPI standard specifies that the amount of transferred data on
error completion is indeterminate, so we cannot assume that sectors
preceding the failed sector have been transferred and thus cannot
complete those sectors successfully as SCSI does."

If I understand that correctly, Btrfs really ought to either punt the
device, or make the whole volume read-only. For production use, going
read-only could very well mean data loss, even while preserving the
state of the file system. Eventually I'd rather see the offending device
ejected from the volume, and for the volume to remain rw,degraded.

Chris Murphy
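
The firmware behaviour described above (an LBA redirected to a reserve
sector until the reserve pool runs out, after which the write error
reaches the host) can be sketched as a toy model in Python. ToyDrive
and all of its fields are hypothetical; real firmware is far more
involved, and this only illustrates the bookkeeping.

```python
class ToyDrive:
    """Hypothetical model of firmware remapping LBAs to spare sectors."""

    def __init__(self, n_sectors, n_spares, failing):
        # Initially every LBA maps to the physical sector of the same number.
        self.lba_map = {lba: lba for lba in range(n_sectors)}
        # Reserve physical sectors live past the end of the visible range.
        self.spares = list(range(n_sectors, n_sectors + n_spares))
        # Physical sectors on which every write fails (assumed known here).
        self.failing = set(failing)
        self.media = {}

    def write(self, lba, data):
        phys = self.lba_map[lba]
        while phys in self.failing:
            if not self.spares:
                # No reserve left: the error is reported to the host.
                raise IOError(f"write error at LBA {lba}: no spare sectors")
            # Remap: the LBA now points at a reserve sector, and the
            # original physical sector is no longer reachable by LBA.
            phys = self.spares.pop(0)
            self.lba_map[lba] = phys
        self.media[phys] = data


drive = ToyDrive(n_sectors=4, n_spares=1, failing={1, 2})
drive.write(1, b"x")  # remapped silently to the only spare sector

try:
    drive.write(2, b"y")  # spare pool is now empty
    exhausted = False
except IOError:
    exhausted = True
```

The first failing write is absorbed without the host noticing. The
second one surfaces as an error, which is the point at which the drive
should no longer be considered to be in normal operation.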