On Tue, 2014-01-21 at 17:08 +0000, Duncan wrote:
> Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
>
> > Thanks for all the info, guys.
> >
> > I ran some tests on the latest 3.12.8 kernel. I set up three 1GB
> > files, attached them to /dev/loop{1..3}, and created a BTRFS RAID 5
> > volume with them.
> >
> > I copied some data (from /dev/urandom) into two test files, got
> > their MD5 sums, and saved them to a text file.
> >
> > I then unmounted the volume, trashed Disk3, and created a new Disk4
> > file, attached to /dev/loop4.
> >
> > I mounted the BTRFS RAID 5 volume degraded and the md5 sums were
> > fine. I added /dev/loop4 to the volume, then deleted the missing
> > device, and it rebalanced. I had data spread out on all three
> > devices now. MD5 sums unchanged on the test files.
> >
> > This, to me, implies BTRFS RAID 5 is working quite well and I can,
> > in fact, replace a dead drive.
> >
> > Am I missing something?
>
> What you're missing is that device death and replacement rarely happen
> as neatly as in your test (clean unmounts and all, no middle-of-process
> power loss, etc.). You tested the best case, not real life or the
> worst case.
>
> Try that again: set up the raid5, start a big write to it, disconnect
> one device in the middle of that write (I'm not sure whether just
> dropping the loop works or whether the kernel gracefully shuts down
> the loop device), then unplug the system without unmounting... and
> /then/ see what sense btrfs can make of the resulting mess. In theory,
> with an atomic-write btree filesystem such as btrfs, even that should
> work fine, minus perhaps the last few seconds of file-write activity;
> the filesystem should remain consistent across the degraded remount,
> device add, device remove, and rebalance, even if another power pull
> happens in the middle of /that/.
>
> But given btrfs' raid5 incompleteness, I don't expect that will work.
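For reference, the clean-replacement test described above boils down to
roughly this command sequence (a sketch, not the exact commands Graham
ran; the file names, write sizes, and the /mnt/test mount point are my
own assumptions):

  # Create three 1GB backing files and attach them to loop devices.
  for i in 1 2 3; do
      truncate -s 1G Disk$i
      losetup /dev/loop$i Disk$i
  done

  # Build a raid5 btrfs volume across the loop devices and mount it.
  mkfs.btrfs -d raid5 -m raid5 /dev/loop1 /dev/loop2 /dev/loop3
  mount /dev/loop1 /mnt/test

  # Write random test data and record the checksums.
  dd if=/dev/urandom of=/mnt/test/file1 bs=1M count=100
  dd if=/dev/urandom of=/mnt/test/file2 bs=1M count=100
  md5sum /mnt/test/file1 /mnt/test/file2 > /tmp/sums.txt

  # Simulate the dead drive: unmount cleanly, drop Disk3, add Disk4.
  umount /mnt/test
  losetup -d /dev/loop3
  rm Disk3
  truncate -s 1G Disk4
  losetup /dev/loop4 Disk4

  # Remount degraded, verify the data, then replace the missing device.
  btrfs device scan
  mount -o degraded /dev/loop1 /mnt/test
  md5sum -c /tmp/sums.txt
  btrfs device add /dev/loop4 /mnt/test
  btrfs device delete missing /mnt/test
  md5sum -c /tmp/sums.txt

Note that every step here happens with a clean unmount in between, which
is exactly the best-case assumption Duncan's reply objects to.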
raid5/6 deals with IO errors from one or two drives: it can reconstruct
the missing data from parity and the remaining drives and give you good
data. If we hit a crc error, the raid5/6 code will try a parity
reconstruction to make good data, and if the reconstruction yields good
data, it'll return that up to userland. In other words, for those cases
it works just like raid1/10.

What it won't do (yet) is write that good data back to the storage. The
bad copy stays bad until you remove the device or run a balance to
rewrite everything; balance will reconstruct from parity to get good
data as it balances. This isn't as useful as scrub, but that work is
coming.

-chris
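In practical terms, until scrub can repair raid5/6, rewriting the
reconstructed data means running a balance or removing the affected
device, along these lines (the mount point and device name here are
placeholders):

  # Rewrite all block groups; bad blocks are reconstructed from parity
  # as they're read, and the rewritten copies land on healthy storage.
  btrfs balance start /mnt/test

  # Or drop a known-bad device; the removal rewrites its data elsewhere.
  btrfs device delete /dev/loop3 /mnt/test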