On Wed, 2014-01-22 at 13:06 -0800, ronnie sahlberg wrote:
> On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason <c...@fb.com> wrote:
> > On Tue, 2014-01-21 at 17:08 +0000, Duncan wrote:
> >> Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
> >>
> >> > Thanks for all the info, guys.
> >> >
> >> > I ran some tests on the latest 3.12.8 kernel. I set up three 1 GB
> >> > files, attached them to /dev/loop{1..3}, and created a BTRFS RAID 5
> >> > volume with them.
> >> >
> >> > I copied some data (from /dev/urandom) into two test files, got their
> >> > MD5 sums, and saved them to a text file.
> >> >
> >> > I then unmounted the volume, trashed Disk3, and created a new Disk4
> >> > file, attached to /dev/loop4.
> >> >
> >> > I mounted the BTRFS RAID 5 volume degraded and the MD5 sums were fine.
> >> > I added /dev/loop4 to the volume, then deleted the missing device and
> >> > it rebalanced. I now had data spread across all three devices, and the
> >> > MD5 sums of the test files were unchanged.
> >> >
> >> > This, to me, implies BTRFS RAID 5 is working quite well and I can, in
> >> > fact, replace a dead drive.
> >> >
> >> > Am I missing something?
> >>
> >> What you're missing is that device death and replacement rarely happen
> >> as neatly as in your test (clean unmounts and all, no middle-of-process
> >> power loss, etc.). You tested the best case, not real life or the worst
> >> case.
> >>
> >> Try that again: set up the raid5, start a big write to it, disconnect
> >> one device in the middle of that write (I'm not sure if just dropping
> >> the loop works or if the kernel gracefully shuts down the loop device),
> >> then unplug the system without unmounting... and /then/ see what sense
> >> btrfs can make of the resulting mess. In theory, with an atomic-write
> >> btree filesystem such as btrfs, even that should work fine, minus
> >> perhaps the last few seconds of file-write activity, but the filesystem
> >> should remain consistent through a degraded remount, device add, device
> >> remove, and rebalance, even if another power-pull happens in the middle
> >> of /that/.
> >>
> >> But given btrfs' raid5 incompleteness, I don't expect that will work.
> >>
> >
> > raid5/6 deals with IO errors from one or two drives, and it is able to
> > reconstruct the parity from the remaining drives and give you good data.
> >
> > If we hit a crc error, the raid5/6 code will try a parity reconstruction
> > to make good data, and if we find good data from the other copy, it'll
> > return that up to userland.
> >
> > In other words, for those cases it works just like raid1/10. What it
> > won't do (yet) is write that good data back to the storage. It'll stay
> > bad until you remove the device or run balance to rewrite everything.
> >
> > Balance will reconstruct parity to get good data as it balances. This
> > isn't as useful as scrub, but that work is coming.
> >
>
> That is awesome!
>
> What about online conversion from not-raid5/6 to raid5/6? What is the
> status of that code? For example, what happens if there is a failure
> during the conversion, or a reboot?
The conversion code uses balance, so that works normally. If there is a
failure during the conversion you'll end up with some things raid5/6 and
some things at whatever other level you used. The data will still be
there, but you are more prone to enospc problems ;)

-chris
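For anyone who wants to try the conversion path described above, a rough
sketch with btrfs-progs (the /mnt mount point and the raid5 target profile
are just examples) would look something like:

  # convert existing data and metadata chunks to raid5 via balance
  btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt

  # check progress; an interrupted balance can be inspected and resumed
  btrfs balance status /mnt
  btrfs balance resume /mnt

  # after an interruption, per-profile usage shows which chunks are still
  # at the old level and which are already raid5
  btrfs fi df /mnt

If the conversion was cut short, re-running the convert with the 'soft'
modifier (e.g. -dconvert=raid5,soft) should skip chunks that already have
the target profile and only rewrite the rest.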