Re: Scrubbing with BTRFS Raid 5
Graham Fleming posted on Sun, 19 Jan 2014 16:53:13 -0800 as excerpted:

> From the wiki, I see that scrubbing is not supported on a RAID 5 volume.
>
> Can I still run the scrub routine (maybe read-only?) to check for any
> issues? I understand that at this point, running a 3.12 kernel, there
> are no routines to fix parity issues with RAID 5 while scrubbing, but I
> just want to know that a) I'm not causing any harm by running the scrub
> on a RAID 5 volume, and b) it's actually going to provide me with
> useful feedback (i.e. "file X is damaged").

This isn't a direct answer to your question, but it answers a somewhat
more basic one: btrfs raid5/6 isn't ready for use in a live environment
yet, period. It is only for testing, where the reliability of the data
beyond the test doesn't matter.

It works as long as everything works normally, writing out the parity
blocks as well as the data, but besides scrub not yet being implemented,
neither is recovery from loss of a device, nor from an out-of-sync-state
power-off. Since the whole /point/ of raid5/6 is recovery from device
loss, without that it's simply a less efficient raid0, which accepts the
risk of full data loss if a device is lost in order to gain the higher
throughput of N-way data striping.

So in practice, at this point, if you're willing to accept loss of all
data and want the higher throughput, you'd use raid0 or perhaps single
mode instead; if not, you'd use raid1 or raid10 mode.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the
program, he is your master." Richard Stallman

-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Scrubbing with BTRFS Raid 5
Thanks for all the info, guys. I ran some tests on the latest 3.12.8
kernel. I set up three 1 GB files, attached them to /dev/loop{1..3}, and
created a btrfs RAID 5 volume with them.

I copied some data (from /dev/urandom) into two test files, got their
MD5 sums, and saved them to a text file.

I then unmounted the volume, trashed Disk3, and created a new Disk4
file, attached to /dev/loop4.

I mounted the btrfs RAID 5 volume degraded and the MD5 sums were fine. I
added /dev/loop4 to the volume and then deleted the missing device, and
it rebalanced. I had data spread out on all three devices now. MD5 sums
were unchanged on the test files.

This, to me, implies btrfs RAID 5 is working quite well and I can, in
fact, replace a dead drive. Am I missing something?
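For anyone who wants to repeat the test, here is a sketch along the lines Graham describes. The workdir layout, the RUN_BTRFS_STEPS guard, and the /mnt mountpoint are assumptions of the sketch, not from the original post; the privileged btrfs steps only run when explicitly enabled as root.

```shell
#!/bin/sh
# Sketch of the loop-device RAID 5 replacement test described above.
# The checksum bookkeeping runs anywhere; the btrfs steps need root and
# btrfs-progs, so this sketch gates them behind RUN_BTRFS_STEPS=1 (the
# guard variable and /mnt mountpoint are assumptions, not from the post).
workdir=$(mktemp -d)
for i in 1 2 3; do truncate -s 1G "$workdir/disk$i"; done

# Test data and its saved checksum, for comparison after the rebuild.
dd if=/dev/urandom of="$workdir/testfile" bs=1M count=4 2>/dev/null
(cd "$workdir" && md5sum testfile > sums.txt)

if [ "${RUN_BTRFS_STEPS:-0}" = 1 ]; then
    for i in 1 2 3; do losetup "/dev/loop$i" "$workdir/disk$i"; done
    mkfs.btrfs -d raid5 -m raid5 /dev/loop1 /dev/loop2 /dev/loop3
    mount /dev/loop1 /mnt
    cp "$workdir/testfile" /mnt/
    umount /mnt

    # Simulate a dead Disk3, attach a fresh Disk4, and rebuild.
    losetup -d /dev/loop3
    truncate -s 1G "$workdir/disk4"
    losetup /dev/loop4 "$workdir/disk4"
    mount -o degraded /dev/loop1 /mnt
    btrfs device add /dev/loop4 /mnt
    btrfs device delete missing /mnt    # triggers the rebalance

    (cd /mnt && md5sum -c "$workdir/sums.txt")  # sums should be unchanged
fi
```

Note that this is the clean-unmount best case; as the follow-ups below the original test point out, it says nothing about what happens when a device disappears mid-write.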
Re: Scrubbing with BTRFS Raid 5
Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:

> Thanks for all the info, guys.
>
> I ran some tests on the latest 3.12.8 kernel. [snip test description]
>
> This, to me, implies btrfs RAID 5 is working quite well and I can, in
> fact, replace a dead drive.
>
> Am I missing something?

What you're missing is that device death and replacement rarely happen
as neatly as in your test (clean unmounts and all, no middle-of-process
power loss, etc). You tested the best case, not real life or the worst
case.

Try that again: set up the raid5, start a big write to it, disconnect
one device in the middle of that write (I'm not sure whether just
dropping the loop works, or whether the kernel gracefully shuts down the
loop device), then unplug the system without unmounting... and /then/
see what sense btrfs can make of the resulting mess.

In theory, with an atomic-write btree filesystem such as btrfs, even
that should work fine, minus perhaps the last few seconds of file-write
activity; the filesystem should remain consistent through a degraded
remount and device add, device remove, and rebalance, even if another
power-pull happens in the middle of /that/.

But given btrfs' raid5 incompleteness, I don't expect that will work.

-- 
Duncan - List replies preferred. No HTML msgs.
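The mid-write failure test Duncan proposes could be sketched like this. The function name, mountpoint, and loop-device numbers are assumptions, and detaching the loop plus SIGKILL-ing the writer only approximates a real power pull (a true test would cut power to the machine).

```shell
# Sketch of the mid-write failure-injection test described above.
# Assumes a raid5 filesystem on /dev/loop1..4 mounted at /mnt; run as
# root. These names are assumptions of the sketch, not from the thread.
inject_midwrite_failure() {
    dd if=/dev/zero of=/mnt/bigfile bs=1M count=2048 &
    writer=$!
    sleep 2
    losetup -d /dev/loop3       # yank one member mid-write
    kill -9 "$writer"           # crude stand-in for the power pull
    umount -l /mnt
    # ...and /then/ see what btrfs makes of the mess:
    mount -o degraded /dev/loop1 /mnt
    btrfs device add /dev/loop4 /mnt
    btrfs device delete missing /mnt
}
```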
Re: Scrubbing with BTRFS Raid 5
Would it be reasonably accurate to say "btrfs' RAID5 implementation is
likely working well enough and safe enough if you are backing up
regularly, and are willing and able to restore from backup if a device
failure goes horribly wrong", then?

This is a reasonably serious question. My typical scenario runs along
the lines of two identical machines with regular filesystem replication
between them; in the event of something going horribly, horribly wrong
with the production machine, I just spin up services on the replicated
machine, making it "production", and then deal with the broken one at
relative leisure.

If the worst thing wrong with RAID5/6 in current btrfs is "might not
deal as well as you'd like with a really nasty example of single-drive
failure", that would likely be livable for me.

On 01/21/2014 12:08 PM, Duncan wrote:
> What you're missing is that device death and replacement rarely happens
> as neatly as your test (clean unmounts and all, no middle-of-process
> power-loss, etc). You tested best-case, not real-life or worst-case.
> [snip]
> But given btrfs' raid5 incompleteness, I don't expect that will work.
Re: Scrubbing with BTRFS Raid 5
On Jan 21, 2014, at 10:18 AM, Jim Salter wrote:

> Would it be reasonably accurate to say "btrfs' RAID5 implementation is
> likely working well enough and safe enough if you are backing up
> regularly and are willing and able to restore from backup if necessary
> if a device failure goes horribly wrong", then?

It's for testing purposes. If you really want to commit a production
machine to testing a file system, and you're prepared to lose 100% of
the changes since the last backup, OK, do that.

> If the worst thing wrong with RAID5/6 in current btrfs is "might not
> deal as well as you'd like with a really nasty example of single-drive
> failure", that would likely be livable for me.

That was just one hypothetical scenario; it's not the only one. If it's
really, truly, seriously being tested, eventually you'll break it.

Chris Murphy
RE: Scrubbing with BTRFS Raid 5
Thanks again for the added info; very helpful. I want to keep playing
around with btrfs RAID 5 and testing with it. Assuming I have a drive
with bad blocks, or let's say some inconsistent parity, am I right in
assuming that a) a btrfs scrub operation will not fix the stripes with
bad parity, and b) a balance operation will not be successful? Or would
a balance operation work to rewrite the parity?
Re: Scrubbing with BTRFS Raid 5
There are different values of "testing" and of "production" - in my
world, at least, they're not atomically defined categories. =)

On 01/21/2014 12:38 PM, Chris Murphy wrote:
> It's for testing purposes. If you really want to commit a production
> machine to testing a file system, and you're prepared to lose 100% of
> the changes since the last backup, OK, do that.
Re: Scrubbing with BTRFS Raid 5
Graham Fleming posted on Tue, 21 Jan 2014 10:03:26 -0800 as excerpted:

> I want to keep playing around with btrfs RAID 5 and testing with it...
> assuming I have a drive with bad blocks, or let's say some inconsistent
> parity, am I right in assuming that a) a btrfs scrub operation will not
> fix the stripes with bad parity

What I know is that btrfs scrub is said not to work with btrfs raid5/6
yet. I don't know how it actually fails (though I'd hope it simply
returns an error to the effect that it doesn't work with raid5/6 yet),
as I've not actually tried that mode here.

> and b) a balance operation will not be successful? Or would a balance
> operation work to rewrite the parity?

Balance actually rewrites everything (everything matching its filters if
a filtered balance is used; everything, period, if not), so it should
rewrite parity correctly.

AFAIK, all the writing works, and routine reading works. It's the error
recovery that's still only partially implemented. Since reading just
reads the data, not the parity (unless there's a dropped device or the
like to recover from), as long as all devices are active and there's a
good copy of the data (based on btrfs checksumming) to read, the
rebalance should just use and rewrite that, ignoring the bad parity.

-- 
Duncan - List replies preferred. No HTML msgs.
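A minimal sketch of the balance-based parity rewrite described above. The helper name and the /mnt example are assumptions; `btrfs balance start` with no filters is the unfiltered, rewrite-everything balance Duncan refers to.

```shell
# Rewrite every chunk (and thus recompute parity) with an unfiltered
# balance. The function name and the /mnt example are assumptions of
# this sketch; the commands themselves are standard btrfs-progs.
rewrite_parity_via_balance() {
    mnt=$1
    btrfs balance start "$mnt"      # unfiltered: rewrites all chunks
    btrfs balance status "$mnt"     # show progress / completion
}
# Example (run as root): rewrite_parity_via_balance /mnt
```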
Re: Scrubbing with BTRFS Raid 5
Jim Salter posted on Tue, 21 Jan 2014 12:18:01 -0500 as excerpted:

> Would it be reasonably accurate to say "btrfs' RAID5 implementation is
> likely working well enough and safe enough if you are backing up
> regularly and are willing and able to restore from backup if necessary
> if a device failure goes horribly wrong", then?

I'd say (and IIRC I did say somewhere, though I don't remember if it was
in this thread) that in reliability terms, btrfs raid5 should be treated
like btrfs raid0 at this point. Raid0 is well known to have absolutely
no failover: if a device fails, the raid is toast. It's possible
so-called "extreme measures" may recover data from the surviving bits
(think the $expen$ive$ $ervice$ of data recovery firms), but the idea is
that either no data that isn't easily replaced is stored on a raid0 in
the first place, or, if it is, there's a (tested, recoverable) backup,
to the level that you're fully comfortable with losing EVERYTHING not
backed up.

Examples of good data for raid0 are the kernel sources (as a user, not a
dev, so you're not hacking on them), your distro's local package cache,
browser cache, etc. By definition, all those examples have the net as
their backup, so loss of a local copy means, at worst, a bit more to
download.

That's what btrfs raid5/6 are at the moment: effectively raid0 from a
recovery perspective. Now, the parity /is/ being written; it simply
can't be treated as available for recovery. So supposing you do /not/
lose a device (or suffer a bad checksum) on the raid5 until after the
recovery code is complete and available, you've effectively gotten a
"free" upgrade from raid0 reliability to raid5 reliability as soon as
recovery is possible, which will be nice, and meanwhile you can test the
operational functionality. So there /are/ reasons you might want to run
btrfs raid5 mode now.

As long as you remember that it's currently, effectively, raid0 should
something go wrong, and you either don't use it for valuable data in the
first place, or you're willing to do without any updates to that data
since the last tested backup, should it come to that.

-- 
Duncan - List replies preferred. No HTML msgs.
Re: Scrubbing with BTRFS Raid 5
On Tue, 2014-01-21 at 17:08 +0000, Duncan wrote:
> Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
> [snip test description]
>
> What you're missing is that device death and replacement rarely happens
> as neatly as your test (clean unmounts and all, no middle-of-process
> power-loss, etc). You tested best-case, not real-life or worst-case.
> [snip]
> But given btrfs' raid5 incompleteness, I don't expect that will work.

raid5/6 deals with IO errors from one or two drives, and it is able to
reconstruct the parity from the remaining drives and give you good data.

If we hit a crc error, the raid5/6 code will try a parity
reconstruction to make good data, and if we find good data from the
other copy, it'll return that up to userland.

In other words, for those cases it works just like raid1/10. What it
won't do (yet) is write that good data back to the storage. It'll stay
bad until you remove the device or run balance to rewrite everything.

Balance will reconstruct parity to get good data as it balances. This
isn't as useful as scrub, but that work is coming.

-chris
Re: Scrubbing with BTRFS Raid 5
On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason wrote:
> On Tue, 2014-01-21 at 17:08 +0000, Duncan wrote:
> [snip]
>
> raid5/6 deals with IO errors from one or two drives, and it is able to
> reconstruct the parity from the remaining drives and give you good data.
>
> If we hit a crc error, the raid5/6 code will try a parity
> reconstruction to make good data, and if we find good data from the
> other copy, it'll return that up to userland.
>
> In other words, for those cases it works just like raid1/10. What it
> won't do (yet) is write that good data back to the storage. It'll stay
> bad until you remove the device or run balance to rewrite everything.
>
> Balance will reconstruct parity to get good data as it balances. This
> isn't as useful as scrub, but that work is coming.

That is awesome!

What about online conversion from not-raid5/6 to raid5/6? What is the
status of that code? For example, what happens if there is a failure
during the conversion, or a reboot?
Re: Scrubbing with BTRFS Raid 5
On Wed, 2014-01-22 at 13:06 -0800, ronnie sahlberg wrote:
> On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason wrote:
> [snip]
> > Balance will reconstruct parity to get good data as it balances. This
> > isn't as useful as scrub, but that work is coming.
>
> That is awesome!
>
> What about online conversion from not-raid5/6 to raid5/6? What is the
> status of that code? For example, what happens if there is a failure
> during the conversion, or a reboot?

The conversion code uses balance, so that works normally. If there is a
failure during the conversion, you'll end up with some things raid5/6
and some things at whatever other level you used.

The data will still be there, but you are more prone to enospc
problems ;)

-chris
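The conversion-by-balance path Chris describes can be sketched with the standard balance convert filters. The helper name and the /mnt example are assumptions of the sketch; `-dconvert`/`-mconvert` are the btrfs-progs balance filters for the data and metadata chunk profiles.

```shell
# Convert an existing filesystem's data and metadata chunks to raid5
# via a balance with convert filters. The helper name and /mnt example
# are assumptions of this sketch.
convert_to_raid5() {
    mnt=$1
    btrfs balance start -dconvert=raid5 -mconvert=raid5 "$mnt"
}
# If the balance is interrupted (crash, reboot), chunks already balanced
# stay raid5 and the rest keep the old profile; re-running the same
# command (given enough free space) finishes the conversion.
# Example (run as root): convert_to_raid5 /mnt
```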
Re: Scrubbing with BTRFS Raid 5
On Wed, Jan 22, 2014 at 1:16 PM, Chris Mason wrote:
> On Wed, 2014-01-22 at 13:06 -0800, ronnie sahlberg wrote:
> [snip]
> > What about online conversion from not-raid5/6 to raid5/6? What is the
> > status of that code? For example, what happens if there is a failure
> > during the conversion, or a reboot?
>
> The conversion code uses balance, so that works normally. If there is a
> failure during the conversion, you'll end up with some things raid5/6
> and some things at whatever other level you used.
>
> The data will still be there, but you are more prone to enospc
> problems ;)

Ok, but if there is enough space, you could just restart the balance and
it will eventually finish, and all should, with some luck, be ok?

Awesome. This sounds like things are a lot closer to raid5/6 being
fully operational than I realized.