Re: I need to P. are we almost there yet?
Bob Marley posted on Sat, 03 Jan 2015 12:34:41 +0100 as excerpted:

> On 29/12/2014 19:56, sys.syphus wrote:
>> specifically (P)arity. very specifically n+2. when will raid5 / raid6 be
>> at least as safe to run as raid1 currently is? I don't like the idea of
>> being 2 bad drives away from total catastrophe. (and yes i backup, it
>> just wouldn't be fun to go down that route.)
>
> What about using btrfs on top of MD raid?

The problem with that is data integrity. mdraid doesn't have it; btrfs does.

If you present a single mdraid device to btrfs and run single mode on it, and one copy on the mdraid is corrupt, mdraid may well simply present the corrupt copy, since it does no integrity checking. btrfs will catch and reject that data, but because it sees only a single device, it has no second copy to fall back on.

If you present multiple devices to btrfs and run btrfs raid1 mode, it'll have a second copy to check, but if a bad copy exists on each mdraid underneath and that's the copy mdraid happens to hand btrfs, again, btrfs will reject it, having no idea there's actually a good copy on the mdraid underneath; the mdraid simply didn't pick that copy to present.

And mdraid-5/6 doesn't make things any better, because unless there's a problem it can see, mdraid will simply read and present the data, ignoring the parity with which it could probably correct the bad data (at least with raid6).

The only way to get truly verified data with triple-redundancy or 2X-parity or better is when btrfs handles it, since btrfs keeps, and actually checks, checksums to verify the data.

But btrfs raid56 mode should be complete with kernel 3.19 and presumably btrfs-progs 3.19, tho I'd give it a kernel or two to mature to be sure. N-way-mirroring (my particular hotly awaited feature) is next up, but given the time raid56 took, I don't think anybody's predicting when it'll actually be in-tree and ready for use.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master."  Richard Stallman
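To make the detection-versus-correction point concrete, here is a toy sketch in Python (illustrative only -- plain crc32 standing in for the crc32c btrfs actually stores, and nothing about the real on-disk format): with a single copy a checksum mismatch can only be reported, but with a second copy the read can be satisfied from whichever copy still matches.

    import zlib

    def read_verified(copies, expected_crc):
        # copies: candidate byte strings for the same block, one per mirror.
        # Return the first copy whose checksum matches what was recorded at
        # write time; with two copies a single bad mirror is skipped over,
        # with only one copy a mismatch can only be reported as an error.
        for data in copies:
            if zlib.crc32(data) & 0xffffffff == expected_crc:
                return data
        raise IOError("checksum mismatch on every available copy")

    good = b"hello world"
    crc = zlib.crc32(good) & 0xffffffff
    corrupt = b"hellX world"

    print(read_verified([good], crc))           # single copy, intact
    print(read_verified([corrupt, good], crc))  # raid1-style: bad copy skipped
    # read_verified([corrupt], crc)             # single copy, corrupt -> error

A stacked md device looks to btrfs like the single-copy case, no matter how much redundancy md has underneath.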
Re: I need to P. are we almost there yet?
On 03/01/2015 14:11, Duncan wrote:
> Bob Marley posted on Sat, 03 Jan 2015 12:34:41 +0100 as excerpted:
>> On 29/12/2014 19:56, sys.syphus wrote:
>>> specifically (P)arity. very specifically n+2. when will raid5 / raid6
>>> be at least as safe to run as raid1 currently is? I don't like the
>>> idea of being 2 bad drives away from total catastrophe. (and yes i
>>> backup, it just wouldn't be fun to go down that route.)
>>
>> What about using btrfs on top of MD raid?
>
> The problem with that is data integrity. mdraid doesn't have it; btrfs
> does. If you present a single mdraid device to btrfs and run single mode
> on it, and one copy on the mdraid is corrupt, mdraid may well simply
> present it, as it does no integrity checking. btrfs will catch and
> reject that data, but because it sees only a single device, it has no
> second copy to fall back on.

Which is really not that bad, considering how small the chance is that anything gets corrupted in the first place; it's already an exceedingly rare event. Detection without correction can be more than enough. Computing has worked for decades without even the detection part: most likely your bank account and mine are held in databases sitting on filesystems or block devices that can't even detect corruption. And, last but not least, as of now a btrfs bug is more likely than a hard disk's silent data corruption.
Re: I need to P. are we almost there yet?
> But btrfs raid56 mode should be complete with kernel 3.19 and presumably
> btrfs-progs 3.19, tho I'd give it a kernel or two to mature to be sure.
> N-way-mirroring (my particular hotly awaited feature) is next up, but
> given the time raid56 took, I don't think anybody's predicting when
> it'll actually be in-tree and ready for use.

is that the feature where you say i want x copies of this file and y copies of this other file? e.g. raid at the file level, with the ability to adjust redundancy per file?

I wonder if there is any sort of bandaid you can put on top of btrfs to give some of this redundancy. things like git annex exist, but i don't love its bugs and oddball choice of programming language. Do you guys use any other open source tools on top of btrfs to help manage your data? (i.e. git annex, camlistore)
Re: I need to P. are we almost there yet?
On 29/12/2014 19:56, sys.syphus wrote:
> specifically (P)arity. very specifically n+2. when will raid5 / raid6 be
> at least as safe to run as raid1 currently is? I don't like the idea of
> being 2 bad drives away from total catastrophe. (and yes i backup, it
> just wouldn't be fun to go down that route.)

What about using btrfs on top of MD raid?
Re: I need to P. are we almost there yet?
> Which is really not that bad, considering how small the chance is that
> anything gets corrupted in the first place; it's already an exceedingly
> rare event. Detection without correction can be more than enough.
> Computing has worked for decades without even the detection part: most
> likely your bank account and mine are held in databases sitting on
> filesystems or block devices that can't even detect corruption. And,
> last but not least, as of now a btrfs bug is more likely than a hard
> disk's silent data corruption.

I think that's dangerous thinking, and it's what has gotten us here. The whole point of zfs / btrfs is that at current storage sizes, what was previously unlikely is now a statistical certainty. In short, Murphy's law. We are now using green drives and s3 fuse and shitty flash media; the era of trusting the block device is over.
Re: I need to P. are we almost there yet?
On Sat, 3 Jan 2015 13:11:57 +0000 (UTC), Duncan <1i5t5.dun...@cox.net> wrote:
>> What about using btrfs on top of MD raid?
>
> The problem with that is data integrity. mdraid doesn't have it. btrfs
> does.

Most importantly, however, you aren't any worse off with Btrfs on top of MD than with Btrfs on a single device, or with Ext4/XFS/JFS/etc on top of MD. Sure, you don't get checksum-based recovery from partial corruption of a RAID, but you do get the other features of Btrfs, such as robust snapshot support, the ability to online-resize up and down, compression, and, actually, checksum verification: even if it can't recover from a corruption, at least it will warn you of it (and you could recover from backups), while other FSes will pass the corrupted data through silently.

So until Btrfs multi-device support is feature-complete (and yes, that includes performance-wise), running Btrfs in single-device mode on top of MD RAID is arguably the best way to use Btrfs in a RAID setup. (Personally I am running Btrfs on top of 7x2TB MD RAID6, 3x2TB MD RAID5 and 2x2TB MD RAID1.)

-- 
With respect,
Roman
Re: I need to P. are we almost there yet?
sys.syphus posted on Sat, 03 Jan 2015 12:55:27 -0600 as excerpted:

>> But btrfs raid56 mode should be complete with kernel 3.19 and
>> presumably btrfs-progs 3.19, tho I'd give it a kernel or two to mature
>> to be sure. N-way-mirroring (my particular hotly awaited feature) is
>> next up, but given the time raid56 took, I don't think anybody's
>> predicting when it'll actually be in-tree and ready for use.
>
> is that the feature where you say i want x copies of this file and y
> copies of this other file? e.g. raid at the file level, with the ability
> to adjust redundancy per file?

Per-file isn't available yet, tho at least per-subvolume is roadmapped, and now that we have the properties framework working via xattrs for files as well, at least in theory there is AFAIK no reason to limit it to per-subvolume, as per-file should be about as easy once the code that currently limits it to per-filesystem is rewritten.

But actually fully working per-filesystem raid56 is enough for a lot of people, and actually working per-filesystem N-way-mirroring is what I'm after, since I already set up multiple filesystems in order to keep my data eggs from all being in the same filesystem basket.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master."  Richard Stallman
Re: I need to P. are we almost there yet?
Roman Mamedov posted on Sun, 04 Jan 2015 02:58:35 +0500 as excerpted:

> On Sat, 3 Jan 2015 13:11:57 +0000 (UTC), Duncan <1i5t5.dun...@cox.net> wrote:
>>> What about using btrfs on top of MD raid?
>>
>> The problem with that is data integrity. mdraid doesn't have it. btrfs
>> does.
>
> Most importantly, however, you aren't any worse off with Btrfs on top of
> MD than with Btrfs on a single device, or with Ext4/XFS/JFS/etc on top
> of MD.

Good point! =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master."  Richard Stallman
Re: I need to P. are we almost there yet?
On Sun, Jan 04, 2015 at 03:22:53AM +0000, Duncan wrote:
> sys.syphus posted on Sat, 03 Jan 2015 12:55:27 -0600 as excerpted:
>>> But btrfs raid56 mode should be complete with kernel 3.19 and
>>> presumably btrfs-progs 3.19, tho I'd give it a kernel or two to mature
>>> to be sure. N-way-mirroring (my particular hotly awaited feature) is
>>> next up, but given the time raid56 took, I don't think anybody's
>>> predicting when it'll actually be in-tree and ready for use.
>>
>> is that the feature where you say i want x copies of this file and y
>> copies of this other file? e.g. raid at the file level, with the
>> ability to adjust redundancy per file?
>
> Per-file isn't available yet, tho at least per-subvolume is roadmapped,
> and now that we have the properties framework working via xattrs for
> files as well, at least in theory there is AFAIK no reason to limit it
> to per-subvolume, as per-file should be about as easy once the code that
> currently limits it to per-filesystem is rewritten.

roadmapped -- fond wish.

Also, per-file is a bit bloody awkward to get working. Having sat and thought about it hard for a while, I'm not convinced that it would actually be worth the implementation effort.

Certainly, nobody should be thinking about having (say) a different RAID config for every file -- that way lies madness. I would expect, at most, a small number (<=3) of different profiles for data in any given filesystem, with the majority of data being of one particular profile. Anything trying to get more sophisticated than that is likely asking for intractable space-allocation problems. Think: requiring regular full-balance operations.

The behaviour of the chunk allocator in the presence of merely two allocation profiles (data/metadata) is awkward enough. Introducing more of them is something that will require a separate research programme to understand fully. I will probably have an opportunity to discuss the basics of multiple allocation schemes with someone more qualified than I am on Tuesday, but I doubt that we'll reach any firm conclusion for many months at best (if ever). The formal maths involved gets quite nasty, quite quickly.

Hugo.

> But actually fully working per-filesystem raid56 is enough for a lot of
> people, and actually working per-filesystem N-way-mirroring is what I'm
> after, since I already set up multiple filesystems in order to keep my
> data eggs from all being in the same filesystem basket.

-- 
Hugo Mills | "If it's December 1941 in Casablanca, what time is it in New York?"
hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: 65E74AC0 | Rick Blaine, Casablanca
Re: I need to P. are we almost there yet?
On 2014-12-31 12:27, ashf...@whisperpc.com wrote:
> Phillip,
>
>> I had a similar question a year or two ago (specifically about raid10),
>> so I both experimented and read the code myself to find out. I was
>> disappointed to find that it won't do raid10 on 3 disks, since the
>> chunk metadata describes raid10 as a stripe layered on top of a mirror.
>> Jose's point was also a good one though: one chunk may decide to mirror
>> disks A and B, so it could recover from a failure of A and C, but a
>> different chunk could choose to mirror on disks A and C, so that chunk
>> would be lost if A and C fail. It would probably be nice if the chunk
>> allocator tried to be more deterministic about that.
>
> I see this as a CRITICAL design flaw. The reason for calling it CRITICAL
> is that System Administrators have been trained for 20 years that
> RAID-10 can usually handle a dual-disk failure, but the BTRFS
> implementation has effectively ZERO chance of doing so.

No, some rather simple math will tell you that a 4 disk BTRFS filesystem in raid10 mode has exactly a 50% chance of surviving a dual disk failure, and that as the number of disks goes up, the chance of survival will asymptotically approach 100% (but never reach it). This is the case for _every_ RAID-10 implementation that I have ever seen, including hardware RAID controllers; the only real difference is in the stripe length (usually 512 bytes * half the number of disks for hardware RAID, 4k * half the number of disks for software RAID, and the filesystem block size (default is 16k in current versions) * half the number of disks for BTRFS).
Re: I need to P. are we almost there yet?
On 2015-01-02 12:45, Brendan Hide wrote:
> On 2015/01/02 15:42, Austin S Hemmelgarn wrote:
>> On 2014-12-31 12:27, ashf...@whisperpc.com wrote:
>>> I see this as a CRITICAL design flaw. The reason for calling it
>>> CRITICAL is that System Administrators have been trained for 20 years
>>> that RAID-10 can usually handle a dual-disk failure, but the BTRFS
>>> implementation has effectively ZERO chance of doing so.
>>
>> No, some rather simple math
>
> That's the problem. The math isn't as simple as you'd expect. The
> example below is probably a pathological case -- but here goes.
>
> Let's say in this 4-disk example that chunks are striped as d1,d2,d1,d2,
> where d1 is the first bit of data and d2 is the second:
>
> Chunk 1 might be striped across disks A,B,C,D as d1,d2,d1,d2
> Chunk 2 might be striped across disks B,C,A,D as d3,d4,d3,d4
> Chunk 3 might be striped across disks D,A,C,B as d5,d6,d5,d6
> Chunk 4 might be striped across disks A,C,B,D as d7,d8,d7,d8
> Chunk 5 might be striped across disks A,C,D,B as d9,d10,d9,d10
>
> Lose any two disks and, per chunk, there is roughly a one-in-three
> chance of having lost that chunk (each chunk is killed by 2 of the 6
> possible two-disk failures). With traditional RAID-10 you have the same
> one-in-three chance of losing the array entirely. With btrfs, the more
> data you have stored, the closer the chance gets to 100% of losing
> *some* data in a 2-disk failure.
>
> In the above example, losing A and B means you lose d3, d6, and d7
> (which ends up being 60% of all chunks).
> Losing A and C means you lose d1 (20% of all chunks).
> Losing A and D means you lose d9 (20% of all chunks).
> Losing B and C means you lose d10 (20% of all chunks).
> Losing B and D means you lose d2 (20% of all chunks).
> Losing C and D means you lose d4, d5, AND d8 (60% of all chunks).
>
> The above skewed example has an average of about a third of all chunks
> failed. As you add more data and randomise the allocation, the fraction
> lost for any given pair of failed disks will settle at about a third --
> BUT the chance of losing *some* data is already clearly shown to be very
> close to 100%.

OK, I forgot about the randomization effect that the chunk allocation and freeing has. We really should slap a *BIG* warning label on that (and ideally find some better way to do it so it's more reliable).

As an aside, I've found that a BTRFS raid1 set on top of 2 LVM/MD RAID0 sets is actually faster than using a BTRFS raid10 set with the same number of disks (how much faster is workload-dependent), and provides better guarantees than a BTRFS raid10 set.
Re: I need to P. are we almost there yet?
On 2015/01/02 15:42, Austin S Hemmelgarn wrote:
> On 2014-12-31 12:27, ashf...@whisperpc.com wrote:
>> I see this as a CRITICAL design flaw. The reason for calling it
>> CRITICAL is that System Administrators have been trained for 20 years
>> that RAID-10 can usually handle a dual-disk failure, but the BTRFS
>> implementation has effectively ZERO chance of doing so.
>
> No, some rather simple math

That's the problem. The math isn't as simple as you'd expect. The example below is probably a pathological case -- but here goes.

Let's say in this 4-disk example that chunks are striped as d1,d2,d1,d2, where d1 is the first bit of data and d2 is the second:

Chunk 1 might be striped across disks A,B,C,D as d1,d2,d1,d2
Chunk 2 might be striped across disks B,C,A,D as d3,d4,d3,d4
Chunk 3 might be striped across disks D,A,C,B as d5,d6,d5,d6
Chunk 4 might be striped across disks A,C,B,D as d7,d8,d7,d8
Chunk 5 might be striped across disks A,C,D,B as d9,d10,d9,d10

Lose any two disks and, per chunk, there is roughly a one-in-three chance of having lost that chunk (each chunk is killed by 2 of the 6 possible two-disk failures). With traditional RAID-10 you have the same one-in-three chance of losing the array entirely. With btrfs, the more data you have stored, the closer the chance gets to 100% of losing *some* data in a 2-disk failure.

In the above example, losing A and B means you lose d3, d6, and d7 (which ends up being 60% of all chunks).
Losing A and C means you lose d1 (20% of all chunks).
Losing A and D means you lose d9 (20% of all chunks).
Losing B and C means you lose d10 (20% of all chunks).
Losing B and D means you lose d2 (20% of all chunks).
Losing C and D means you lose d4, d5, AND d8 (60% of all chunks).

The above skewed example has an average of about a third of all chunks failed. As you add more data and randomise the allocation, the fraction lost for any given pair of failed disks will settle at about a third -- BUT the chance of losing *some* data is already clearly shown to be very close to 100%.

-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
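For anyone who wants to check the arithmetic in the example above, here is a short self-contained sketch (an illustrative model only, not derived from the btrfs allocator) that replays the five chunks, tries every two-disk failure, and then simulates many randomly-paired chunks:

    from itertools import combinations
    import random

    # Mirror pairs for each example chunk: each chunk stripes two data
    # elements, and each element lives on the two listed disks.
    chunks = [
        [("A", "C"), ("B", "D")],   # chunk 1: d1 on A+C, d2 on B+D
        [("A", "B"), ("C", "D")],   # chunk 2: d3 on B+A, d4 on C+D
        [("C", "D"), ("A", "B")],   # chunk 3: d5 on D+C, d6 on A+B
        [("A", "B"), ("C", "D")],   # chunk 4: d7 on A+B, d8 on C+D
        [("A", "D"), ("B", "C")],   # chunk 5: d9 on A+D, d10 on C+B
    ]

    def lost(failed, chunk_list):
        # A chunk is lost if both copies of either of its elements sit on
        # the failed disks.
        return [i for i, pairs in enumerate(chunk_list, 1)
                if any(set(p) <= set(failed) for p in pairs)]

    fractions = []
    for failed in combinations("ABCD", 2):
        gone = lost(failed, chunks)
        fractions.append(len(gone) / len(chunks))
        print(f"lose {failed}: chunks {gone} gone ({fractions[-1]:.0%})")
    print(f"average over all two-disk failures: {sum(fractions) / len(fractions):.1%}")

    # With many chunks and randomised pairing, any given two-disk failure
    # loses about a third of the chunks -- but almost certainly loses *some*.
    random.seed(0)
    pairings = [[("A", "B"), ("C", "D")],
                [("A", "C"), ("B", "D")],
                [("A", "D"), ("B", "C")]]
    many = [random.choice(pairings) for _ in range(10000)]
    print(f"random allocation, lose A+B: {len(lost(('A', 'B'), many)) / len(many):.1%} of chunks")

The one-disk guarantee is the same as traditional RAID-10; it's the behaviour beyond the guarantee that differs.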
Re: I need to P. are we almost there yet?
Roger Binns posted on Thu, 01 Jan 2015 12:12:31 -0800 as excerpted:

> On 12/31/2014 05:26 PM, Chris Samuel wrote:
>> I suspect this is a knock-on effect of the fact that (unless this has
>> changed recently, IIRC) RAID-1 with btrfs will only mirror data over
>> two drives, no matter how many you add to an array.

It hasn't changed yet, but now that raid56 support is basically complete (with 3.19; other than bugs, of course, it'll be another kernel cycle or two before I'd rely on it), that's next up on the raid-features roadmap. =:^) I know, as that's my most hotly anticipated roadmapped btrfs feature yet to hit, and I've been waiting for it, patiently only because I didn't have much choice, for a couple of years now.

> I wish btrfs wouldn't use the old-school micro-managing storage
> terminology (or only as aliases) and instead let you set the goals. What
> people really mean is that they want their data to survive the failure
> of N drives -- exactly how that is done doesn't matter. It would also be
> nice for this to be settable as an xattr on files and directories.

Actually, a more flexible terminology has been discussed, and /might/ actually be introduced either along with or prior to the multi-way-mirroring feature (depending on how long the latter takes to develop, I'd guess). The suggested terminology would basically treat the number of data strips, mirrors, parity devices, hot-spares, etc, each as its own separate axis, with parity levels ultimately extended well beyond 2 (aka raid6) as well -- I think to something like a dozen or 16. Obviously, if it's introduced before N-way-mirroring, N-way-parity, etc, it would only support the current feature set for now, and would just be a different way of configuring mkfs as well as of displaying the current layouts in btrfs filesystem df and usage. Hugo's the guy who has proposed that, and he has been doing the preliminary patch development.

Meanwhile, ultimately the ability to configure all this at least by subvolume is planned, and once it's actually possible to set it on less than a full-filesystem basis, setting it by individual xattr has been discussed as well. I think the latter depends on the sorts of issues they run into in the actual implementation.

Finally, btrfs is already taking the xattr/property route with this sort of attribute. The basic infrastructure for that went in a couple of kernel cycles ago, and can be seen and worked with using the btrfs property command. So the basic property/xattr infrastructure is already there, and the ability to configure redundancy per subvolume was already built into the original btrfs design and roadmapped, altho it's not yet implemented, which means it's actually quite likely to eventually be configurable per file via xattrs/properties as well -- emphasis on /eventually/, as these features /do/ tend to take rather longer to actually develop and stabilize than originally predicted. The raid56 code is a good example, as it was originally slated for kernel cycle 3.6 or so, IIRC, but it took over two years to cook and we're finally getting it in 3.19!

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master."  Richard Stallman
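As a small illustration of that property plumbing (the file path below is hypothetical; the xattr name is the one the current btrfs property command uses for compression hints, and any future per-file replication property is pure speculation):

    import os

    path = "/mnt/btrfs/somefile"   # hypothetical file on a mounted btrfs filesystem

    # `btrfs property set <file> compression zlib` records its value as an
    # xattr in the btrfs.* namespace, so it can also be read and written
    # directly:
    os.setxattr(path, "btrfs.compression", b"zlib")
    print(os.getxattr(path, "btrfs.compression"))   # b'zlib'

    # A per-file redundancy setting would presumably ride the same
    # mechanism -- e.g. some hypothetical "btrfs.replication" property --
    # but nothing like that exists today.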
Re: I need to P. are we almost there yet?
Phillip,

> I had a similar question a year or two ago (specifically about raid10),
> so I both experimented and read the code myself to find out. I was
> disappointed to find that it won't do raid10 on 3 disks, since the chunk
> metadata describes raid10 as a stripe layered on top of a mirror.
> Jose's point was also a good one though: one chunk may decide to mirror
> disks A and B, so it could recover from a failure of A and C, but a
> different chunk could choose to mirror on disks A and C, so that chunk
> would be lost if A and C fail. It would probably be nice if the chunk
> allocator tried to be more deterministic about that.

I see this as a CRITICAL design flaw. The reason for calling it CRITICAL is that System Administrators have been trained for 20 years that RAID-10 can usually handle a dual-disk failure, but the BTRFS implementation has effectively ZERO chance of doing so.

According to every description of RAID-10 I've ever seen (including documentation from MaxStrat), RAID-10 stripes mirrored pairs/sets of disks. The device-level description is a critical component of what makes an array RAID-10, and is the reason for many of the attributes of RAID-10. This is NOT what BTRFS has implemented.

While BTRFS may be distributing the chunks according to a RAID-10 methodology, that is NOT what the industry considers to be RAID-10. While the current methodology has the data replication of RAID-10, and it may have the performance of RAID-10, it absolutely DOES NOT have the robustness or uptime benefits that are expected of RAID-10.

In order to remove this potentially catastrophic confusion, BTRFS should either call its RAID-10 implementation something else, or it should adhere to the long-established definition of RAID-10.

Peter Ashford
Re: I need to P. are we almost there yet?
On Wed, 31 Dec 2014 09:27:14 AM, ashf...@whisperpc.com wrote:
> I see this as a CRITICAL design flaw. The reason for calling it CRITICAL
> is that System Administrators have been trained for 20 years that
> RAID-10 can usually handle a dual-disk failure, but the BTRFS
> implementation has effectively ZERO chance of doing so.

I suspect this is a knock-on effect of the fact that (unless this has changed recently, IIRC) RAID-1 with btrfs will only mirror data over two drives, no matter how many you add to an array.

-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: I need to P. are we almost there yet?
On 12/29/2014 7:20 PM, ashf...@whisperpc.com wrote:
> Just some background data on traditional RAID, and the chances of
> survival with a 2-drive failure.
>
> In traditional RAID-10, the chance of surviving a 2-drive failure is 66%
> on a 4-drive array, and approaches 100% as the number of drives in the
> array increases.
>
> In traditional RAID-0+1 (it used to be common in low-end fake-RAID
> cards), the chance of surviving a 2-drive failure is 33% on a 4-drive
> array, and approaches 50% as the number of drives in the array
> increases.

In terms of data layout, there is really no difference between raid10 (or raid1+0) and raid0+1, aside from the designation you assign to each drive. With a dumb implementation of 0+1, any single drive failure offlines the entire stripe, discarding the remaining good disks in it, thus giving the probability you describe, since the only remaining failure(s) that do not also take out the mirror are drives in the same stripe as the original. This, however, is only a deficiency of the implementation, not the data layout, as all of the data on the first failed drive could be recovered from a drive in the second stripe, so long as the second drive that failed was any drive other than the one holding the duplicate data of the first. This is partly why I agree with Linux mdadm that raid10 is *not* simply raid1+0; the latter is just a naive, degenerate implementation of the former.

> In traditional RAID-1E, the chance of surviving a 2-drive failure is 66%
> on a 4-drive array, and approaches 100% as the number of drives in the
> array increases. This is the same as for RAID-10. RAID-1E allows an odd
> number of disks to be actively used in the array.

What some vendors have called 1E is simply raid10 in the default "near" layout of mdadm. I prefer the higher-performance "offset" layout myself.

> I'm wondering which of the above the BTRFS implementation most closely
> resembles.

Unfortunately, btrfs just uses the naive raid1+0, so no 2- or 3-disk raid10 arrays, and no higher-performing offset layout.
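For readers who haven't met them, here is a rough sketch of the difference between the "near" and "offset" layouts mentioned above (a simplified model of mdadm raid10 with two copies per chunk -- illustrative only, not mdadm's actual mapping code):

    def near2(chunk, ndisks):
        # 'near=2' (simplified): both copies of a chunk sit on adjacent
        # devices in the same row, like classic RAID-1+0.
        row = chunk // (ndisks // 2)
        first = (chunk % (ndisks // 2)) * 2
        return [(first, row), (first + 1, row)]            # (device, row)

    def offset2(chunk, ndisks):
        # 'offset=2' (simplified): a full stripe of chunks is written across
        # all devices, then repeated on the next row shifted by one device,
        # so sequential reads can use every spindle.
        row = 2 * (chunk // ndisks)
        dev = chunk % ndisks
        return [(dev, row), ((dev + 1) % ndisks, row + 1)]

    for c in range(4):
        print(f"chunk {c}: near {near2(c, 4)}  offset {offset2(c, 4)}")

With 4 disks, near=2 degenerates into the familiar two-fixed-mirror-pairs arrangement, which is why 1E on an even number of disks looks like plain RAID-10.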
Re: I need to P. are we almost there yet?
Phillip Susi wrote:
>> I'm wondering which of the above the BTRFS implementation most closely
>> resembles.
>
> Unfortunately, btrfs just uses the naive raid1+0, so no 2- or 3-disk
> raid10 arrays, and no higher-performing offset layout.

Jose Manuel Perez Bethencourt wrote:
> I think you are missing crucial info on the layout on disk that BTRFS
> implements. While a traditional RAID1 has a rigid layout with fixed and
> easily predictable locations for all data (exactly on two specific
> disks), BTRFS allocates chunks as needed on ANY two disks. Please
> research into this to understand the problem fully; this is the key to
> your question.

There is a HUGE difference here. In the first case, the data will have a 50% chance of surviving a 2-drive failure. In the second case, the data will have an effectively 0% chance of surviving a 2-drive failure. I don't believe I need to mention which of the above is more reliable, or which I would prefer.

I believe that someone who understands the code in depth (and that may also be one of the people above) should determine exactly how BTRFS implements RAID-10.

Thank you.

Peter Ashford
Re: I need to P. are we almost there yet?
On 12/30/2014 06:17 PM, ashf...@whisperpc.com wrote:
> I believe that someone who understands the code in depth (and that may
> also be one of the people above) should determine exactly how BTRFS
> implements RAID-10.

I am such a person. I had a similar question a year or two ago (specifically about raid10), so I both experimented and read the code myself to find out. I was disappointed to find that it won't do raid10 on 3 disks, since the chunk metadata describes raid10 as a stripe layered on top of a mirror.

Jose's point was also a good one though: one chunk may decide to mirror disks A and B, so it could recover from a failure of A and C, but a different chunk could choose to mirror on disks A and C, so that chunk would be lost if A and C fail. It would probably be nice if the chunk allocator tried to be more deterministic about that.
Re: I need to P. are we almost there yet?
oh, and sorry to bump myself. but is raid10 *ever* more redundant in btrfs-speak than raid1? I currently use raid1 but i know in mdadm speak raid10 means you can lose 2 drives assuming they aren't the wrong ones, is it safe to say with btrfs / raid 10 you can only lose one no matter what?
Re: I need to P. are we almost there yet?
On Mon, Dec 29, 2014 at 01:00:05PM -0600, sys.syphus wrote:
> oh, and sorry to bump myself. but is raid10 *ever* more redundant in
> btrfs-speak than raid1? I currently use raid1 but i know in mdadm speak
> raid10 means you can lose 2 drives assuming they aren't the wrong ones,
> is it safe to say with btrfs / raid 10 you can only lose one no matter
> what?

I think that with an even number of identical-sized devices, you get the same guarantees (well, behaviour) as you would with traditional RAID-10. I may be wrong about that -- do test before relying on it. The FS probably won't like losing two devices, though, even if the remaining data is actually enough to reconstruct the FS.

Hugo.

-- 
Hugo Mills | "I can resist everything except temptation"
hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: 65E74AC0 |
Re: I need to P. are we almost there yet?
so am I to read that as if btrfs redundancy isn't really functional? if i yank a member of my raid 1 out in live prod is it going to take a dump on my data?

On Mon, Dec 29, 2014 at 1:04 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> On Mon, Dec 29, 2014 at 01:00:05PM -0600, sys.syphus wrote:
>> oh, and sorry to bump myself. but is raid10 *ever* more redundant in
>> btrfs-speak than raid1? I currently use raid1 but i know in mdadm speak
>> raid10 means you can lose 2 drives assuming they aren't the wrong ones,
>> is it safe to say with btrfs / raid 10 you can only lose one no matter
>> what?
>
> I think that with an even number of identical-sized devices, you get the
> same guarantees (well, behaviour) as you would with traditional RAID-10.
> I may be wrong about that -- do test before relying on it. The FS
> probably won't like losing two devices, though, even if the remaining
> data is actually enough to reconstruct the FS.
>
> Hugo.
Re: I need to P. are we almost there yet?
By asking the question this way, I don't think you understand how Btrfs development works. But if you check out the git pull for 3.19, you'll see a bunch of patches that pretty much close the feature-parity (no pun intended) gap between raid56 and raid0/1/10. But it is an rc, it still needs testing, and even once 3.19 becomes a stable kernel, it's new enough code that there can always be edge cases.

And raid1 has been tested in Btrfs for how many years now? So if you want the same amount of testing-time on raid6, it would be however many years that's been, counted from when 3.19 is released.

Chris Murphy
Re: I need to P. are we almost there yet?
On Mon, Dec 29, 2014 at 12:00 PM, sys.syphus <syssyp...@gmail.com> wrote:
> oh, and sorry to bump myself. but is raid10 *ever* more redundant in
> btrfs-speak than raid1? I currently use raid1 but i know in mdadm speak
> raid10 means you can lose 2 drives assuming they aren't the wrong ones,
> is it safe to say with btrfs / raid 10 you can only lose one no matter
> what?

It's only one for sure in any case, even with conventional raid10. Whether your data has dodged a bullet just depends on which 2 you lose. Obviously you can't lose a drive and its mirror, ever, or the array collapses.

-- 
Chris Murphy
Re: I need to P. are we almost there yet?
On Mon, Dec 29, 2014 at 02:25:14PM -0600, sys.syphus wrote:
> so am I to read that as if btrfs redundancy isn't really functional? if
> i yank a member of my raid 1 out in live prod is it going to take a dump
> on my data?

Eh? Where did that conclusion come from? I said nothing at all about RAID-1, only RAID-10.

So, to clarify: In the general case, you can safely lose one device from a btrfs RAID-10. Also in the general case, losing a second device will break the filesystem (with very high probability). In the case I gave below, with an even number of equal-sized devices, the second device to be lost *may* allow the data to be recovered with sufficient effort, but the FS in general will probably not be mountable with two missing devices.

So, btrfs RAID-10 offers the same *guarantees* as traditional RAID-10. It's generally less effective with the probabilities of the failure modes beyond the guarantee.

Hugo.

> On Mon, Dec 29, 2014 at 1:04 PM, Hugo Mills <h...@carfax.org.uk> wrote:
>> I think that with an even number of identical-sized devices, you get
>> the same guarantees (well, behaviour) as you would with traditional
>> RAID-10. I may be wrong about that -- do test before relying on it. The
>> FS probably won't like losing two devices, though, even if the
>> remaining data is actually enough to reconstruct the FS.

-- 
Hugo Mills | emacs: Eighty Megabytes And Constantly Swapping.
hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: 65E74AC0 |
Re: I need to P. are we almost there yet?
> On Mon, Dec 29, 2014 at 12:00 PM, sys.syphus <syssyp...@gmail.com> wrote:
>> oh, and sorry to bump myself. but is raid10 *ever* more redundant in
>> btrfs-speak than raid1? I currently use raid1 but i know in mdadm speak
>> raid10 means you can lose 2 drives assuming they aren't the wrong ones,
>> is it safe to say with btrfs / raid 10 you can only lose one no matter
>> what?
>
> It's only one for sure in any case, even with conventional raid10.
> Whether your data has dodged a bullet just depends on which 2 you lose.
> Obviously you can't lose a drive and its mirror, ever, or the array
> collapses.

Just some background data on traditional RAID, and the chances of survival with a 2-drive failure.

In traditional RAID-10, the chance of surviving a 2-drive failure is 66% on a 4-drive array, and approaches 100% as the number of drives in the array increases.

In traditional RAID-0+1 (it used to be common in low-end fake-RAID cards), the chance of surviving a 2-drive failure is 33% on a 4-drive array, and approaches 50% as the number of drives in the array increases.

In traditional RAID-1E, the chance of surviving a 2-drive failure is 66% on a 4-drive array, and approaches 100% as the number of drives in the array increases. This is the same as for RAID-10. RAID-1E allows an odd number of disks to be actively used in the array. https://en.wikipedia.org/wiki/File:RAID_1E.png

I'm wondering which of the above the BTRFS implementation most closely resembles.

> So if you want the same amount of testing-time on raid6, it would be
> however many years that's been, counted from when 3.19 is released.

I don't believe that's correct. Over those several years, quite a few tests for corner cases have been developed. I expect that those tests are used for regression testing of each release, to ensure that old bugs aren't inadvertently reintroduced. Furthermore, I expect that a large number of those corner-case tests can be easily modified to test RAID-5 and RAID-6.

In reality, I expect that stability of the RAID-5/6 code in BTRFS (i.e. similar to RAID-10 currently) will be achieved rather quickly (only a year or two). I expect that the difficult part will be to optimize the performance of BTRFS. Hopefully those tests (and others, yet to be developed) will be able to keep it stable while the code is optimized for performance.

Peter Ashford
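As a quick sanity check on the traditional-RAID figures above, here is a back-of-the-envelope calculation (assuming exactly two drives fail, chosen uniformly at random; RAID-1E is left out because its surviving pairs depend on the exact layout chosen):

    from math import comb

    def raid10_survival(n):
        # Traditional RAID-10: n/2 mirror pairs, striped.  The array dies
        # only if the two failed drives are the same mirror pair.
        return 1 - (n // 2) / comb(n, 2)

    def raid01_survival(n):
        # Naive RAID-0+1: two stripes of n/2 drives, mirrored.  The first
        # failure offlines one whole stripe; the array survives only if the
        # second failure lands in that already-dead stripe.
        return (n // 2 - 1) / (n - 1)

    for n in (4, 8, 16, 100):
        print(f"{n:>3} drives: RAID-10 {raid10_survival(n):.1%}  RAID-0+1 {raid01_survival(n):.1%}")

    # 4 drives: RAID-10 66.7%, RAID-0+1 33.3%; as n grows, RAID-10 tends
    # towards 100% and RAID-0+1 towards 50%, consistent with the figures above.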