Re: [gentoo-user] which linux RAID setup to choose?
On 4/5/20 3:50 pm, hitachi303 wrote:
> Am 04.05.2020 um 02:46 schrieb Rich Freeman:
>> On Sun, May 3, 2020 at 6:50 PM hitachi303 wrote:
>> ...
> So you are right. This is the way they do it. I used the term raid too broadly.
> But still they have problems with limitations. Size of room, what air conditioning can handle and stuff like this.
>
> Anyway I only wanted to point out that there are different approaches in the industries and saving the data at any price is not always necessary.

I would suggest that once you go past a few drives there are better ways. I had two 4-disk, bcache-fronted RAID10s on two PCs with critical data backed up between them. When an SSD bcache failed in one, and two backing stores in the other, almost simultaneously, I nearly had to resort to offline backups to restore the data ... the downtime was still a major pain.

I now have 5 cheap ARM systems and a small x86 master with 7 disks across them - response time is good, and power use seems lower (they run much cooler and quieter) than running two over-the-top older PCs. The reliability/recovery time (at least when I tested by manually failing drives and causing power outages) is much better. I am using MooseFS, but LizardFS looks similar, and both can offer erasure coding, which gives still more storage space with recovery if you have enough disks.

The downside is maintaining more systems, more complex networking and the like - it's been a few months now, and I won't be going back to RAID for my main storage.

BillK
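To make the erasure-coding point concrete, here is a minimal Python sketch (not from the thread): with k data chunks plus m parity chunks you keep k/(k+m) of the raw space and can lose any m chunks, versus half the raw space for plain two-way replication. The k+m geometries below are illustrative, not BillK's actual MooseFS/LizardFS settings.

# Rough sketch (not from the thread): compare raw-space efficiency of
# simple replication against erasure coding, where a file is split into
# k data chunks plus m parity chunks and any m of the k+m chunks may be
# lost without losing data.

def replication_efficiency(copies: int) -> float:
    """Usable fraction of raw space with N-way replication."""
    return 1.0 / copies

def erasure_efficiency(k: int, m: int) -> float:
    """Usable fraction of raw space with k data + m parity chunks."""
    return k / (k + m)

if __name__ == "__main__":
    print(f"2-way replication: {replication_efficiency(2):.0%} usable, survives 1 lost copy")
    for k, m in [(4, 2), (6, 3), (8, 2)]:   # example geometries, not BillK's actual setup
        print(f"EC {k}+{m}: {erasure_efficiency(k, m):.0%} usable, survives {m} lost chunks")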
Re: [gentoo-user] which linux RAID setup to choose?
Am 04.05.2020 um 02:46 schrieb Rich Freeman:
> On Sun, May 3, 2020 at 6:50 PM hitachi303 wrote:
>> The only person I know who is running a really huge raid (I guess 2000+ drives) is comfortable with some spare drives. [...]
>
> So, if you have thousands of drives, you really shouldn't be using a conventional RAID solution. [...] At that scale you really should be using a distributed filesystem. [...] The most mainstream FOSS solution at this scale is Ceph. It achieves redundancy at the host level. [...]

So you are right. This is the way they do it. I used the term raid too broadly.

But still they have problems with limitations. Size of room, what air conditioning can handle and stuff like this.

Anyway I only wanted to point out that there are different approaches in the industries and saving the data at any price is not always necessary.
Re: [gentoo-user] which linux RAID setup to choose?
Am 04.05.2020 um 02:29 schrieb Caveman Al Toraboran:
>> Facebook used to store data which is sometimes accessed on raids. Since they use energy they stored data which is nearly never accessed on Blu-ray disks. I don't know if they still do. Reading is very slow if a mechanical arm first needs to fetch a specific Blu-ray out of hundreds and put it in a disk reader but it is very energy efficient.
>
> interesting.

A video from 2014: https://www.facebook.com/Engineering/videos/10152128660097200/
Re: [gentoo-user] which linux RAID setup to choose?
On Monday, May 4, 2020 3:19 AM, antlists wrote:
> On 03/05/2020 22:46, Caveman Al Toraboran wrote:
>> On Sunday, May 3, 2020 6:27 PM, Jack ostrof...@users.sourceforge.net wrote:
>> curious. how do people look at --layout=n2 in the storage industry? e.g. do they ignore the optimistic case where 2 disk failures can be recovered, and only assume that it protects for 1 disk failure?
>
> You CANNOT afford to be optimistic ... Murphy's law says you will lose the wrong second disk.

so i guess your answer is: "yes, the industry ignores the existence of optimistic cases". if that's true, then the industry is wrong, and must learn the following:

1. don't bet your data's survival on luck (you agree with this, i know).
2. don't ignore statistics that reveal the fact that lucky cases exist.

(1) and (2) are not mutually exclusive, and murphy's law would suggest not to ignore (2). because, if you ignore (2), you'll end up adopting a 5-disk RAID10 instead of the superior 6-disk RAID10, and end up being less lucky in practice. don't rely on luck, but why deny good luck the chance to come to you when it might? --- two different things.

>> i see why gambling is not worth it here, but at the same time, i see no reason to ignore reality (that a 2 disk failure can be saved).
>
> Don't ignore that some 2-disk failures CAN'T be saved ...

yeah, i'm not. i'm just not ignoring that a 2-disk failure might get saved. you know... it's better to have a little window where some good luck may chime in than to ban good luck.

> Don't forget, if you have a spare disk, the repair window is the length of time it takes to fail-over ...

yup. just trying not to rely on the good luck that a spare is available, e.g. considering the case where no spare is there.

>> this site [2] says that 76% of seagate disks fail per year (:D). and since disks fail independently of each other mostly, then, the probability of having 2 disks fail in a year is:
>
> 76% seems incredibly high. And no, disks do not fail independently of each other. If you buy a bunch of identical disks, at the same time, and stick them all in the same raid array, the chances of them all wearing out at the same time are rather higher than random chance would suggest.

i know. i had this as a note, but then removed it. anyway, some nitpicks:

1. dependence != correlation. you mean correlation, not dependence. disk failures are correlated if the disks are bought together, but other disks don't cause the failure (unless from things like heat from other disks, or repair stress because of another disk failing).

2. i followed the extreme case where a person got his disks purchased at random times, so that he was maximally lucky in that his disks didn't synchronize. why? (i) it offers a better pessimistic result: now we know that this probability is actually lower than in reality, which means that the 3.5k-bucks threshold is actually even lower. this should scare us more (hence us relying on less luck). (ii) it makes calculation easier.
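To put numbers on the "second failure inside the repair window" worry, here is a minimal Python sketch (not from the thread). It assumes independent failures at a constant annual rate - exactly the simplification being debated above, so treat the output as a lower bound - and the 5% rate and 2-week window are illustrative, not measured figures.

# Rough sketch (not from the thread): probability that at least one of the
# remaining disks also fails while the array is degraded, assuming
# independent failures at a constant annual rate. Real disks bought together
# are correlated, so treat this as a lower bound.

def p_fail_in_window(annual_rate: float, window_days: float) -> float:
    """Per-disk probability of failing within the repair window."""
    # constant-hazard approximation: survival = (1 - annual_rate)^(t/365)
    return 1.0 - (1.0 - annual_rate) ** (window_days / 365.0)

def p_second_failure(annual_rate: float, window_days: float, remaining_disks: int) -> float:
    """Probability that >= 1 of the surviving disks fails before the rebuild finishes."""
    p = p_fail_in_window(annual_rate, window_days)
    return 1.0 - (1.0 - p) ** remaining_disks

if __name__ == "__main__":
    # illustrative numbers only: 5% annual failure rate, 2-week repair window
    for n in (3, 4, 5):   # disks left in a 4-, 5- or 6-disk RAID10 after one failure
        print(n, "remaining disks:", round(p_second_failure(0.05, 14, n), 4))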
Re: [gentoo-user] which linux RAID setup to choose?
On Monday, May 4, 2020 2:50 AM, hitachi303 wrote:
> Am 03.05.2020 um 23:46 schrieb Caveman Al Toraboran:
>> so, in summary:
>>   /------------------------------------------------\
>>   | a 5-disk RAID10 is better than a 6-disk RAID10 |
>>   | ONLY IF your data is WORTH LESS than 3,524.3   |
>>   | bucks.                                         |
>>   \------------------------------------------------/
>> any thoughts? i'm a newbie. i wonder how industry people think?
>
> Don't forget that having more drives increases the odds of a failing drive. If you have infinite drives at any given moment infinite drives will fail. Anyway I wouldn't know how to calculate this.

by drive, you mean a spinning hard disk? i'm not sure how "infinite" helps here, even theoretically. e.g. say that every year 76% of disks fail. in the limit, as the number of disks approaches infinity, 76% of infinity is infinity. but how is this useful?

> Most people are limited by money and space. Even if this isn't your problem you will always need an additional backup strategy. The whole system can fail.
> I run a system with 8 drives where two can fail and they can be hot swapped. This is a closed source SAS which I really like except the part being closed source. I don't even know what kind of raid is used.
>
> The only person I know who is running a really huge raid (I guess 2000+ drives) is comfortable with some spare drives. His raid did fail and can fail. Data will be lost. Everything important has to be stored at a secondary location. But they are using the raid to store data for some days or weeks when a server is calculating stuff. If the raid fails they have to restart the program for the calculation.

thanks a lot. i highly appreciate these tips about how others run their storage. however, i am not sure what the takeaway is. e.g. your closed-source NAS vs. a large RAID: they don't seem to be mutually exclusive to me (both might be on RAID). to me, a NAS is just a computer with RAID. no?

> Facebook used to store data which is sometimes accessed on raids. Since they use energy they stored data which is nearly never accessed on Blu-ray disks. I don't know if they still do. Reading is very slow if a mechanical arm first needs to fetch a specific Blu-ray out of hundreds and put it in a disk reader but it is very energy efficient.

interesting.
Re: [gentoo-user] which linux RAID setup to choose?
On Sunday, May 3, 2020 6:27 PM, Jack wrote:
> Minor point - you have one duplicate line there ". f f ." which is the second and last line of the second group. No effect on anything else in the discussion.

thanks.

> Trying to help thinking about odd numbers of disks, if you are still allowing only one disk to fail, then you can think about mirroring half disks, so each disk has half of it mirrored to a different disk, instead of drives always being mirrored in pairs.

that definitely helped get me unstuck and continue thinking. thanks.

curious. how do people look at --layout=n2 in the storage industry? e.g. do they ignore the optimistic case where 2 disk failures can be recovered, and only assume that it protects for 1 disk failure?

i see why gambling is not worth it here, but at the same time, i see no reason to ignore reality (that a 2 disk failure can be saved). e.g. a 4-disk RAID10 with --layout=n2 gives 1*4/10 + 2*4/10 = 1.2 expected recoverable disk failures. details are below:

    F . . .   < recoverable
    . F . .   < cases with
    . . F .   < 1 disk
    . . . F   < failure

    F . . F   < recoverable
    . F F .   < cases with
    . F . F   < 2 disk
    F . F .   < failures

    F F . .   < not recoverable
    . . F F   < cases with 2 disk
              < failures

now, if we do a 5-disk --layout=n2, we get:

    disk:   1     2     3     4     5
            1    (1)    2    (2)    3
           (3)    4    (4)    5    (5)
            6    (6)    7    (7)    8
           (8)    9    (9)   10   (10)
           11   (11)   12   (12)   13
          (13)  ...

obviously, there are 5 possible ways a single disk may fail, out of which all of the 5 will be recovered. there are nchoosek(5,2) = 10 possible ways a 2 disk failure could happen, out of which 5 will be recovered:

    xxx   (1)  xxx   (2)    3
    xxx     4  xxx     5   (5)

    xxx   (1)    2   xxx    3
    xxx     4   (4)  xxx   (5)

      1   xxx    2   xxx    3
     (3)  xxx   (4)  xxx   (5)

      1   xxx    2    (2)  xxx
     (3)  xxx   (4)    5   xxx

      1    (1)  xxx   (2)  xxx
     (3)     4  xxx    5   xxx

so, expected recoverable disk failures for a 5-disk RAID10 --layout=n2 is:

    1*5/15 + 2*5/15 = 1

so, by transforming a 4-disk RAID10 into a 5-disk one, we increase total storage capacity by a 0.5 disk's worth of storage, while losing the ability to recover 0.2 disks. but if we extended the 4-disk RAID10 into a 6-disk --layout=n2, we will have:

              6                     nchoosek(6,2) - 3
    = 1 * -----------------  +  2 * -----------------
          6 + nchoosek(6,2)         6 + nchoosek(6,2)

    = 6/21 + 2 * 12/15

    = 1.8857

expected recoverable failing disks. almost 2. i.e. there is 80% chance of surviving a 2 disk failure.

so, i wonder, is it a bad decision to go with an even number of disks with a RAID10? what is the right way to think to find an answer to this question? i guess the ultimate answer needs knowledge of these:

    * F1: probability of having 1 disk fail within the repair window.
    * F2: probability of having 2 disks fail within the repair window.
    * F3: probability of having 3 disks fail within the repair window.
      .
      .
    * Fn: probability of having n disks fail within the repair window.

    * R1: probability of surviving a 1-disk failure.
          equals 1 with all related cases.
    * R2: probability of surviving a 2-disk failure.
          equals 1/3 with a 5-disk RAID10.
          equals 0.8 with a 6-disk RAID10.
    * R3: probability of surviving a 3-disk failure.
          equals 0 with all related cases.
      .
      .
    * Rn: probability of surviving an n-disk failure.
          equals 0 with all related cases.

    * L : expected cost of losing data on an array.
    * D : price of a disk.

this way, the absolute expected cost when adopting a 6-disk RAID10 is:

    = 6D + F1*(1-R1)*L + F2*(1-R2)*L   + F3*(1-R3)*L + ...
    = 6D + F1*(1-1)*L  + F2*(1-0.8)*L  + F3*(1-0)*L  + ...
    = 6D + 0           + F2*(0.2)*L    + F3*(1-0)*L  + ...

and the absolute cost for a 5-disk RAID10 is:

    = 5D + F1*(1-1)*L  + F2*(1-0.3333)*L + F3*(1-0)*L + ...
    = 5D + 0           + F2*(0.6667)*L   + F3*(1-0)*L + ...

canceling identical terms, the difference cost is:

    6-disk ===> 6D + 0.2*F2*L
    5-disk ===> 5D + 0.6667*F2*L

from here [1] we know that a 1TB disk costs $35.85, so:

    6-disk ===> 6*35.85 + 0.2*F2*L
    5-disk ===> 5*35.85 + 0.6667*F2*L

now, at which point is a 5-disk array a better economical decision than a 6-disk one? for simplicity, let LOL = F2*L:

    5*35.85 + 0.6667*LOL < 6*35.85 + 0.2*LOL
    0.6667*LOL - 0.2*LOL < 6*35.85 - 5*35.85
    LOL * (0.6667 - 0.2) < 6*35.85 - 5*35.85

                           6*35.85 - 5*35.85
                     LOL < -----------------
                             0.6667 - 0.2

                     LOL < 76.816
                    F2*L < 76.816

so, a 5-disk RAID10 is better than a 6-disk RAID10 only if:

    F2*L < 76.816 bucks.
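For anyone who wants to re-run the counting, here is a small Python sketch (not from the thread) that enumerates an mdadm-style near=2 layout, counts survivable 1- and 2-disk failures, and applies the same 5-disk vs 6-disk break-even reasoning, using the $35.85/TB price cited above. It derives the survival probabilities directly from the layout, so its output is a cross-check and need not match the hand-computed figures in the post.

# Rough sketch (not from the thread): enumerate an mdadm-style RAID10
# "near=2" layout, count which 1- and 2-disk failures lose data, then redo
# the 5-disk vs 6-disk break-even from the post. Only the $35.85/TB price is
# taken from the thread; everything else is derived here.
from itertools import combinations

def chunk_disks(n_disks: int, copies: int = 2):
    """Yield, for each data chunk, the set of disks holding its copies (near layout)."""
    # near layout: the copies of chunk c sit on consecutive slots, wrapping
    # around the disks row by row; one full period of chunks is enough.
    for c in range(n_disks):
        yield {(copies * c + k) % n_disks for k in range(copies)}

def data_lost(n_disks: int, failed: set) -> bool:
    """True if some chunk has every copy on a failed disk."""
    return any(disks <= failed for disks in chunk_disks(n_disks))

def survival_stats(n_disks: int):
    one = [not data_lost(n_disks, {d}) for d in range(n_disks)]
    two = [not data_lost(n_disks, set(p)) for p in combinations(range(n_disks), 2)]
    r2 = sum(two) / len(two)            # chance of surviving a 2-disk failure
    return sum(one), len(one), sum(two), len(two), r2

if __name__ == "__main__":
    price = 35.85                        # 1TB disk price quoted in the post
    stats = {n: survival_stats(n) for n in (4, 5, 6)}
    for n, (s1, t1, s2, t2, r2) in stats.items():
        print(f"{n} disks: survives {s1}/{t1} single and {s2}/{t2} double failures (R2={r2:.3f})")
    # 5-disk is cheaper overall only if F2*L < D / (R2_6disk - R2_5disk)
    threshold = price / (stats[6][4] - stats[5][4])
    print(f"5-disk beats 6-disk only if F2*L < ${threshold:.2f}")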
Re: [gentoo-user] which linux RAID setup to choose?
On Sun, May 3, 2020 at 6:50 PM hitachi303 wrote: > > The only person I know who is running a really huge raid ( I guess 2000+ > drives) is comfortable with some spare drives. His raid did fail an can > fail. Data will be lost. Everything important has to be stored at a > secondary location. But they are using the raid to store data for some > days or weeks when a server is calculating stuff. If the raid fails they > have to restart the program for the calculation. So, if you have thousands of drives, you really shouldn't be using a conventional RAID solution. Now, if you're just using RAID to refer to any technology that stores data redundantly that is one thing. However, if you wanted to stick 2000 drives into a single host using something like mdadm/zfs, or heaven forbid a bazillion LSI HBAs with some kind of hacked-up solution for PCIe port replication plus SATA bus multipliers/etc, you're probably doing it wrong. (Really even with mdadm/zfs you probably still need some kind of terribly non-optimal solution for attaching all those drives to a single host.) At that scale you really should be using a distributed filesystem. Or you could use some application-level solution that accomplishes the same thing on top of a bunch of more modest hosts running zfs/etc (the Backblaze solution at least in the past). The most mainstream FOSS solution at this scale is Ceph. It achieves redundancy at the host level. That is, if you have it set up to tolerate two failures then you can take two random hosts in the cluster and smash their motherboards with a hammer in the middle of operation, and the cluster will keep on working and quickly restore its redundancy. Each host can have multiple drives, and losing any or all of the drives within a single host counts as a single failure. You can even do clever stuff like tell it which hosts are attached to which circuit breakers and then you could lose all the hosts on a single power circuit at once and it would be fine. This also has the benefit of covering you when one of your flakey drives causes weird bus issues that affect other drives, or one host crashes, and so on. The redundancy is entirely at the host level so you're protected against a much larger number of failure modes. This sort of solution also performs much faster as data requests are not CPU/NIC/HBA limited for any particular host. The software is obviously more complex, but the hardware can be simpler since if you want to expand storage you just buy more servers and plug them into the LAN, versus trying to figure out how to cram an extra dozen hard drives into a single host with all kinds of port multiplier games. You can also do maintenance and just reboot an entire host while the cluster stays online as long as you aren't messing with them all at once. I've gone in this general direction because I was tired of having to try to deal with massive cases, being limited to motherboards with 6 SATA ports, adding LSI HBAs that require an 8x slot and often conflicts with using an NVMe, and so on. -- Rich
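As an illustration of the host-level failure-domain idea (this is a toy model, not Ceph or its CRUSH placement algorithm), the Python sketch below places each object's replicas on distinct hosts and then checks every way of losing two whole hosts at once; the 6-host, 3-replica cluster is made up.

# Toy model (not from the thread, and not how Ceph/CRUSH actually places
# data): replicate each object onto `copies` distinct hosts, then verify
# that smashing any two whole hosts still leaves every object readable.
from itertools import combinations, cycle

def place_objects(n_objects: int, hosts: list, copies: int = 3):
    """Round-robin placement of each object's replicas on `copies` distinct hosts."""
    ring = cycle(hosts)
    placement = {}
    for obj in range(n_objects):
        placement[obj] = {next(ring) for _ in range(copies)}
    return placement

def survives(placement, dead_hosts: set) -> bool:
    """Data survives if every object keeps at least one replica on a live host."""
    return all(replicas - dead_hosts for replicas in placement.values())

if __name__ == "__main__":
    hosts = [f"host{i}" for i in range(6)]        # hypothetical 6-node cluster
    placement = place_objects(100, hosts, copies=3)
    ok = all(survives(placement, set(pair)) for pair in combinations(hosts, 2))
    print("survives any 2 simultaneous host losses:", ok)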
Re: [gentoo-user] which linux RAID setup to choose?
On Sun, May 3, 2020 at 6:52 PM Mark Knecht wrote: > > On Sun, May 3, 2020 at 1:16 PM Rich Freeman wrote: > > > > Up until a few weeks ago I would have advised the same, but WD was > > just caught shipping unadvertised SMR in WD Red disks. This is going > > to at the very least impact your performance if you do a lot of > > writes, and it can be incompatible with rebuilds in particular with > > some RAID implementations. Seagate and Toshiba have also been quietly > > using it but not in their NAS-labeled drives and not as extensively in > > general. > > I read somewhere that they knew they'd been caught and were coming clean. Yup. WD was caught. Then they first came out with a "you're using it wrong" sort of defense but they did list the SMR drives. Then they came out with a bit more of an even-handed response. The others weren't caught as far as I'm aware but probably figured the writing was on the wall since no doubt everybody and their uncle is going to be benchmarking every drive they own. > Another case of unbridled capitalism and consumers being hurt. I agree. This video has a slightly different perspective. It doesn't disagree on that conclusion, but it does explain more of the industry thinking that got us here (beyond the simple/obvious it saves money): https://www.youtube.com/watch?v=gSionmmunMs -- Rich
Re: [gentoo-user] which linux RAID setup to choose?
On 03/05/2020 22:46, Caveman Al Toraboran wrote:
> On Sunday, May 3, 2020 6:27 PM, Jack wrote:
>
> curious. how do people look at --layout=n2 in the storage industry? e.g. do they ignore the optimistic case where 2 disk failures can be recovered, and only assume that it protects for 1 disk failure?

You CANNOT afford to be optimistic ... Murphy's law says you will lose the wrong second disk.

> i see why gambling is not worth it here, but at the same time, i see no reason to ignore reality (that a 2 disk failure can be saved).

Don't ignore that some 2-disk failures CAN'T be saved ...

> e.g. a 4-disk RAID10 with --layout=n2 gives 1*4/10 + 2*4/10 = 1.2 expected recoverable disk failures. details are below: [...]
>
> now, if we do a 5-disk --layout=n2, we get: [...]
>
> obviously, there are 5 possible ways a single disk may fail, out of which all of the 5 will be recovered.

Don't forget a 4+spare layout, which *should* survive a 2-disk failure.

> there are nchoosek(5,2) = 10 possible ways a 2 disk failure could happen, out of which 5 will be recovered: [...]
>
> so, by transforming a 4-disk RAID10 into a 5-disk one, we increase total storage capacity by a 0.5 disk's worth of storage, while losing the ability to recover 0.2 disks. but if we extended the 4-disk RAID10 into a 6-disk --layout=n2, we will have [...] = 1.8857 expected recoverable failing disks. almost 2. i.e. there is 80% chance of surviving a 2 disk failure.
>
> so, i wonder, is it a bad decision to go with an even number of disks with a RAID10? what is the right way to think to find an answer to this question? i guess the ultimate answer needs knowledge of these:
>
> * F1: probability of having 1 disk fail within the repair window.
> * F2: probability of having 2 disks fail within the repair window.
>   [...]
> * Fn: probability of having n disks fail within the repair window.
> * R1: probability of surviving a 1-disk failure. equals 1 with all related cases.
> * R2: probability of surviving a 2-disk failure. equals 1/3 with a 5-disk RAID10, 0.8 with a 6-disk RAID10.
> * R3: probability of surviving a 3-disk failure. equals 0 with all related cases.
>   [...]
> * Rn: probability of surviving an n-disk failure. equals 0 with all related cases.
> * L : expected cost of losing data on an array.
> * D : price of a disk.

Don't forget, if you have a spare disk, the repair window is the length of time it takes to fail-over ...

> this way, the absolute expected cost when adopting a 6-disk RAID10 is:
>
> = 6D + F1*(1-R1)*L + F2*(1-R2)*L + F3*(1-R3)*L + ...
> = 6D + 0 + F2*(0.2)*L + F3*(1-0)*L + ...
>
> and the absolute cost for a 5-disk RAID10 is:
>
> = 5D + 0 + F2*(0.6667)*L + F3*(1-0)*L + ...
>
> canceling identical terms, the difference cost is:
>
> 6-disk ===> 6D + 0.2*F2*L
> 5-disk ===> 5D + 0.6667*F2*L
>
> from here [1] we know that a 1TB disk costs $35.85 [...], so a 5-disk RAID10 is better than a 6-disk RAID10 only if:
>
> F2*L < 76.816 bucks.
>
> this site [2] says that 76% of seagate disks fail per year (:D). and since disks fail independently of each other mostly, then, the probability of having 2 disks fail in a year is:

76% seems incredibly high. And no, disks do not fail independently of each other. If you buy a bunch of identical disks, at the same time, and stick them all in the same raid array, the chances of them all wearing out at the same time are rather higher than random chance would suggest.

Which is why, if a raid disk fails, the advice is always to replace it asap. And if possible, to recover the failed drive to try and copy that rather than hammer the rest of the raid.

Bear in mind that, it doesn't matter how many drives a raid-10 has, if you're recovering on to a new drive, the data is stored on just two of the o
Re: [gentoo-user] which linux RAID setup to choose?
On Sun, May 3, 2020 at 1:16 PM Rich Freeman wrote: > > On Sun, May 3, 2020 at 2:29 PM Mark Knecht wrote: > > > > I've used the WD Reds and WD Golds (no not sold) and never had any problem. > > > > Up until a few weeks ago I would have advised the same, but WD was > just caught shipping unadvertised SMR in WD Red disks. This is going > to at the very least impact your performance if you do a lot of > writes, and it can be incompatible with rebuilds in particular with > some RAID implementations. Seagate and Toshiba have also been quietly > using it but not in their NAS-labeled drives and not as extensively in > general. I read somewhere that they knew they'd been caught and were coming clean. As I'm not buying anything at this time I didn't pay too much attention. This link is at least similar to what I read earlier. Possibly it's of interest. https://www.extremetech.com/computing/309730-western-digital-comes-clean-shares-which-hard-drives-use-smr Another case of unbridled capitalism and consumers being hurt. Cheers, Mark
Re: [gentoo-user] which linux RAID setup to choose?
Am 03.05.2020 um 23:46 schrieb Caveman Al Toraboran:
> so, in summary:
>   /------------------------------------------------\
>   | a 5-disk RAID10 is better than a 6-disk RAID10 |
>   | ONLY IF your data is WORTH LESS than 3,524.3   |
>   | bucks.                                         |
>   \------------------------------------------------/
> any thoughts? i'm a newbie. i wonder how industry people think?

Don't forget that having more drives increases the odds of a failing drive. If you have infinite drives at any given moment infinite drives will fail. Anyway I wouldn't know how to calculate this.

Most people are limited by money and space. Even if this isn't your problem you will always need an additional backup strategy. The whole system can fail.

I run a system with 8 drives where two can fail and they can be hot swapped. This is a closed source SAS which I really like except the part being closed source. I don't even know what kind of raid is used.

The only person I know who is running a really huge raid (I guess 2000+ drives) is comfortable with some spare drives. His raid did fail and can fail. Data will be lost. Everything important has to be stored at a secondary location. But they are using the raid to store data for some days or weeks when a server is calculating stuff. If the raid fails they have to restart the program for the calculation.

Facebook used to store data which is sometimes accessed on raids. Since they use energy they stored data which is nearly never accessed on Blu-ray disks. I don't know if they still do. Reading is very slow if a mechanical arm first needs to fetch a specific Blu-ray out of hundreds and put it in a disk reader but it is very energy efficient.
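The "more drives, more failures" effect is easy to put numbers on. A minimal Python sketch (not from the thread), assuming independent failures at a fixed annual rate - an optimistic assumption for drives bought as a batch - and an illustrative 5% rate:

# Rough sketch (not from the thread): how the chance of seeing at least one
# drive failure per year grows with the number of drives, assuming
# independent failures at a fixed annual rate.

def p_at_least_one_failure(n_drives: int, annual_rate: float) -> float:
    return 1.0 - (1.0 - annual_rate) ** n_drives

if __name__ == "__main__":
    rate = 0.05                      # illustrative 5% annual failure rate
    for n in (4, 8, 100, 2000):      # from a home array up to the "really huge raid"
        p = p_at_least_one_failure(n, rate)
        print(f"{n:5d} drives: P(>=1 failure/year) = {p:.3f}, expected failures/year = {n * rate:.1f}")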
Re: [gentoo-user] which linux RAID setup to choose?
On Sun, May 3, 2020 at 5:32 PM antlists wrote: > > On 03/05/2020 21:07, Rich Freeman wrote: > > I don't think you should focus so much on whether read=write in your > > RAID. I'd focus more on whether read and write both meet your > > requirements. > > If you think about it, it's obvious that raid-1 will read faster than it > writes - it has to write two copies while it only reads one. Yes. The same is true for RAID10, since it has to also write two copies of everything. > > Likewise, raids 5 and 6 will be slower writing than reading - for a > normal read it only reads the data disks, but when writing it has to > write (and calculate!) parity as well. Yes, but with any of the striped modes (0, 5, 6, 10) there is an additional issue. Writes have to generally be made in entire stripes, so if you overwrite data in-place in units smaller than an entire stripe, then the entire stripe needs to first be read, and then it can be overwritten again. This is an absolute requirement if there is parity involved. If there is no parity (RAID 0,10) then an implementation might be able to overwrite part of a stripe in place without harming the rest. > > A raid 1 should read data faster than a lone disk. A raid 5 or 6 should > read noticeably faster because it's reading across more than one disk. More-or-less. RAID 1 is going to generally benefit from lower latency because reads can be divided across mirrored copies (and there could be more than one replica). Any of the striped modes are going to be the same as a single disk on latency, but will have much greater bandwidth. That bandwidth gain applies to both reading and writing, as long as the data is sequential. This is why it is important to understand your application. There is no one "best" RAID implementation. They all have pros and cons depending on whether you care more about latency vs bandwidth and also read vs write. And of course RAID isn't the only solution out there for this stuff. Distributed filesystems also have pros and cons, and often those have multiple modes of operation on top of this (usually somewhat mirroring the options available for RAID but across multiple hosts). For general storage I'm using zfs with raid1 pairs of disks (the pool can have multiple pairs), and for my NAS for larger-scale media/etc storage I'm using lizardfs. I'd use ceph instead in any kind of enterprise setup, but that is much more RAM-hungry and I'm cheap. -- Rich
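To illustrate the read-modify-write cost on parity RAID mentioned above, here is a small Python sketch (not from the thread) of the classic "small write" update: parity is the XOR of the data chunks, so changing one chunk in place means reading the old chunk and the old parity and writing both back updated.

# Rough sketch (not from the thread): why a sub-stripe write on a parity
# RAID costs extra I/O. Parity is the XOR of the data chunks; updating one
# chunk in place means reading the old chunk and old parity, then writing
# the new chunk and new parity (the classic 4-I/O "small write").
import os

CHUNK = 16  # bytes, tiny for illustration

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# a 3+1 "stripe": three data chunks and their parity
data = [os.urandom(CHUNK) for _ in range(3)]
parity = xor(xor(data[0], data[1]), data[2])

# overwrite chunk 1 without touching chunks 0 and 2:
old_chunk = data[1]                                   # 1. read old data chunk
old_parity = parity                                   # 2. read old parity
new_chunk = os.urandom(CHUNK)
parity = xor(xor(old_parity, old_chunk), new_chunk)   # recompute parity
data[1] = new_chunk                                   # 3. write new data chunk
                                                      # 4. write new parity

# sanity check: parity still equals the XOR of all current data chunks
assert parity == xor(xor(data[0], data[1]), data[2])
print("parity consistent after read-modify-write update")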
Re: [gentoo-user] which linux RAID setup to choose?
On Sunday, May 3, 2020 1:23 PM, Wols Lists wrote: > For anything above raid 1, MAKE SURE your drives support SCT/ERC. For > example, Seagate Barracudas are very popular desktop drives, but I guess > maybe HALF of the emails asking for help recovering an array on the raid > list involve them dying ... > > (I've got two :-( but my new system - when I get it running - has > ironwolves instead.) that's very scary. just to double check: are those help emails about linux's software RAID? or is it about hardware RAIDs? the reason i ask about software vs. hardware, is because of this wiki article [1] which seems to suggest that mdadm handles error recovery by waiting for up to 30 seconds (set in /sys/block/sd*/device/timeout) after which the device is reset. am i missing something? to me it seems that [1] seems to suggest that linux software raid has a reliable way to handle the issue? since i guess all disks support resetting well? [1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID
Re: [gentoo-user] which linux RAID setup to choose?
On 03/05/2020 21:07, Rich Freeman wrote: I don't think you should focus so much on whether read=write in your RAID. I'd focus more on whether read and write both meet your requirements. If you think about it, it's obvious that raid-1 will read faster than it writes - it has to write two copies while it only reads one. Likewise, raids 5 and 6 will be slower writing than reading - for a normal read it only reads the data disks, but when writing it has to write (and calculate!) parity as well. A raid 1 should read data faster than a lone disk. A raid 5 or 6 should read noticeably faster because it's reading across more than one disk. If you're worried about write speeds, add a cache. Cheers, Wol
Re: [gentoo-user] which linux RAID setup to choose?
On 03/05/2020 18:55, Caveman Al Toraboran wrote:
> On Sunday, May 3, 2020 1:23 PM, Wols Lists wrote:
>> For anything above raid 1, MAKE SURE your drives support SCT/ERC. For example, Seagate Barracudas are very popular desktop drives, but I guess maybe HALF of the emails asking for help recovering an array on the raid list involve them dying ...
>>
>> (I've got two :-( but my new system - when I get it running - has ironwolves instead.)
>
> that's very scary.
>
> just to double check: are those help emails about linux's software RAID? or is it about hardware RAIDs?

They are about linux software raid. Hardware raid won't be any better.

> the reason i ask about software vs. hardware, is because of this wiki article [1] which seems to suggest that mdadm handles error recovery by waiting for up to 30 seconds (set in /sys/block/sd*/device/timeout) after which the device is reset.

Which if your drive does not support SCT/ERC then goes *badly* wrong.

> am i missing something?

Yes ...

> to me it seems that [1] seems to suggest that linux software raid has a reliable way to handle the issue?

Well, if the paragraph below were true, it would.

> since i guess all disks support resetting well?

That's the point. THEY DON'T! That's why you need SCT/ERC ...

> [1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID

https://raid.wiki.kernel.org/index.php/Choosing_your_hardware,_and_what_is_a_device%3F#Desktop_and_Enterprise_drives
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

Cheers,
Wol
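A minimal Python sketch (not from the wiki pages above) that just prints the kernel-side half of the timeout-mismatch problem, i.e. the /sys/block/sd*/device/timeout value mentioned earlier; whether each drive actually supports SCT/ERC, and what it is set to, still has to be checked with a tool such as smartctl.

# Rough sketch (not from the thread): list the kernel SCSI command timeout
# for each sd* device, the /sys value discussed above. This only shows the
# kernel side of the timeout-mismatch problem.
import glob

for path in sorted(glob.glob("/sys/block/sd*/device/timeout")):
    dev = path.split("/")[3]                  # e.g. "sda"
    with open(path) as f:
        timeout = f.read().strip()
    print(f"{dev}: kernel command timeout = {timeout}s")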
Re: [gentoo-user] which linux RAID setup to choose?
On Sun, May 3, 2020 at 2:29 PM Mark Knecht wrote: > > I've used the WD Reds and WD Golds (no not sold) and never had any problem. > Up until a few weeks ago I would have advised the same, but WD was just caught shipping unadvertised SMR in WD Red disks. This is going to at the very least impact your performance if you do a lot of writes, and it can be incompatible with rebuilds in particular with some RAID implementations. Seagate and Toshiba have also been quietly using it but not in their NAS-labeled drives and not as extensively in general. At the very least you should check the model number lists that have been recently released to check if the drive you want to get uses SMR. I'd also get it from someplace with a generous return policy and do some benchmarking to confirm that the drive isn't SMR (you're probably going to have to do continuous random writes exceeding the total capacity of the drive before you see problems - or at least quite a bit of random writing - the amount of writing needed will be less once the drive has been in use for a while but a fresh drive basically acts like close to a full-disk-sized write cache as far as SMR goes). > Build a RAID with a WD Green and you're in for trouble. ;-))) It really depends on your RAID implementation. Certainly I agree that it is better to have TLER, but for some RAID implementations not having it just causes performance drops when you actually have errors (which should be very rare). For others it can cause drives to be dropped. I wouldn't hesitate to use greens in an mdadm or zfs array with default options, but with something like hardware RAID I'd be more careful. If you use aggressive timeouts on your RAID then the Green is more likely to get kicked out. I agree with the general sentiment to have a spare if it will take you a long time to replace failed drives. Alternatively you can have additional redundancy, or use a RAID alternative that basically treats all free space as an effective spare (like many distributed filesystems). -- Rich
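A very rough Python sketch (not from the thread) of the kind of sustained random-write test suggested above: keep writing random blocks and watch whether throughput collapses once the drive's CMR cache region fills. The path and sizes are hypothetical, a dedicated benchmarking tool is a better choice, and, as noted above, you may need to write a large fraction of the drive before SMR behaviour shows.

# Very rough sketch (not from the thread): sustained random 1 MiB writes into
# a large preallocated file on the drive under test, fsynced in batches, with
# a running throughput average. A sharp, lasting drop after a lot of writing
# is one hint of SMR behaviour; treat this as illustration only.
import os, random, time

PATH = "/mnt/testdisk/smr_probe.bin"   # hypothetical mount point on the drive under test
FILE_SIZE = 8 * 1024**3                # 8 GiB test file (use far more on a real drive)
BLOCK = 1024**2                        # 1 MiB writes
BATCH = 64                             # fsync every 64 writes

fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o600)
os.ftruncate(fd, FILE_SIZE)
block = os.urandom(BLOCK)

written = 0
start = time.monotonic()
while written < FILE_SIZE:
    for _ in range(BATCH):
        offset = random.randrange(0, FILE_SIZE - BLOCK)
        os.pwrite(fd, block, offset)
        written += BLOCK
    os.fsync(fd)
    mb_s = written / (1024**2) / (time.monotonic() - start)
    print(f"{written // 1024**2:6d} MiB written, running average {mb_s:7.1f} MiB/s")
os.close(fd)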
Re: [gentoo-user] which linux RAID setup to choose?
On Sun, May 3, 2020 at 1:44 AM Caveman Al Toraboran wrote:
>
> * RAID 1: fails to satisfy points (1) and (3)...
> this leaves me with RAID 10

Two things:

1. RAID 10 doesn't satisfy point 1 (read and write performance are identical). No RAID implementation I'm aware of does.

2. Some RAID1 implementations can satisfy point 3 (expandability to additional space and replication multiplicities), particularly when combined with LVM.

I'd stop and think about your requirements a bit. You seem really concerned about having identical read and write performance. RAID implementations all have their pros and cons both in comparison with each other, in comparison with non-RAID, and in comparison between read and write within any particular RAID implementation. I don't think you should focus so much on whether read=write in your RAID. I'd focus more on whether read and write both meet your requirements.

And on that note, what are your requirements? You haven't mentioned what you plan to store on it or how this data will be stored or accessed. It is hard to say whether any design will meet your performance requirements when you haven't provided any, other than a fairly arbitrary read=write one.

In general most RAID1 implementations aren't going to lag a regular non-RAID disk by much and will often exceed it (especially for reading). I'm not saying RAID1 is the best option for you - I'm just suggesting that you don't toss it out just because it reads faster than it writes, especially in favor of RAID 10, which also reads faster than it writes but has the additional caveat that small writes may necessitate an additional read before write.

Not knowing your requirements it is hard to make more specific recommendations, but I'd also consider ZFS and distributed filesystems. They have some pros and cons around flexibility, and if you're operating at a small scale it might not be appropriate for your use case, but you should consider them.

-- Rich
Re: [gentoo-user] which linux RAID setup to choose?
On Sun, May 3, 2020 at 10:56 AM Caveman Al Toraboran <toraboracave...@protonmail.com> wrote:
>
> On Sunday, May 3, 2020 1:23 PM, Wols Lists wrote:
>
> > For anything above raid 1, MAKE SURE your drives support SCT/ERC. For example, Seagate Barracudas are very popular desktop drives, but I guess maybe HALF of the emails asking for help recovering an array on the raid list involve them dying ...
> >
> > (I've got two :-( but my new system - when I get it running - has ironwolves instead.)
>
> that's very scary.
>
> just to double check: are those help emails about linux's software RAID? or is it about hardware RAIDs?
>
> the reason i ask about software vs. hardware, is because of this wiki article [1] which seems to suggest that mdadm handles error recovery by waiting for up to 30 seconds (set in /sys/block/sd*/device/timeout) after which the device is reset.
>
> am i missing something? to me it seems that [1] seems to suggest that linux software raid has a reliable way to handle the issue? since i guess all disks support resetting well?
>
> [1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID

When doing Linux RAID, hardware or software, make sure you get a RAID-aware drive that supports TLER (Time Limited Error Recovery) or whatever the vendor that makes your drive calls it. Typically this is set at about 7 seconds, guaranteeing that no matter what's going on the drive will respond to the upper layers (mdadm) to let it know it's alive.

A non-RAID drive with no TLER feature will respond when it's ready, and typically if that's longer than 30 seconds then the RAID subsystem kicks the drive and you have to re-add it. While there's nothing 'technically' wrong with the storage, when the RAID rebuilds you eventually hit another one of these >30 second waits, another drive gets kicked, and you're dead.

I've used the WD Reds and WD Golds (no not sold) and never had any problem. Build a RAID with a WD Green and you're in for trouble. ;-)))

HTH,
Mark
Re: [gentoo-user] which linux RAID setup to choose?
Caveman Al Toraboran wrote:
> On Sunday, May 3, 2020 1:23 PM, Wols Lists wrote:
>
>> For anything above raid 1, MAKE SURE your drives support SCT/ERC. For example, Seagate Barracudas are very popular desktop drives, but I guess maybe HALF of the emails asking for help recovering an array on the raid list involve them dying ...
>>
>> (I've got two :-( but my new system - when I get it running - has ironwolves instead.)
>
> that's very scary.
>
> just to double check: are those help emails about linux's software RAID? or is it about hardware RAIDs?
>
> the reason i ask about software vs. hardware, is because of this wiki article [1] which seems to suggest that mdadm handles error recovery by waiting for up to 30 seconds (set in /sys/block/sd*/device/timeout) after which the device is reset.
>
> am i missing something? to me it seems that [1] seems to suggest that linux software raid has a reliable way to handle the issue? since i guess all disks support resetting well?
>
> [1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID

I'd like to add something about the PMR/SMR thing. I bought an SMR drive without knowing it. Now when I search for a hard drive, I add NAS to the search string. That seems to weed out the SMR-type drives. Once I find an exact model, I google it up to confirm. So far, that little trick has worked pretty well. It may be something you want to consider using as well.

NAS drives tend to be more robust, it seems. Given you are using RAID, you likely want a more robust and dependable drive, if drives can be put into that category nowadays. :/

Hope that helps.

Dale

:-)  :-)
Re: [gentoo-user] which linux RAID setup to choose?
On 5/3/20 1:44 AM, Caveman Al Toraboran wrote:
> [snip]...
>
> so, we get the following combinations of disk failures that, if happen, we won't lose any data:
>
>       RAID0
>       --^--
>   RAID1 RAID1
>   --^-- --^--
>   F . . .   < cases with
>   . F . .   < single disk
>   . . F .   < failures
>   . . . F   <
>
>   F . . F   < cases with
>   . F F .   < two disk
>   . F . F   < failures
>   F . F .   <
>   . F F .   <
>
> this gives us 4+5=9 possible disk failure scenarios where we can survive it without any data loss.

Minor point - you have one duplicate line there ". f f ." which is the second and last line of the second group. No effect on anything else in the discussion.

Trying to help thinking about odd numbers of disks, if you are still allowing only one disk to fail, then you can think about mirroring half disks, so each disk has half of it mirrored to a different disk, instead of drives always being mirrored in pairs.
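To visualise the "mirror half a disk to a different disk" idea for odd drive counts, here is a small Python sketch (not from the thread) that prints where an mdadm-style near=2 layout puts each chunk and its mirror:

# Rough sketch (not from the thread): print an mdadm-style RAID10 "near=2"
# layout, with "(c)" marking the mirror copy of chunk c, to show how odd
# drive counts end up mirroring half a disk onto a neighbouring disk.

def near2_layout(n_disks: int, rows: int = 4):
    """Return `rows` stripes of chunk labels for a near=2 layout."""
    slots, chunk = [], 1
    for _ in range(rows * n_disks // 2):      # each chunk occupies 2 consecutive slots
        slots += [str(chunk), f"({chunk})"]
        chunk += 1
    return [slots[r * n_disks:(r + 1) * n_disks] for r in range(rows)]

if __name__ == "__main__":
    for n in (4, 5):
        print(f"\n{n}-disk near=2 layout (columns are disks):")
        for row in near2_layout(n):
            print("  " + "  ".join(f"{cell:>5}" for cell in row))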
Re: [gentoo-user] which linux RAID setup to choose?
On Sunday, May 3, 2020 1:14 PM, Wols Lists wrote: > > Q3: what are the future growth/shrinkage > > options for a RAID10 setup? e.g. with > > respect to these: > > > > 1. read/write speed. > > > > iirc far is good for speed. > > > 2. tolerance guarantee towards failing > >disks. > > > > Guarantees? If you have two mirrors. the guarantee is just ONE disk. Yes > you can gamble on losing more. > > > 3. total available space. > > > > iirc you can NOT grow the far layout. sorry, typo, i meant "near" (the command was right though --layout=n2)
Re: [gentoo-user] which linux RAID setup to choose?
On 03/05/20 08:53, hitachi303 wrote: > Nothing you asked but I had very bad experience with drives which spin > down by themselves to save energy (mostly titled green or so). Good catch! For anything above raid 1, MAKE SURE your drives support SCT/ERC. For example, Seagate Barracudas are very popular desktop drives, but I guess maybe HALF of the emails asking for help recovering an array on the raid list involve them dying ... (I've got two :-( but my new system - when I get it running - has ironwolves instead.) Cheers, Wol
Re: [gentoo-user] which linux RAID setup to choose?
On 03/05/20 06:44, Caveman Al Toraboran wrote:
> hi - i'm to setup my 1st RAID, and i'd appreciate if any of you volunteers some time to share your valuable experience on this subject.
>
> my scenario
> -----------
>
> 0. i don't boot from the RAID.
>
> 1. read is as important as write. i don't have any application-specific scenario that makes me somehow favor one over another. so RAIDs that speed up the read (or write) while significantly harming the write (or read) are not welcome.
>
> 2. replacing failed disks may take a week or two. so, i guess that i may have several disks fail one after another in the 1-2 weeks (especially if they were bought about the same time).
>
> 3. i would like to be able to grow the RAID's total space (as needed), and increase its reliability (i.e. duplicates/parities) as needed.
>
>    e.g. suppose that i got a 2TB RAID that tolerates 1 disk failure. i'd like, at some point, to have the following options:
>
>    * only increase the total space (e.g. make it 3TB), without increasing failure toleration (so 2 disk failure would result in data loss).
>
>    * or, only increase the failure tolerance (e.g. such that 2 disks failure would not lead to data loss), without increasing the total space (e.g. space remains 2TB).
>
>    * or, increase, both, the space and the failure tolerance at the same time.
>
> 4. only interested in software RAID.
>
> my thought
> ----------
>
> i think these are not suitable:
>
> * RAID 0: fails to satisfy point (3).
>
> * RAID 1: fails to satisfy points (1) and (3).
>
> * RAIDs 4 to 6: fail to satisfy point (3) since they are stuck with a fixed tolerance towards failing disks (i.e. RAIDs 4 and 5 tolerate only 1 disk failure, and RAID 6 tolerates only 2).
>
> this leaves me with RAID 10, with the "far" layout. e.g. --layout=n2 would tolerate the failure of two disks, --layout=n3 three, etc. or is it? (i'm not sure).
>
> my questions
> ------------
>
> Q1: which RAID setup would you recommend?

I'd recommend having a spare in the array. That way, a single failure would not affect redundancy at all. You can then replace the spare at your leisure.

If you want to grow the array, I'd also suggest "raid 5 + spare". That's probably better than 6 for writing, but 6 is better than 5 for redundancy. Look at having a journal - that could speed up write speed for raid 6.

> Q2: how would the total number of disks in a RAID10 setup affect the tolerance towards the failing disks?

Sadly, it doesn't. If you have two copies, losing two disks COULD take out your raid.

> if the total number of disks is even, then it is easy to see how this is equivalent to the classical RAID 1+0 as shown in md(4), where any disk failure is tolerated for as long as each RAID1 group has 1 disk failure only.

That's a gamble ...

> so, we get the following combinations of disk failures that, if happen, we won't lose any data:
>
>       RAID0
>       --^--
>   RAID1 RAID1
>   --^-- --^--
>   F . . .   < cases with
>   . F . .   < single disk
>   . . F .   < failures
>   . . . F   <
>
>   F . . F   < cases with
>   . F F .   < two disk
>   . F . F   < failures
>   F . F .   <
>   . F F .   <
>
> this gives us 4+5=9 possible disk failure scenarios where we can survive it without any data loss.
>
> but, when the number of disks is odd, then written bytes and their duplicates will start to wrap around, and it is difficult for me to intuitively see how this would affect the total number of scenarios where i will survive a disk failure.
>
> Q3: what are the future growth/shrinkage options for a RAID10 setup? e.g. with respect to these:
>
> 1. read/write speed.

iirc far is good for speed.

> 2. tolerance guarantee towards failing disks.

Guarantees? If you have two mirrors, the guarantee is just ONE disk. Yes, you can gamble on losing more.

> 3. total available space.

iirc you can NOT grow the far layout.

> rgrds,
> cm.

You have looked at the wiki - yes I know I push it regularly :-)

https://raid.wiki.kernel.org/index.php/Linux_Raid

Cheers,
Wol
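To compare the options mentioned above side by side, here is a minimal Python sketch (not from the thread) tabulating usable capacity and guaranteed (worst-case) fault tolerance per layout; as discussed in the thread, RAID10 may survive more than one failure if you are lucky about which disks die.

# Rough sketch (not from the thread): usable capacity and guaranteed fault
# tolerance for the layouts discussed above, given n identical disks.
# "Guaranteed" means the worst case; a hot spare does not add tolerance to
# simultaneous failures, it only shortens the repair window.

def summarize(n_disks: int, disk_tb: float = 1.0):
    layouts = {
        "RAID5":         (max(n_disks - 1, 0) * disk_tb, 1),
        "RAID5 + spare": (max(n_disks - 2, 0) * disk_tb, 1),
        "RAID6":         (max(n_disks - 2, 0) * disk_tb, 2),
        "RAID10 (n2)":   (n_disks * disk_tb / 2, 1),
    }
    print(f"{n_disks} x {disk_tb} TB disks:")
    for name, (usable, guaranteed) in layouts.items():
        print(f"  {name:14s} usable {usable:4.1f} TB, guaranteed to survive {guaranteed} failure(s)")

if __name__ == "__main__":
    for n in (4, 5, 6):
        summarize(n)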
Re: [gentoo-user] which linux RAID setup to choose?
Am 03.05.2020 um 07:44 schrieb Caveman Al Toraboran:
> * RAIDs 4 to 6: fail to satisfy point (3) since they are stuck with a fixed tolerance towards failing disks (i.e. RAIDs 4 and 5 tolerate only 1 disk failure, and RAID 6 tolerates only 2).

As far as I remember there can be spare drives / partitions which will replace a failed one if needed. But this does not help if drives / partitions fail at the same moment. Under normal conditions spares will raise the number of drives which can fail.

Nothing you asked, but I had a very bad experience with drives which spin down by themselves to save energy (mostly titled "green" or so).

Also there has been some talk about SMR:
https://hardware.slashdot.org/story/20/04/19/0432229/storage-vendors-are-quietly-slipping-smr-disks-into-consumer-hard-drives