Re: [gentoo-user] which linux RAID setup to choose?

2020-05-04 Thread William Kenworthy

On 4/5/20 3:50 pm, hitachi303 wrote:
> Am 04.05.2020 um 02:46 schrieb Rich Freeman:
>> On Sun, May 3, 2020 at 6:50 PM hitachi303
>>  wrote: 

>> ...
> So you are right. This is the way they do it. I used the term raid too
> broadly.
> But they still have problems with limitations: the size of the room,
> what the air conditioning can handle, and stuff like this.
>
> Anyway I only wanted to point out that there are different approaches
> in the industry, and saving the data at any price is not always
> necessary.
>
I would suggest that once you go past a few drives there are better ways.

I had two 4-disk, bcache-fronted raid 10's on two PCs with critical
data backed up between them.  When an ssd bcache failed in one, and two
backing stores failed in the other almost simultaneously, I nearly had
to resort to offline backups to restore the data ... downtime was still
a major pain.

I now have 5 cheap arm systems and a small x86 master with 7 disks
spread across them - response time is good, and power use seems lower
(they run much cooler and quieter) than running two over-the-top older
PCs.  The reliability/recovery time (at least when I tested by manually
failing drives and causing power outages) is much better.

I am using moosefs, but lizardfs looks similar, and both can offer
erasure coding, which gives even more usable storage space while still
allowing recovery if you have enough disks.
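
To give a rough feel for why erasure coding helps, here is a tiny
back-of-the-envelope sketch (plain arithmetic, not moosefs or lizardfs
code; the 4+2 scheme below is just an example, not a recommendation):

  # Toy comparison of storage overhead: replication vs. erasure coding.
  # Purely illustrative; the k/m values are hypothetical.

  def replication_efficiency(copies: int) -> float:
      """Usable fraction of raw space with N full copies."""
      return 1.0 / copies

  def erasure_efficiency(k: int, m: int) -> float:
      """Usable fraction with k data chunks + m parity chunks
      (tolerates the loss of any m chunks)."""
      return k / (k + m)

  if __name__ == "__main__":
      print(f"2 copies : {replication_efficiency(2):.0%} usable, tolerates 1 loss")
      print(f"3 copies : {replication_efficiency(3):.0%} usable, tolerates 2 losses")
      print(f"EC 4+2   : {erasure_efficiency(4, 2):.0%} usable, tolerates 2 losses")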

The downside is maintaining more systems, more complex networking and
the like - it's been a few months now, and I won't be going back to
raid for my main storage.

BillK






Re: [gentoo-user] which linux RAID setup to choose?

2020-05-04 Thread hitachi303

Am 04.05.2020 um 02:46 schrieb Rich Freeman:

On Sun, May 3, 2020 at 6:50 PM hitachi303
 wrote:


The only person I know who is running a really huge raid (I guess 2000+
drives) is comfortable with some spare drives. His raid did fail and can
fail. Data will be lost. Everything important has to be stored at a
secondary location. But they are using the raid to store data for some
days or weeks while a server is calculating stuff. If the raid fails they
have to restart the program for the calculation.


So, if you have thousands of drives, you really shouldn't be using a
conventional RAID solution.  Now, if you're just using RAID to refer
to any technology that stores data redundantly that is one thing.
However, if you wanted to stick 2000 drives into a single host using
something like mdadm/zfs, or heaven forbid a bazillion LSI HBAs with
some kind of hacked-up solution for PCIe port replication plus SATA
bus multipliers/etc, you're probably doing it wrong.  (Really even
with mdadm/zfs you probably still need some kind of terribly
non-optimal solution for attaching all those drives to a single host.)

At that scale you really should be using a distributed filesystem.  Or
you could use some application-level solution that accomplishes the
same thing on top of a bunch of more modest hosts running zfs/etc (the
Backblaze solution at least in the past).

The most mainstream FOSS solution at this scale is Ceph.  It achieves
redundancy at the host level.  That is, if you have it set up to
tolerate two failures then you can take two random hosts in the
cluster and smash their motherboards with a hammer in the middle of
operation, and the cluster will keep on working and quickly restore
its redundancy.  Each host can have multiple drives, and losing any or
all of the drives within a single host counts as a single failure.
You can even do clever stuff like tell it which hosts are attached to
which circuit breakers and then you could lose all the hosts on a
single power circuit at once and it would be fine.

This also has the benefit of covering you when one of your flakey
drives causes weird bus issues that affect other drives, or one host
crashes, and so on.  The redundancy is entirely at the host level so
you're protected against a much larger number of failure modes.

This sort of solution also performs much faster as data requests are
not CPU/NIC/HBA limited for any particular host.  The software is
obviously more complex, but the hardware can be simpler since if you
want to expand storage you just buy more servers and plug them into
the LAN, versus trying to figure out how to cram an extra dozen hard
drives into a single host with all kinds of port multiplier games.
You can also do maintenance and just reboot an entire host while the
cluster stays online as long as you aren't messing with them all at
once.

I've gone in this general direction because I was tired of having to
try to deal with massive cases, being limited to motherboards with 6
SATA ports, adding LSI HBAs that require an 8x slot and often
conflicts with using an NVMe, and so on.



So you are right. This is the way they do it. I used the term raid too
broadly.
But they still have problems with limitations: the size of the room,
what the air conditioning can handle, and stuff like this.


Anyway I only wanted to point out that there are different approaches in
the industry, and saving the data at any price is not always necessary.





Re: [gentoo-user] which linux RAID setup to choose?

2020-05-04 Thread hitachi303

Am 04.05.2020 um 02:29 schrieb Caveman Al Toraboran:

Facebook used to store data which is sometimes accessed on raids. Since
raids use energy, they stored data which is nearly never accessed on
Blu-ray discs. I don't know if they still do. Reading is very slow if a
mechanical arm first needs to fetch a specific Blu-ray out of hundreds
and put it in a disc reader, but it is very energy efficient.

interesting.


A video from 2014
https://www.facebook.com/Engineering/videos/10152128660097200/




Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Caveman Al Toraboran
On Monday, May 4, 2020 3:19 AM, antlists  wrote:

> On 03/05/2020 22:46, Caveman Al Toraboran wrote:
>
> > On Sunday, May 3, 2020 6:27 PM, Jack ostrof...@users.sourceforge.net wrote:
> > curious. how do people look at --layout=n2 in the
> > storage industry? e.g. do they ignore the
> > optimistic case where 2 disk failures can be
> > recovered, and only assume that it protects for 1
> > disk failure?
>
> You CANNOT afford to be optimistic ... Murphy's law says you will lose
> the wrong second disk.

so i guess your answer is:  "yes, the industry
ignores the existence of optimistic cases".

if that's true, then the industry is wrong, and
must learn the following:

1. don't bet that your data's survival is
   hanging on luck (you agree with this i know).

2. don't ignore statistics that reveal the fact
   that lucky cases exist.

(1) and (2) are not mutually exclusive, and
murphy's law would suggest not ignoring (2).

because, if you ignore (2), you'll end up adopting
a 5-disk RAID10 instead of the superior 6-disk
RAID10 and end up being less lucky in practice.

don't rely on luck, but why deny good luck the
chance to come to you when it might?  --- two
different things.


> > i see why gambling is not worth it here, but at
> > the same time, i see no reason to ignore reality
> > (that a 2 disk failure can be saved).
>
> Don't ignore that some 2-disk failures CAN'T be saved ...

yeah, i'm not.  i'm just not ignoring that a
2-disk failure might get saved.

you know... it's better to have a little window
where some good luck may chime in than to ban
good luck entirely.


> Don't forget, if you have a spare disk, the repair window is the length
> of time it takes to fail-over ...

yup.  just trying not to rely on the good luck
that a spare is available.  e.g. considering the
case where no spare is there.

> > this site [2] says that 76% of seagate disks fail
> > per year (:D). and since disks mostly fail
> > independently of each other, then, the probability
> > of having 2 disks fail in a year is:
>
> 76% seems incredibly high. And no, disks do not fail independently of
> each other. If you buy a bunch of identical disks, at the same time, and
> stick them all in the same raid array, the chances of them all wearing
> out at the same time are rather higher than random chance would suggest.

i know.  i had this as a note, but then removed
it.  anyway, some nitpicks:

1. dependence != correlation.  you mean
   correlation, not dependence.  disk failures are
   correlated if the disks are bought together, but
   other disks don't cause the failure (unless
   from things like heat from other disks, or
   repair stress because of another disk failing).

2. i followed the extreme case where a person got
   his disks purchased at random times, so that
   he was maximally lucky in that his disks didn't
   synchronize.  why?

   (i) it offers a better pessimistic result.
   now we know that this probability is actually
   lower than in reality, which means that the
   3.5k bucks threshold is actually even lower.
   this should scare us more (hence us relying on
   less luck).

   (ii) it makes the calculation easier.
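
fwiw, the "maximally lucky" independence
assumption is easy to play with in a few lines of
python.  this is only a sketch of the binomial
model (the 76% figure is taken at face value from
[2], and the 5% rate is a made-up, more
typical-looking annual failure rate shown for
comparison):

  # Probability of at least k drive failures in a year, assuming each of
  # n drives fails independently with annual probability p -- a strong
  # assumption, as noted above; correlated wear-out makes the real
  # number worse.
  from math import comb

  def p_at_least(n: int, k: int, p: float) -> float:
      return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

  for n in (4, 5, 6):
      print(f"{n} disks, p=0.05: P(>=2 fail/yr) = {p_at_least(n, 2, 0.05):.4f}")
      print(f"{n} disks, p=0.76: P(>=2 fail/yr) = {p_at_least(n, 2, 0.76):.4f}")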




Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Caveman Al Toraboran
On Monday, May 4, 2020 2:50 AM, hitachi303  
wrote:

> Am 03.05.2020 um 23:46 schrieb Caveman Al Toraboran:
>
> > so, in summary:
> > /------------------------------------------------\
> > | a 5-disk RAID10 is better than a 6-disk RAID10 |
> > | ONLY IF your data is WORTH LESS than 3,524.3   |
> > | bucks.                                         |
> > \------------------------------------------------/
> > any thoughts? i'm a newbie. i wonder how
> > industry people think?
>
> Don't forget that having more drives increases the odds of a failing
> drive. If you have infinite drives at any given moment infinite drives
> will fail. Anyway I wouldn't know how to calculate this.

by drive, you mean a spinning hard disk?

i'm not sure how "infinite" helps here even
theoretically.  e.g. say that every year, 76% of
disks fail.  in the limit as the number of disks
approaches infinity, then 76% of infinity is
infinity.  but, how is this useful?

> Most people are limited by money and space. Even if this isn't your
> problem you will always need an additional backup strategy. The whole
> system can fail.
> I run a system with 8 drives where two can fail and they can be hot
> swapped. This is a closed source SAS which I really like except the part
> being closed source. I don't even know what kind of raid is used.
>
> The only person I know who is running a really huge raid (I guess 2000+
> drives) is comfortable with some spare drives. His raid did fail and can
> fail. Data will be lost. Everything important has to be stored at a
> secondary location. But they are using the raid to store data for some
> days or weeks while a server is calculating stuff. If the raid fails they
> have to restart the program for the calculation.

thanks a lot.  highly appreciate these tips about
how others run their storage.

however, i am not sure what the takeaway from
this is.  e.g. your closed-source NAS vs. a large
RAID.  they don't seem to be mutually exclusive to
me (both might be on RAID).

to me, a NAS is just a computer with RAID.  no?


> Facebook used to store data which is sometimes accessed on raids. Since
> raids use energy, they stored data which is nearly never accessed on
> Blu-ray discs. I don't know if they still do. Reading is very slow if a
> mechanical arm first needs to fetch a specific Blu-ray out of hundreds
> and put it in a disc reader, but it is very energy efficient.

interesting.




Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Caveman Al Toraboran
On Sunday, May 3, 2020 6:27 PM, Jack  wrote:

> Minor point - you have one duplicate line there ". f  f ." which is the
> second and last line of the second group.  No effect on anything else in
> the discussion.

thanks.

> Trying to help thinking about odd numbers of disks, if you are still
> allowing only one disk to fail, then you can think about mirroring half
> disks, so each disk has half of it mirrored to a different disk, instead
> of drives always being mirrored in pairs.

that definitely helped get me unstuck and continue
thinking.  thanks.

curious.  how do people look at --layout=n2 in the
storage industry?  e.g. do they ignore the
optimistic case where 2 disk failures can be
recovered, and only assume that it protects for 1
disk failure?

i see why gambling is not worth it here, but at
the same time, i see no reason to ignore reality
(that a 2 disk failure can be saved).

e.g. a 4-disk RAID10 with -layout=n2 gives

1*4/10 + 2*4/10 = 1.2

expected recoverable disk failures.  details are
below:

  F   .   .   .   < recoverable
  .   F   .   .   < cases with
  .   .   F   .   < 1 disk
  .   .   .   F   < failure

  F   .   .   F   < recoverable
  .   F   F   .   < cases with
  .   F   .   F   < 2 disk
  F   .   F   .   < failures

  F   F   .   .   < not recoverable
  .   .   F   F   < cases with 2 disk
                  < failures

now, if we do a 5-disk --layout=n2, we get:

 1   (1)   2   (2)   3
(3)   4   (4)   5   (5)
 6   (6)   7   (7)   8
(8)   9   (9)  10  (10)
11  (11)  12  (12)  13
(13)  ...

obviously, there are 5 possible ways a single disk
may fail, out of which all of the 5 will be
recovered.

there are nchoosek(5,2) = 10 possible ways a 2
disk failure could happen, out of which 5
will be recovered:

xxx  (1)  xxx  (2)   3
xxx   4   xxx   5   (5)

xxx  (1)   2   xxx   3
xxx   4   (4)  xxx  (5)

 1   xxx   2   xxx   3
(3)  xxx  (4)  xxx  (5)

 1   xxx   2   (2)  xxx
(3)  xxx  (4)   5   xxx

 1   (1)  xxx  (2)  xxx
(3)   4   xxx   5   xxx

so, expected recoverable disk failures for a
5-disk RAID10 --layout=n2 is:

1*5/15 + 2*5/15 = 1

so, by transforming a 4-disk RAID10 into a 5-disk
one, we increase total storage capacity by a 0.5
disk's worth of storage, while losing the ability
to recover 0.2 disks.

but if we extended the 4-disk RAID10 into a
6-disk --layout=n2, we will have:

              6                   nchoosek(6,2) - 3
= 1 * -----------------  +  2 * -----------------
      6 + nchoosek(6,2)         6 + nchoosek(6,2)

= 6/21  +  2 * 12/21

= 1.4286 expected recoverable failing disks.

and, given that exactly 2 disks fail, there is an
80% (12/15) chance of surviving it.
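
fwiw, these counts are small enough to brute-force
with a few lines of python.  the sketch below
assumes chunk c keeps its two copies on devices
(2c mod n) and (2c+1 mod n), which is what the 4-
and 5-disk diagrams above show; an array dies iff
some chunk loses both copies:

  # Brute-force check of the survival counts for an n-disk raid10 with
  # --layout=n2 (near, 2 copies), under the chunk-placement assumption
  # described above.
  from itertools import combinations

  def mirror_pairs(n):
      """Distinct device pairs that hold the two copies of some chunk."""
      return {frozenset(((2 * c) % n, (2 * c + 1) % n)) for c in range(n)}

  def survives(failed, pairs):
      return not any(pair <= failed for pair in pairs)

  for n in (4, 5, 6):
      pairs = mirror_pairs(n)
      two_disk = list(combinations(range(n), 2))
      ok = sum(survives(frozenset(f), pairs) for f in two_disk)
      print(f"{n} disks: {ok}/{len(two_disk)} two-disk failures survivable")

which prints 4/6, 5/10 and 12/15, matching the
hand counts above.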

so, i wonder, is it a bad decision to go with an
even number of disks with a RAID10?  what is the
right way to think to find an answer to this
question?

i guess the ultimate answer needs knowledge of
these:

* F1: probability of having 1 disks fail within
  the repair window.
* F2: probability of having 2 disks fail within
  the repair window.
* F3: probability of having 3 disks fail within
  .   the repair window.
  .
  .
* Fn: probability of having n disks fail within
  the repair window.

* R1: probability of surviving 1 disks failure.
  equals 1 with all related cases.
* R2: probability of surviving 2 disks failure.
  equals 1/3 with 5-disk RAID10
  equals 0.8 with a 6-disk RAID10.
* R3: probability of surviving 3 disks failure.
  equals 0 with all related cases.
  .
  .
  .
* Rn: probability of surviving n disks failure.
  equals 0 with all related cases.

* L : expected cost of losing data on an array.
* D : price of a disk.

this way, the absolute expected cost when adopting
a 6-disk RAID10 is:

= 6D + F1*(1-R1)*L + F2*(1-R2)*L     + F3*(1-R3)*L + ...
= 6D + F1*(1-1)*L  + F2*(1-0.8)*L    + F3*(1-0)*L  + ...
= 6D + 0           + F2*(0.2)*L      + F3*(1-0)*L  + ...

and the absolute cost for a 5-disk RAID10 is:

= 5D + F1*(1-1)*L  + F2*(1-0.3333)*L + F3*(1-0)*L  + ...
= 5D + 0           + F2*(0.6667)*L   + F3*(1-0)*L  + ...

canceling identical terms, the difference cost is:

6-disk ===> 6D + 0.2*F2*L
5-disk ===> 5D + 0.6667*F2*L

from here [1] we know that a 1TB disk costs
$35.85, so:

6-disk ===> 6*35.85 + 0.2*F2*L
5-disk ===> 5*35.85 + 0.6667*F2*L

now, at which point is a 5-disk array a better
economical decision than a 6-disk one?  for
simplicity, let LOL = F2*L:

5*35.85 + 0.6667 * LOL  <   6*35.85 + 0.2 * LOL
0.6667*LOL - 0.2 * LOL  <   6*35.85 - 5*35.85
LOL * (0.6667 - 0.2)    <   6*35.85 - 5*35.85

            6*35.85 - 5*35.85
   LOL  <   -----------------
               0.6667 - 0.2

   LOL  <   76.816
   F2*L <   76.816
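
fwiw, a few lines of python reproduce the
break-even point (just a sanity check of the
arithmetic; the $35.85 disk price and the R2
values are the ones used above):

  # Expected-cost comparison of a 5-disk vs 6-disk raid10 (near-2),
  # keeping only the terms that differ.  D = disk price, R2 = chance of
  # surviving a 2-disk failure, LOL = F2 * L as defined above.
  D = 35.85            # 1 TB disk price quoted above
  r2_5disk = 5 / 15    # = 1/3
  r2_6disk = 12 / 15   # = 0.8

  # cost_6 - cost_5 = D - (r2_6 - r2_5)*LOL, so the 5-disk array wins
  # only while LOL stays below the break-even value:
  break_even = D / ((1 - r2_5disk) - (1 - r2_6disk))
  print(f"5-disk array is cheaper only if F2*L < {break_even:.3f} bucks")
  # -> about 76.8, matching the figure above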

Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Rich Freeman
On Sun, May 3, 2020 at 6:50 PM hitachi303
 wrote:
>
> The only person I know who is running a really huge raid (I guess 2000+
> drives) is comfortable with some spare drives. His raid did fail and can
> fail. Data will be lost. Everything important has to be stored at a
> secondary location. But they are using the raid to store data for some
> days or weeks while a server is calculating stuff. If the raid fails they
> have to restart the program for the calculation.

So, if you have thousands of drives, you really shouldn't be using a
conventional RAID solution.  Now, if you're just using RAID to refer
to any technology that stores data redundantly that is one thing.
However, if you wanted to stick 2000 drives into a single host using
something like mdadm/zfs, or heaven forbid a bazillion LSI HBAs with
some kind of hacked-up solution for PCIe port replication plus SATA
bus multipliers/etc, you're probably doing it wrong.  (Really even
with mdadm/zfs you probably still need some kind of terribly
non-optimal solution for attaching all those drives to a single host.)

At that scale you really should be using a distributed filesystem.  Or
you could use some application-level solution that accomplishes the
same thing on top of a bunch of more modest hosts running zfs/etc (the
Backblaze solution at least in the past).

The most mainstream FOSS solution at this scale is Ceph.  It achieves
redundancy at the host level.  That is, if you have it set up to
tolerate two failures then you can take two random hosts in the
cluster and smash their motherboards with a hammer in the middle of
operation, and the cluster will keep on working and quickly restore
its redundancy.  Each host can have multiple drives, and losing any or
all of the drives within a single host counts as a single failure.
You can even do clever stuff like tell it which hosts are attached to
which circuit breakers and then you could lose all the hosts on a
single power circuit at once and it would be fine.
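
To illustrate just the idea (this is NOT Ceph's CRUSH algorithm, only a
toy sketch with made-up host and circuit names), failure-domain-aware
replica placement can be as simple as:

  # Toy placement: spread copies across hosts on different power
  # circuits.  Purely illustrative -- real systems like Ceph use CRUSH
  # maps for this; the topology below is invented.
  import random

  HOSTS = {
      "node1": "circuitA", "node2": "circuitA",
      "node3": "circuitB", "node4": "circuitB",
      "node5": "circuitC",
  }

  def place_replicas(obj_id: str, copies: int = 3):
      """Pick `copies` hosts, no two sharing a power circuit."""
      rng = random.Random(obj_id)          # deterministic per object
      chosen, used = [], set()
      for host in rng.sample(list(HOSTS), k=len(HOSTS)):
          if HOSTS[host] not in used:
              chosen.append(host)
              used.add(HOSTS[host])
          if len(chosen) == copies:
              return chosen
      raise RuntimeError("not enough independent failure domains")

  print(place_replicas("some-object"))

Lose every host on one circuit and each object still has copies on the
other circuits.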

This also has the benefit of covering you when one of your flakey
drives causes weird bus issues that affect other drives, or one host
crashes, and so on.  The redundancy is entirely at the host level so
you're protected against a much larger number of failure modes.

This sort of solution also performs much faster as data requests are
not CPU/NIC/HBA limited for any particular host.  The software is
obviously more complex, but the hardware can be simpler since if you
want to expand storage you just buy more servers and plug them into
the LAN, versus trying to figure out how to cram an extra dozen hard
drives into a single host with all kinds of port multiplier games.
You can also do maintenance and just reboot an entire host while the
cluster stays online as long as you aren't messing with them all at
once.

I've gone in this general direction because I was tired of having to
try to deal with massive cases, being limited to motherboards with 6
SATA ports, adding LSI HBAs that require an 8x slot and often
conflicts with using an NVMe, and so on.

-- 
Rich



Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Rich Freeman
On Sun, May 3, 2020 at 6:52 PM Mark Knecht  wrote:
>
> On Sun, May 3, 2020 at 1:16 PM Rich Freeman  wrote:
> >
> > Up until a few weeks ago I would have advised the same, but WD was
> > just caught shipping unadvertised SMR in WD Red disks.  This is going
> > to at the very least impact your performance if you do a lot of
> > writes, and it can be incompatible with rebuilds in particular with
> > some RAID implementations.  Seagate and Toshiba have also been quietly
> > using it but not in their NAS-labeled drives and not as extensively in
> > general.
>
> I read somewhere that they knew they'd been caught and were coming clean.

Yup. WD was caught.  Then they first came out with a "you're using it
wrong" sort of defense but they did list the SMR drives.  Then they
came out with a bit more of an even-handed response.  The others
weren't caught as far as I'm aware but probably figured the writing
was on the wall since no doubt everybody and their uncle is going to
be benchmarking every drive they own.

> Another case of unbridled capitalism and consumers being hurt.

I agree.  This video has a slightly different perspective.  It doesn't
disagree on that conclusion, but it does explain more of the industry
thinking that got us here (beyond the simple/obvious it saves money):
https://www.youtube.com/watch?v=gSionmmunMs

-- 
Rich



Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread antlists

On 03/05/2020 22:46, Caveman Al Toraboran wrote:

On Sunday, May 3, 2020 6:27 PM, Jack  wrote:


curious.  how do people look at --layout=n2 in the
storage industry?  e.g. do they ignore the
optimistic case where 2 disk failures can be
recovered, and only assume that it protects for 1
disk failure?


You CANNOT afford to be optimistic ... Murphy's law says you will lose 
the wrong second disk.


i see why gambling is not worth it here, but at
the same time, i see no reason to ignore reality
(that a 2 disk failure can be saved).


Don't ignore that some 2-disk failures CAN'T be saved ...


e.g. a 4-disk RAID10 with -layout=n2 gives

 1*4/10 + 2*4/10 = 1.2

expected recoverable disk failures.  details are
below:


now, if we do a 5-disk --layout=n2, we get:

 1   (1)   2   (2)   3
(3)   4   (4)   5   (5)
 6   (6)   7   (7)   8
(8)   9   (9)  10  (10)
11  (11)  12  (12)  13
(13)  ...

obviously, there are 5 possible ways a single disk
may fail, out of which all of the 5 will be
recovered.


Don't forget a 4+spare layout, which *should* survive a 2-disk failure.


there are nchoosek(5,2) = 10 possible ways a 2
disk failure could happen, out of which 5
will be recovered:


so, by transforming a 4-disk RAID10 into a 5-disk
one, we increase total storage capacity by a 0.5
disk's worth of storage, while losing the ability
to recover 0.2 disks.

but if we extended the 4-disk RAID10 into a
6-disk --layout=n2, we will have:

              6                   nchoosek(6,2) - 3
= 1 * -----------------  +  2 * -----------------
      6 + nchoosek(6,2)         6 + nchoosek(6,2)

= 6/21  +  2 * 12/21

= 1.4286 expected recoverable failing disks.

and, given that exactly 2 disks fail, there is an
80% (12/15) chance of surviving it.

so, i wonder, is it a bad decision to go with an
even number of disks with a RAID10?  what is the
right way to think to find an answer to this
question?

i guess the ultimate answer needs knowledge of
these:

 * F1: probability of having 1 disks fail within
   the repair window.
 * F2: probability of having 2 disks fail within
   the repair window.
 * F3: probability of having 3 disks fail within
   .   the repair window.
   .
   .
 * Fn: probability of having n disks fail within
   the repair window.

 * R1: probability of surviving 1 disks failure.
   equals 1 with all related cases.
 * R2: probability of surviving 2 disks failure.
   equals 1/3 with 5-disk RAID10
   equals 0.8 with a 6-disk RAID10.
 * R3: probability of surviving 3 disks failure.
   equals 0 with all related cases.
   .
   .
   .
 * Rn: probability of surviving n disks failure.
   equals 0 with all related cases.

 * L : expected cost of losing data on an array.
 * D : price of a disk.


Don't forget, if you have a spare disk, the repair window is the length 
of time it takes to fail-over ...


this way, the absolute expected cost when adopting
a 6-disk RAID10 is:

= 6D + F1*(1-R1)*L + F2*(1-R2)*L     + F3*(1-R3)*L + ...
= 6D + F1*(1-1)*L  + F2*(1-0.8)*L    + F3*(1-0)*L  + ...
= 6D + 0           + F2*(0.2)*L      + F3*(1-0)*L  + ...

and the absolute cost for a 5-disk RAID10 is:

= 5D + F1*(1-1)*L  + F2*(1-0.3333)*L + F3*(1-0)*L  + ...
= 5D + 0           + F2*(0.6667)*L   + F3*(1-0)*L  + ...

canceling identical terms, the difference cost is:

6-disk ===> 6D + 0.2*F2*L
5-disk ===> 5D + 0.6667*F2*L

from here [1] we know that a 1TB disk costs
$35.85, so:

6-disk ===> 6*35.85 + 0.2*F2*L
5-disk ===> 5*35.85 + 0.6667*F2*L

now, at which point is a 5-disk array a better
economical decision than a 6-disk one?  for
simplicity, let LOL = F2*L:

5*35.85 + 0.6667 * LOL  <   6*35.85 + 0.2 * LOL
0.6667*LOL - 0.2 * LOL  <   6*35.85 - 5*35.85
LOL * (0.6667 - 0.2)    <   6*35.85 - 5*35.85

            6*35.85 - 5*35.85
   LOL  <   -----------------
               0.6667 - 0.2

   LOL  <   76.816
   F2*L <   76.816

so, a 5-disk RAID10 is better than a 6-disk RAID10
only if:

 F2*L  <  76.816 bucks.

this site [2] says that 76% of seagate disks fail
per year (:D).  and since disks mostly fail
independently of each other, then, the probability
of having 2 disks fail in a year is:

76% seems incredibly high. And no, disks do not fail independently of 
each other. If you buy a bunch of identical disks, at the same time, and 
stick them all in the same raid array, the chances of them all wearing 
out at the same time are rather higher than random chance would suggest.


Which is why, if a raid disk fails, the advice is always to replace it 
asap. And if possible, to recover the failed drive to try and copy that 
rather than hammer the rest of the raid.


Bear in mind that it doesn't matter how many drives a raid-10 has; if
you're recovering onto a new drive, the data is stored on just two of
the o

Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Mark Knecht
On Sun, May 3, 2020 at 1:16 PM Rich Freeman  wrote:
>
> On Sun, May 3, 2020 at 2:29 PM Mark Knecht  wrote:
> >
> > I've used the WD Reds and WD Golds (no not sold) and never had any
problem.
> >
>
> Up until a few weeks ago I would have advised the same, but WD was
> just caught shipping unadvertised SMR in WD Red disks.  This is going
> to at the very least impact your performance if you do a lot of
> writes, and it can be incompatible with rebuilds in particular with
> some RAID implementations.  Seagate and Toshiba have also been quietly
> using it but not in their NAS-labeled drives and not as extensively in
> general.

I read somewhere that they knew they'd been caught and were coming clean.
As I'm not buying anything at this time I didn't pay too much attention.

This link is at least similar to what I read earlier. Possibly it's of
interest.

https://www.extremetech.com/computing/309730-western-digital-comes-clean-shares-which-hard-drives-use-smr

Another case of unbridled capitalism and consumers being hurt.

Cheers,
Mark


Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread hitachi303

Am 03.05.2020 um 23:46 schrieb Caveman Al Toraboran:

so, in summary:

  /------------------------------------------------\
  | a 5-disk RAID10 is better than a 6-disk RAID10 |
  | ONLY IF your data is WORTH LESS than 3,524.3   |
  | bucks.                                         |
  \------------------------------------------------/

any thoughts?  i'm a newbie.  i wonder how
industry people think?



Don't forget that having more drives increases the odds of a failing 
drive. If you have infinite drives at any given moment infinite drives 
will fail. Anyway I wouldn't know how to calculate this.


Most people are limited by money and space. Even if this isn't your
problem you will always need an additional backup strategy. The whole
system can fail.
I run a system with 8 drives where two can fail and they can be hot
swapped. This is a closed source SAS which I really like except the part
being closed source. I don't even know what kind of raid is used.


The only person I know who is running a really huge raid (I guess 2000+
drives) is comfortable with some spare drives. His raid did fail and can
fail. Data will be lost. Everything important has to be stored at a
secondary location. But they are using the raid to store data for some
days or weeks while a server is calculating stuff. If the raid fails they
have to restart the program for the calculation.


Facebook used to store data which is sometimes accessed on raids. Since
raids use energy, they stored data which is nearly never accessed on
Blu-ray discs. I don't know if they still do. Reading is very slow if a
mechanical arm first needs to fetch a specific Blu-ray out of hundreds
and put it in a disc reader, but it is very energy efficient.




Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Rich Freeman
On Sun, May 3, 2020 at 5:32 PM antlists  wrote:
>
> On 03/05/2020 21:07, Rich Freeman wrote:
> > I don't think you should focus so much on whether read=write in your
> > RAID.  I'd focus more on whether read and write both meet your
> > requirements.
>
> If you think about it, it's obvious that raid-1 will read faster than it
> writes - it has to write two copies while it only reads one.

Yes.  The same is true for RAID10, since it has to also write two
copies of everything.

>
> Likewise, raids 5 and 6 will be slower writing than reading - for a
> normal read it only reads the data disks, but when writing it has to
> write (and calculate!) parity as well.

Yes, but with any of the striped modes (0, 5, 6, 10) there is an
additional issue.  Writes have to generally be made in entire stripes,
so if you overwrite data in-place in units smaller than an entire
stripe, then the entire stripe needs to first be read, and then it can
be overwritten again.  This is an absolute requirement if there is
parity involved.  If there is no parity (RAID 0,10) then an
implementation might be able to overwrite part of a stripe in place
without harming the rest.
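
To make the parity case concrete, here is a minimal sketch of the
read-modify-write parity update for a small write (generic RAID5-style
XOR arithmetic, not the actual md or zfs code path):

  # Updating one data chunk of a parity stripe without reading the whole
  # stripe still costs two reads (old data, old parity) and two writes
  # (new data, new parity):  new_parity = old_parity ^ old_data ^ new_data
  def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
      return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

  # Tiny demo with a 3-data-chunk stripe:
  chunks = [b"AAAA", b"BBBB", b"CCCC"]
  parity = bytes(a ^ b ^ c for a, b, c in zip(*chunks))

  new_chunk = b"ZZZZ"
  parity = update_parity(parity, chunks[1], new_chunk)  # only chunk 1 rewritten
  chunks[1] = new_chunk

  # Sanity check: parity still reconstructs the replaced chunk.
  assert bytes(p ^ a ^ c for p, a, c in zip(parity, chunks[0], chunks[2])) == chunks[1]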

>
> A raid 1 should read data faster than a lone disk. A raid 5 or 6 should
> read noticeably faster because it's reading across more than one disk.

More-or-less.  RAID 1 is going to generally benefit from lower latency
because reads can be divided across mirrored copies (and there could
be more than one replica).  Any of the striped modes are going to be
the same as a single disk on latency, but will have much greater
bandwidth.  That bandwidth gain applies to both reading and writing,
as long as the data is sequential.

This is why it is important to understand your application.  There is
no one "best" RAID implementation.  They all have pros and cons
depending on whether you care more about latency vs bandwidth and also
read vs write.

And of course RAID isn't the only solution out there for this stuff.
Distributed filesystems also have pros and cons, and often those have
multiple modes of operation on top of this (usually somewhat mirroring
the options available for RAID but across multiple hosts).

For general storage I'm using zfs with raid1 pairs of disks (the pool
can have multiple pairs), and for my NAS for larger-scale media/etc
storage I'm using lizardfs.  I'd use ceph instead in any kind of
enterprise setup, but that is much more RAM-hungry and I'm cheap.

-- 
Rich



Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Caveman Al Toraboran
On Sunday, May 3, 2020 1:23 PM, Wols Lists  wrote:

> For anything above raid 1, MAKE SURE your drives support SCT/ERC. For
> example, Seagate Barracudas are very popular desktop drives, but I guess
> maybe HALF of the emails asking for help recovering an array on the raid
> list involve them dying ...
>
> (I've got two :-( but my new system - when I get it running - has
> ironwolves instead.)

that's very scary.

just to double check:  are those help emails about
linux's software RAID?  or is it about hardware
RAIDs?

the reason i ask about software vs. hardware, is
because of this wiki article [1] which seems to
suggest that mdadm handles error recovery by
waiting for up to 30 seconds (set in
/sys/block/sd*/device/timeout) after which the
device is reset.

am i missing something?  to me, [1] seems to
suggest that linux software raid has a
reliable way to handle the issue?  since i guess
all disks support resetting well?

[1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID




Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread antlists

On 03/05/2020 21:07, Rich Freeman wrote:

I don't think you should focus so much on whether read=write in your
RAID.  I'd focus more on whether read and write both meet your
requirements.


If you think about it, it's obvious that raid-1 will read faster than it 
writes - it has to write two copies while it only reads one.


Likewise, raids 5 and 6 will be slower writing than reading - for a 
normal read it only reads the data disks, but when writing it has to 
write (and calculate!) parity as well.


A raid 1 should read data faster than a lone disk. A raid 5 or 6 should 
read noticeably faster because it's reading across more than one disk.


If you're worried about write speeds, add a cache.

Cheers,
Wol



Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread antlists

On 03/05/2020 18:55, Caveman Al Toraboran wrote:

On Sunday, May 3, 2020 1:23 PM, Wols Lists  wrote:


For anything above raid 1, MAKE SURE your drives support SCT/ERC. For
example, Seagate Barracudas are very popular desktop drives, but I guess
maybe HALF of the emails asking for help recovering an array on the raid
list involve them dying ...

(I've got two :-( but my new system - when I get it running - has
ironwolves instead.)


that's very scary.

just to double check:  are those help emails about
linux's software RAID?  or is it about hardware
RAIDs?


They are about linux software raid. Hardware raid won't be any better.


the reason i ask about software vs. hardware, is
because of this wiki article [1] which seems to
suggest that mdadm handles error recovery by
waiting for up to 30 seconds (set in
/sys/block/sd*/device/timeout) after which the
device is reset.


Which, if your drive does not support SCT/ERC, then goes *badly* wrong.


am i missing something? 


Yes ...


to me it seems that [1]
seems to suggest that linux software raid has a
reliable way to handle the issue? 


Well, if the paragraph below were true, it would.


since i guess all disks support resetting well?


That's the point. THEY DON'T! That's why you need SCT/ERC ...


[1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID


https://raid.wiki.kernel.org/index.php/Choosing_your_hardware,_and_what_is_a_device%3F#Desktop_and_Enterprise_drives

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

Cheers,
Wol



Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Rich Freeman
On Sun, May 3, 2020 at 2:29 PM Mark Knecht  wrote:
>
> I've used the WD Reds and WD Golds (no not sold) and never had any problem.
>

Up until a few weeks ago I would have advised the same, but WD was
just caught shipping unadvertised SMR in WD Red disks.  This is going
to at the very least impact your performance if you do a lot of
writes, and it can be incompatible with rebuilds in particular with
some RAID implementations.  Seagate and Toshiba have also been quietly
using it but not in their NAS-labeled drives and not as extensively in
general.

At the very least you should check the model number lists that have
been recently released to check if the drive you want to get uses SMR.
I'd also get it from someplace with a generous return policy and do
some benchmarking to confirm that the drive isn't SMR (you're probably
going to have to do continuous random writes exceeding the total
capacity of the drive before you see problems - or at least quite a
bit of random writing - the amount of writing needed will be less once
the drive has been in use for a while but a fresh drive basically acts
like close to a full-disk-sized write cache as far as SMR goes).
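
If you want to script such a check, the general shape is something like
this (a rough sketch only -- the path and sizes are made up, a real
benchmark would use something like fio, and you need enough sustained
random writes to blow through any CMR-style cache area before an SMR
drive shows its true colours):

  # Sustained-random-write probe: write random 1 MiB blocks to a scratch
  # file on the drive under test and watch whether throughput collapses.
  # Not a rigorous benchmark; it will happily eat disk space and time.
  import os, random, time

  PATH = "/mnt/testdrive/smr_probe.bin"   # hypothetical mount point
  FILE_SIZE = 8 * 1024**3                 # 8 GiB scratch file
  BLOCK = 1024 * 1024                     # 1 MiB per write
  ROUNDS = 4096

  with open(PATH, "wb") as f:
      f.truncate(FILE_SIZE)

  buf = os.urandom(BLOCK)
  with open(PATH, "r+b", buffering=0) as f:
      start = time.time()
      for i in range(1, ROUNDS + 1):
          f.seek(random.randrange(0, FILE_SIZE - BLOCK))
          f.write(buf)
          os.fsync(f.fileno())
          if i % 256 == 0:
              mib_s = (i * BLOCK / 1024**2) / (time.time() - start)
              print(f"{i:5d} writes, average {mib_s:7.1f} MiB/s")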

> Build a RAID with a WD Green and you're in for trouble. ;-)))

It really depends on your RAID implementation.  Certainly I agree that
it is better to have TLER, but for some RAID implementations not
having it just causes performance drops when you actually have errors
(which should be very rare).  For others it can cause drives to be
dropped.  I wouldn't hesitate to use greens in an mdadm or zfs array
with default options, but with something like hardware RAID I'd be
more careful.  If you use aggressive timeouts on your RAID then the
Green is more likely to get kicked out.

I agree with the general sentiment to have a spare if it will take you
a long time to replace failed drives.  Alternatively you can have
additional redundancy, or use a RAID alternative that basically treats
all free space as an effective spare (like many distributed
filesystems).

-- 
Rich



Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Rich Freeman
On Sun, May 3, 2020 at 1:44 AM Caveman Al Toraboran
 wrote:
>
> * RAID 1: fails to satisfy points (1) and (3)...
> this leaves me with RAID 10

Two things:

1.  RAID 10 doesn't satisfy point 1 (read and write performance are
identical).  No RAID implementation I'm aware of does.

2.  Some RAID1 implementations can satisfy point 3 (expandability to
additional space and replication multiplicities), particular when
combined with LVM.

I'd stop and think about your requirements a bit.  You seem really
concerned about having identical read and write performance.  RAID
implementations all have their pros and cons both in comparison with
each other, in comparison with non-RAID, and in comparison between
read and write within any particular RAID implementation.

I don't think you should focus so much on whether read=write in your
RAID.  I'd focus more on whether read and write both meet your
requirements.

And on that note, what are your requirements?  You haven't mentioned
what you plan to store on it or how this data will be stored or
accessed.  It is hard to say whether any design will meet your
performance requirements when you haven't provided any, other than a
fairly arbitrary read=write one.

In general most RAID1 implementations aren't going to lag regular
non-RAID disk by much and will often exceed it (especially for
reading).  I'm not saying RAID1 is the best option for you - I'm just
suggesting that you don't toss it out just because it reads faster
than it writes, especially in favor of RAID 10 which also reads faster
than it writes but has the additional caveat that small writes may
necessitate an additional read before write.

Not knowing your requirements it is hard to make more specific
recommendations but I'd also consider ZFS and distributed filesystems.
They have some pros and cons around flexibility and if you're
operating at a small scale - it might not be appropriate for your use
case, but you should consider them.

-- 
Rich



Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Mark Knecht
On Sun, May 3, 2020 at 10:56 AM Caveman Al Toraboran <
toraboracave...@protonmail.com> wrote:
>
> On Sunday, May 3, 2020 1:23 PM, Wols Lists 
wrote:
>
> > For anything above raid 1, MAKE SURE your drives support SCT/ERC. For
> > example, Seagate Barracudas are very popular desktop drives, but I guess
> > maybe HALF of the emails asking for help recovering an array on the raid
> > list involve them dying ...
> >
> > (I've got two :-( but my new system - when I get it running - has
> > ironwolves instead.)
>
> that's very scary.
>
> just to double check:  are those help emails about
> linux's software RAID?  or is it about hardware
> RAIDs?
>
> the reason i ask about software vs. hardware, is
> because of this wiki article [1] which seems to
> suggest that mdadm handles error recovery by
> waiting for up to 30 seconds (set in
> /sys/block/sd*/device/timeout) after which the
> device is reset.
>
> am i missing something?  to me it seems that [1]
> seems to suggest that linux software raid has a
> reliable way to handle the issue?  since i guess
> all disks support resetting well?
>
> [1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID
>

When doing Linux RAID, hardware or software, make sure you get a RAID aware
drive that supports TLER (Time Limited Error Recovery) or whatever the
vendor that makes your drive calls it. Typically this is set at about 7
seconds, guaranteeing that no matter what's going on the drive will respond
to the upper layers (mdadm) to let it know it's alive. A non-RAID drive
with no TLER feature will respond when it's ready and typically if that's
longer than 30 seconds then the RAID subsystem kicks the drive and you have
to re-add it. While there's nothing 'technically' wrong with the storage,
when the RAID rebuilds you eventually hit another one of these >30 second
waits and another drive gets kicked and you're dead.
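
If you want to see what your drives and kernel are currently set to, a
small read-only script along these lines works (a sketch only: smartctl
comes from smartmontools, the device names are just examples, and
actually enabling ERC would be something like
"smartctl -l scterc,70,70 /dev/sdX"):

  # Report the two timeouts discussed above: the drive's SCT ERC setting
  # (via smartctl) and the kernel's SCSI command timeout in /sys.
  # Read-only; run as root.
  import subprocess
  from pathlib import Path

  DEVICES = ["sda", "sdb"]          # example device names

  for dev in DEVICES:
      erc = subprocess.run(["smartctl", "-l", "scterc", f"/dev/{dev}"],
                           capture_output=True, text=True).stdout
      erc_lines = [l for l in erc.splitlines() if "sec" in l or "Disabled" in l]
      timeout = Path(f"/sys/block/{dev}/device/timeout").read_text().strip()
      print(f"{dev}: kernel timeout = {timeout}s, SCT ERC: {erc_lines or 'unknown'}")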

I've used the WD Reds and WD Golds (no not sold) and never had any problem.

Build a RAID with a WD Green and you're in for trouble. ;-)))

HTH,
Mark


Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Dale
Caveman Al Toraboran wrote:
> On Sunday, May 3, 2020 1:23 PM, Wols Lists  wrote:
>
>> For anything above raid 1, MAKE SURE your drives support SCT/ERC. For
>> example, Seagate Barracudas are very popular desktop drives, but I guess
>> maybe HALF of the emails asking for help recovering an array on the raid
>> list involve them dying ...
>>
>> (I've got two :-( but my new system - when I get it running - has
>> ironwolves instead.)
> that's very scary.
>
> just to double check:  are those help emails about
> linux's software RAID?  or is it about hardware
> RAIDs?
>
> the reason i ask about software vs. hardware, is
> because of this wiki article [1] which seems to
> suggest that mdadm handles error recovery by
> waiting for up to 30 seconds (set in
> /sys/block/sd*/device/timeout) after which the
> device is reset.
>
> am i missing something?  to me it seems that [1]
> seems to suggest that linux software raid has a
> reliable way to handle the issue?  since i guess
> all disks support resetting well?
>
> [1] https://en.wikipedia.org/wiki/Error_recovery_control#Software_RAID
>
>
>


I'd like to add something about the PMR/SMR thing.  I bought an SMR drive
without knowing it.  Now when I search for a hard drive, I add NAS to
the search string.  That seems to weed out the SMR type drives.  Once I
find an exact model, I google it up to confirm.  So far, that little
trick has worked pretty well.  It may be something you want to consider
using as well.  NAS drives tend to be more robust it seems.  Given you
are using RAID, you likely want a more robust and dependable drive, if
drives can be put into that category nowadays.  :/

Hope that helps. 

Dale

:-)  :-) 


Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Jack

On 5/3/20 1:44 AM, Caveman Al Toraboran wrote:

[snip]...
 so, we get the following combinations of
 disk failures that, if happen, we won't
 lose any data:

   RAID0
   --^--
 RAID1   RAID1
 --^--   --^--
 F   .   .   .   < cases with
 .   F   .   .   < single disk
 .   .   F   .   < failures
 .   .   .   F   <

 F   .   .   F   < cases with
 .   F   F   .   < two disk
 .   F   .   F   < failures
 F   .   F   .   <
 .   F   F   .   <

 this gives us 4+5=9 possible disk failure
 scenarious where we can survive it without
 any data loss.


Minor point - you have one duplicate line there ". f  f ." which is the 
second and last line of the second group.  No effect on anything else in 
the discussion.


Trying to help thinking about odd numbers of disks, if you are still 
allowing only one disk to fail, then you can think about mirroring half 
disks, so each disk has half of it mirrored to a different disk, instead 
of drives always being mirrored in pairs.





Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Caveman Al Toraboran
On Sunday, May 3, 2020 1:14 PM, Wols Lists  wrote:

> > Q3: what are the future growth/shrinkage
> > options for a RAID10 setup? e.g. with
> > respect to these:
> >
> > 1. read/write speed.
> >
>
> iirc far is good for speed.
>
> > 2. tolerance guarantee towards failing
> >disks.
> >
>
> Guarantees? If you have two mirrors, the guarantee is just ONE disk. Yes
> you can gamble on losing more.
>
> > 3. total available space.
> >
>
> iirc you can NOT grow the far layout.

sorry, typo, i meant "near" (the command was right
though --layout=n2)




Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Wols Lists
On 03/05/20 08:53, hitachi303 wrote:
> Nothing you asked but I had very bad experience with drives which spin
> down by themselves to save energy (mostly labelled green or so).

Good catch!

For anything above raid 1, MAKE SURE your drives support SCT/ERC. For
example, Seagate Barracudas are very popular desktop drives, but I guess
maybe HALF of the emails asking for help recovering an array on the raid
list involve them dying ...

(I've got two :-( but my new system - when I get it running - has
ironwolves instead.)

Cheers,
Wol



Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread Wols Lists
On 03/05/20 06:44, Caveman Al Toraboran wrote:
> hi - i'm to setup my 1st RAID, and i'd appreciate
> if any of you volunteers some time to share your
> valuable experience on this subject.
> 
> my scenario
> ---
> 
> 0. i don't boot from the RAID.
> 
> 1. read is as important as write.  i don't
>have any application-specific scenario that
>makes me somehow favor one over another.
>so RAIDs that speed up the read (or write)
>while significantly harming the write (or
>read) is not welcome.
> 
> 2. replacing failed disks may take a week or
>two.  so, i guess that i may have several
>disks fail one after another in the 1-2
>weeks (specially if they were bought
>about the same time).
> 
> 3. i would like to be able to grow the RAID's
>total space (as needed), and increase its
>reliability (i.e. duplicates/partities) as
>needed.
> 
>e.g. suppose that i got a 2TB RAID that
>tolerates 1 disk failure.  i'd like to, at
>some point, to have the following options:
> 
>  * only increase the total space (e.g.
>make it 3TB), without increasing
>failure toleration (so 2 disk failure
>would result in data loss).
> 
>  * or, only increase the failure tolerance
>(e.g. such that 2 disks failure would
>not lead to data loss), without
>increasing the total space (e.g. space
>remains 2TB).
> 
>  * or, increase, both, the space and the
>failure tolerance at the same time.
> 
> 4. only interested in software RAID.
> 
> my thought
> --
> 
> i think these are not suitable:
> 
> * RAID 0: fails to satisfy point (3).
> 
> * RAID 1: fails to satisfy points (1) and (3).
> 
> * RAIDs 4 to 6: fails to satisfy point (3)
>   since they are stuck with a fixed tolerance
>   towards failing disks (i.e. RAIDs 4 and 5
>   tolerate only 1 disk failure, and RAID 6
>   tolerates only 2).
> 
> 
> this leaves me with RAID 10, with the "far"
> layout.  e.g. --layout=n2 would tolerate the
> failure of two disks, --layout=n3 three, etc.  or
> is it?  (i'm not sure).
> 
> my questions
> 
> 
> Q1: which RAID setup would you recommend?

I'd recommend having a spare in the array. That way, a single failure
would not affect redundancy at all. You can then replace the spare at
your leisure.

If you want to grow the array, I'd also suggest "raid 5 + spare". That's
probably better than 6 for writing, but 6 is better than 5 for
redundancy. Look at having a journal - that could speed up write speed
for raid 6.
> 
> Q2: how would the total number of disks in a
> RAID10 setup affect the tolerance towards
> the failing disks?
> 
Sadly, it doesn't. If you have two copies, losing two disks COULD take
out your raid.

> if the total number of disks is even, then
> it is easy to see how this is equivalent
> to the classical RAID 1+0 as shown in
> md(4), where any disk failure is tolerated
> for as long as each RAID1 group has 1 disk
> failure only.

That's a gamble ...
> 
> so, we get the following combinations of
> disk failures that, if happen, we won't
> lose any data:
> 
>   RAID0
>   --^--
> RAID1   RAID1
> --^--   --^--
> F   .   .   .   < cases with
> .   F   .   .   < single disk
> .   .   F   .   < failures
> .   .   .   F   <
> 
> F   .   .   F   < cases with
> .   F   F   .   < two disk
> .   F   .   F   < failures
> F   .   F   .   <
> .   F   F   .   <
> 
> this gives us 4+5=9 possible disk failure
> scenarious where we can survive it without
> any data loss.
> 
> but, when the number of disks is odd, then
> written bytes and their duplicates will
> start wrap around, and it is difficult for
> me to intuitively see how would this
> affect the total number of scenarious
> where i will survive a disk failure.
> 
> Q3: what are the future growth/shrinkage
> options for a RAID10 setup?  e.g. with
> respect to these:
> 
> 1. read/write speed.

iirc far is good for speed.

> 2. tolerance guarantee towards failing
>disks.

Guarantees? If you have two mirrors, the guarantee is just ONE disk. Yes
you can gamble on losing more.

> 3. total available space.

iirc you can NOT grow the far layout.
> 
> rgrds,
> cm.
> 
You have looked at the wiki - yes I know I push it regularly :-)

https://raid.wiki.kernel.org/index.php/Linux_Raid

Cheers,
Wol




Re: [gentoo-user] which linux RAID setup to choose?

2020-05-03 Thread hitachi303

Am 03.05.2020 um 07:44 schrieb Caveman Al Toraboran:

  * RAIDs 4 to 6: fails to satisfy point (3)
   since they are stuck with a fixed tolerance
   towards failing disks (i.e. RAIDs 4 and 5
   tolerate only 1 disk failure, and RAID 6
   tolerates only 2).


As far as I remember there can be spare drives / partitions which will 
replace a failed one if needed. But this does not help if drives / 
partitions fail at the same moment. Under normal conditions spares will 
raise the number of drives which can fail.


Nothing you asked but I had very bad experience with drives which spin 
down by themselves to save energy (mostly labelled green or so).


Also there has been some talk about SMR
https://hardware.slashdot.org/story/20/04/19/0432229/storage-vendors-are-quietly-slipping-smr-disks-into-consumer-hard-drives