On Wed, Mar 18, 2020 at 9:49 AM antlists <antli...@youngman.org.uk> wrote:
>
> On 17/03/2020 14:29, Grant Edwards wrote:
> > On 2020-03-17, Neil Bothwick <n...@digimed.co.uk> wrote:
> >
> >> Same here. The main advantage of spinning HDs is that they are cheaper
> >> to replace when they fail. I only use them when I need lots of space.
> >
> > Me too. If I didn't have my desktop set up as a DVR with 5TB of
> > recording space, I wouldn't have any spinning drives at all.  My
> > personal experience so far indicates that SSDs are far more reliable
> > and long-lived than spinning HDs.  I would guess that about half of my
> > spinning HDs fail in under 5 years.  But then again, I tend to buy
> > pretty cheap models.
> >
> If you rely on raid, and use spinning rust, DON'T buy cheap drives. I
> like Seagate, and bought myself Barracudas. Big mistake. Next time
> round, I bought Ironwolves. Hopefully that system will soon be up and
> running, and I'll see whether that was a good choice :-)

Can you elaborate on what the mistake was?  Backblaze hasn't found
Seagate to really be any better/worse than anything else.  It seems
like every vendor has a really bad model every couple of years.  Maybe
the more expensive drive will last longer, but you're paying a hefty
premium.  It might be cheaper to just get three drives with 3x
redundancy than two super-expensive ones with 2x redundancy.
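
To make that concrete, here's a quick back-of-the-envelope sketch in
Python (the prices and failure rates are made-up illustrative numbers,
not Backblaze figures, and it assumes failures are independent):

# Illustrative only: compare cost and naive loss probability of
# 3x redundancy on cheap drives vs 2x redundancy on premium drives.
cheap = {"price": 150, "annual_failure_rate": 0.04}    # assumed numbers
premium = {"price": 300, "annual_failure_rate": 0.01}  # assumed numbers

def cost_and_risk(drive, copies):
    cost = drive["price"] * copies
    # Naive model: data is lost only if every copy fails in the same year.
    p_loss = drive["annual_failure_rate"] ** copies
    return cost, p_loss

for name, drive, copies in [("3x cheap", cheap, 3), ("2x premium", premium, 2)]:
    cost, p_loss = cost_and_risk(drive, copies)
    print(f"{name}: ${cost}, ~{p_loss:.1e} chance/year of losing every copy")

It's a naive model (it ignores rebuild windows and correlated
failures), but it shows the shape of the tradeoff.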

The main issues I've seen with RAID are:

1. Double failures.  If your RAID doesn't accommodate double failures
(RAID6/etc) then you have to consider the time required to replace a
drive and rebuild the array.  As arrays get larger, or if you aren't
quick with replacements, the risk of a double failure grows.  Maybe
you could mitigate that with drives that are less likely to fail at
the same time, but I suspect you're better off having enough
redundancy to deal with the problem (there's a rough sketch of the
math after point 2).

2.  A drive fails and the system becomes unstable/etc.  This is usually
a controller problem, and is probably less likely with better
controllers.  It could also be a kernel issue if the
driver/filesystem/etc doesn't handle the erroneous data.  I think the
only place you can mitigate this risk is the controller, not the
drive.  If the drive sends garbage over the interface then the
controller should not pass along invalid data or allow it to
interfere with functioning drives.
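
Here's the rough sketch I mentioned under point 1.  The drive count,
per-drive failure rate, and rebuild window below are assumptions
picked just to show the arithmetic, and it treats failures as
independent:

# Rough double-failure math during a rebuild (all numbers assumed).
surviving_drives = 7          # e.g. an 8-drive array after one failure
annual_failure_rate = 0.03    # assumed per-drive rate
rebuild_hours = 24            # assumed time to replace the drive and resync

hours_per_year = 24 * 365
p_fail_during_rebuild = annual_failure_rate * rebuild_hours / hours_per_year
# Chance that at least one surviving drive dies before the rebuild finishes:
p_double_failure = 1 - (1 - p_fail_during_rebuild) ** surviving_drives
print(f"~{p_double_failure:.4%} chance of a second failure during the rebuild")

The absolute number depends entirely on those assumptions; the point
is that the risk scales with both the rebuild window and the size of
the array.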

This is one of the reasons that I've been trying to move towards
lizardfs or other distributed filesystems.  This puts the redundancy
at the host level.  I can lose all the drives on a host, the host, its
controller, its power supply, or whatever, and nothing bad happens.
Typically in these systems drives aren't explicitly paired; data is
just pooled across the cluster.  If data is lost, the whole cluster
starts replicating it to restore redundancy, and that rebuild is split
across all hosts and starts immediately rather than after you add a
replacement drive (unless you were running near-full).  One host
replicating a single 12TB drive takes a lot longer than 10 hosts each
replicating 1.2TB in parallel, as long as your network switches can
sustain full capacity per host simultaneously and you have no other
bottlenecks.
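
As a rough illustration of that last point, assuming (hypothetically)
that every host can sustain something like 150 MB/s of rebuild traffic
and nothing else is the bottleneck:

# Quick arithmetic behind the parallel-rebuild claim (rate is assumed).
tb_lost = 12              # data that has to be re-replicated, in TB
rate_mb_s = 150           # assumed sustained rebuild rate per host, MB/s

def rebuild_hours(tb, hosts, mb_per_s):
    # Each host re-replicates an equal share of the lost data in parallel.
    bytes_per_host = tb * 1e12 / hosts
    return bytes_per_host / (mb_per_s * 1e6) / 3600

print(f" 1 host : {rebuild_hours(tb_lost, 1, rate_mb_s):.1f} hours")
print(f"10 hosts: {rebuild_hours(tb_lost, 10, rate_mb_s):.1f} hours")

The per-host rate is a made-up number; the point is just that
splitting the rebuild ten ways cuts the window roughly tenfold.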

-- 
Rich
