On 08/07/2013 20:27, Stefan G. Weichinger wrote:
> On 08.07.2013 17:58, Alan McKinnon wrote:
>> On 08/07/2013 17:39, Paul Hartman wrote:
>>> On Thu, Jul 4, 2013 at 9:04 PM, Paul Hartman
>>> <paul.hartman+gen...@gmail.com> wrote:
>>>> ST4000DM000
>>>
>>> As a side note, after about 100 hours of power-on usage, these two
>>> Seagate 4TB "Desktop" edition drives I bought have each encountered
>>> dozens of unreadable sectors so far. I was able
>>> to correct them (force reallocation) using hdparm... So it should be
>>> "fixed", and I'm reading that this is "normal" with newer drives and
>>> "don't worry about it", but I'm still coming from the time when 1 bad
>>> sector = red alert, replace the drive ASAP.  I guess I will need to
>>> monitor and see if it gets worse.
>>>
>>
>>
>> Way back when, in the bad old days of drives measured in 100s of megs,
>> you'd get a few bad sectors now and then, and would have to mark them as
>> faulty. This didn't bother us much back then.
>>
>> Nowadays we have drives that are 8,000 times bigger than that, so all
>> other things being equal we'd expect sectors to fail 8,000 times more
>> often ("more" being a very fuzzy concept, and I know full well I'm using
>> it loosely :-) )
>>
>> Our drives nowadays also have smart firmware, something we had to
>> introduce when CHS no longer cut it. This led to sector failures being
>> somewhat "invisible", leaving us with the happy delusion that drives were
>> vastly reliable, etc. But you know all this.
>>
>> A mere few dozen failures in the first 100 hours is a failure rate of
>> (Alan whips out the trusty sci calculator) 4.8E-6% of the drive's
>> sectors. Pretty damn spectacular if you ask me and WELL within
>> probabilities.
>>
>> There is likely nothing wrong with your drives. If they are faulty, it's
>> highly likely a systemic manufacturing fault of the mechanicals (servo
>> systems, motor bearing etc)
>>
>> You do realize that modern hard drives have for the longest time been up
>> there in the Top X list of Most Reliable Devices Made By Mankind Ever?
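
To spell out where that 4.8E-6% above comes from, since it reads a bit
cryptic: it's just bad sectors divided by total sectors. A rough Python
back-of-the-envelope, where the 4 KiB physical sector size and calling
"a few dozen" 48 are my assumptions, not Paul's figures:

drive_bytes  = 4 * 10**12        # 4 TB as marketed (decimal)
sector_bytes = 4096              # assuming 4 KiB physical sectors
bad_sectors  = 48                # calling "a few dozen" 48

total_sectors = drive_bytes // sector_bytes
rate_percent  = bad_sectors / total_sectors * 100
print(f"{total_sectors} sectors, {rate_percent:.1e} % of them bad")
# prints: 976562500 sectors, 4.9e-06 % of them bad

Same ballpark; with 512-byte sectors the percentage is smaller still.
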
> 
> Does it make sense to apply some sort of burn-in procedure before
> actually formatting and using the disks? Running badblocks or something?
> 
> I ask because I'm waiting for that shiny new server, and doing so might
> not hurt before installing gentoo. Or is that too paranoid and a waste of
> time?

If it makes you feel better, then by all means go through the motions.
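
If you want to script the motions, the sketch below is roughly what a
single pass of badblocks -w amounts to: write a pattern across the whole
disk, flush, read it back, count mismatches. It's destructive and needs
root; /dev/sdX is a placeholder for the new, still-empty disk, and the
single 0xAA pattern is my simplification (badblocks cycles several):

import os

DEV = "/dev/sdX"             # placeholder: the new, still-empty disk
CHUNK = 4 * 1024 * 1024      # work in 4 MiB chunks
PATTERN = b"\xaa" * CHUNK    # one fixed test pattern

with open(DEV, "r+b") as disk:
    size = disk.seek(0, os.SEEK_END)   # block devices report their size
    disk.seek(0)

    # write pass: cover the whole device with the pattern
    for offset in range(0, size, CHUNK):
        disk.write(PATTERN[: min(CHUNK, size - offset)])
    disk.flush()
    os.fsync(disk.fileno())            # push everything out to the drive
    # drop the cached copy so the read-back comes off the disk, not RAM
    os.posix_fadvise(disk.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)

    # read pass: verify every chunk comes back as written
    disk.seek(0)
    bad = 0
    for offset in range(0, size, CHUNK):
        n = min(CHUNK, size - offset)
        if disk.read(n) != PATTERN[:n]:
            bad += 1

print(f"{size} bytes checked, {bad} chunk(s) did not read back correctly")

In practice you'd just run badblocks itself (or a long SMART self-test)
rather than roll your own, which is rather my point below.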

For my money, I reckon that's exactly what it is - motions and ritual. I
have only anecdotal evidence to back that up, but it's fairly strong
anecdotal evidence:

Over the last 5 years, the team I'm in, the teams we work closely with
and the Storage guys have commissioned >1000 pieces of hardware and
probably more than 4000 drives, the vast majority from Dell. I have no
idea what burn-in Dell applies, if any. We've had our fair share of
infant mortality failures, probably fewer than 20 in 5 years. And here's
the kicker - every single one failed in production.

Most of that hardware, and ALL of the SANs, went through heavy
pre-deployment testing. Usually this meant cloning the -dev system onto
it and running the crap out of it for a decent length of time. Once the
techies were happy, we'd install the production version and switch it on.

I conclude that the likely reason we only found failures in prod is that
only prod gives a decent, viable test that approximates real life, and dev
is always a mere simulation. It's not usage that kills a few drives
early, it's the almost random pattern of disk access that you get in
real life. That tends to shake out the weak links better than any test.

However, this is all anecdotal, so use or discard as you see fit :-). I
no longer worry about data loss, as we have 4-hour warranty turnaround
SLAs in place and company policy is to only deploy storage that is
guaranteed to survive loss of any one drive in an array.


-- 
Alan McKinnon
alan.mckin...@gmail.com

