I read them as soon as they were available. Then I shrugged and
noted YMMV to myself.
1= Those studies are valid for =those= users under =those= users'
circumstances in =those= users' environments.
How well do those circumstances and environments mimic anyone else's?
I don't know, since the studies did not document them in enough detail
(and it would be nigh unto impossible to do so) for me to compare
mine to theirs. I =do= know that neither Google's nor a university's
nor an ISP's nor an HPC supercomputing facility's NOC is particularly
similar to, say, a financial institution's or a health care
organization's NOC.
...and they had better not be. Ditto the behavior of the personnel
working in them.
You yourself have said the environmental factors make a big
difference. I agree, and I submit that, therefore, differences in
those environmental factors are just as significant.
2= I'll bet all the money in your pockets vs all the money in my
pockets that people are going to leap at the chance to use these
studies as yet another excuse to pinch IT spending further. In the
process they are consciously or unconsciously going to imitate some
or all of the environments that were used in those studies.
Which IMHO is exactly wrong for most mission critical functions in
most non-university organizations.
While we can't all pamper our HDs to the extent that Richard Troy's
organization can, frankly that is much closer to the way things
should be done for most organizations. Ditto Greg Smith's =very= good habit:
"I scan all my drives for reallocated sectors, and the minute there's
a single one I get e-mailed about it and get all the data off that
drive pronto. This has saved me from a complete failure that
happened within the next day on multiple occasions."
Amen.
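FWIW, automating that habit takes very little code. Here's a minimal
sketch of the idea, assuming smartctl is installed and a local MTA is
accepting mail on localhost; the drive list, addresses, and the
zero-tolerance threshold are placeholders I made up, not Greg's actual
setup:

#!/usr/bin/env python3
"""Poll SMART reallocated-sector counts and mail an alert on the first
nonzero value. Drive list and mail settings are illustrative placeholders."""
import smtplib
import subprocess
from email.message import EmailMessage

DRIVES = ["/dev/sda", "/dev/sdb"]      # placeholder: adjust to your hardware
ALERT_TO = "ops@example.com"           # placeholder alert address

def reallocated_sectors(device):
    """Parse 'smartctl -A' output for the Reallocated_Sector_Ct raw value."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=False).stdout
    for line in out.splitlines():
        if "Reallocated_Sector_Ct" in line:
            return int(line.split()[-1])   # raw value is the last column
    return 0

def send_alert(device, count):
    msg = EmailMessage()
    msg["Subject"] = f"SMART alert: {device} has {count} reallocated sectors"
    msg["From"] = "smart-monitor@example.com"
    msg["To"] = ALERT_TO
    msg.set_content(f"Get the data off {device} now; see 'smartctl -a {device}'.")
    with smtplib.SMTP("localhost") as s:   # assumes a local MTA is listening
        s.send_message(msg)

if __name__ == "__main__":
    for dev in DRIVES:
        count = reallocated_sectors(dev)
        if count > 0:                      # "the minute there's a single one"
            send_alert(dev, count)

Run it from cron every few minutes and you get roughly the behavior
Greg describes.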
I'll make the additional bet that, no matter what they say, neither
Google nor the sites CMU studied had to deal with setting up and
running environments where the consequences of data loss or data
corruption are as serious as they are for most mission critical
business applications. =Especially= DBMSs in such organizations.
If anyone tried to convince me to run a mission critical or
production DBMS in a business the way Google runs their HW, I'd be
applying the clue-by-four liberally in "boot to the head" fashion
until either they got just how wrong they were or they convinced me
they were too stupid to learn.
At which point they are never touching my machines.
3= From the CMU paper:
"We also find evidence, based on records of disk replacements in
the field, that failure rate is not constant with age, and that,
rather than a significant infant mortality effect, we see a
significant early onset of wear-out degradation. That is, replacement
rates in our data grew constantly with age, an effect often assumed
not to set in until after a nominal lifetime of 5 years."
"In our data sets, the replacement rates of SATA disks are not worse
than the replacement rates of SCSI or FC disks.
=This may indicate that disk independent factors, such as operating
conditions, usage and environmental factors, affect replacement=."
(emphasis mine)
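To make the "not constant with age" point concrete, it helps to look
at the two hazard shapes being contrasted: a rate that falls over time
(infant mortality) versus one that rises (wear-out). A purely
illustrative sketch using a Weibull hazard; the shape and scale values
below are numbers I picked for demonstration, not anything fitted in
the paper:

def weibull_hazard(t_hours, shape, scale_hours):
    """h(t) = (k/lam) * (t/lam)**(k-1); falls with age for shape < 1,
    rises with age for shape > 1."""
    return (shape / scale_hours) * (t_hours / scale_hours) ** (shape - 1)

SCALE = 5 * 8760.0                  # arbitrary ~5-year characteristic life
for year in (1, 2, 3, 4, 5):
    t = year * 8760.0
    falling = weibull_hazard(t, 0.7, SCALE)   # shape < 1: infant-mortality style
    rising = weibull_hazard(t, 1.5, SCALE)    # shape > 1: early-onset wear-out style
    print(f"year {year}: falling hazard {falling:.2e}/hr, rising hazard {rising:.2e}/hr")

Per the quote above, their field data behaves like the rising column,
not the falling one.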
If you look at the organizations in these two studies, you will note
that one thing they all have in common is that they tend to push the
environmental and usage envelopes, especially with regard to anything
involving spending money. (Google is an extreme case even within
that group.)
What these studies say clearly to me is that it is possible to be
penny-wise and pound-foolish with regard to IT spending... ...and
that these organizations have a tendency to be so.
Not a surprise to anyone who's worked in those environments, I'm sure.
The last thing the IT industry needs is for everyone to copy these
organizations' IT behavior!
4= Tom Lane is of course correct that vendors burn in their HDs
enough before selling them to get past most infant mortality. Then
any time any HD is shipped between organizations, it is usually
burned in again to detect and possibly deal with issues caused by
shipping. That's enough to see to it that the end operating
environment is not going to see a bathtub curve failure rate.
Then environmental, usage, and maintenance factors further distort
both the shape and size of the statistical failure curve.
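For the "burned in again after shipping" step in point 4, a receiving
check can be as simple as a SMART extended self-test plus a read-only
surface scan before the drive goes into service. A rough sketch,
assuming smartctl and badblocks are available; the device name and
polling interval are placeholders:

#!/usr/bin/env python3
"""Post-shipping acceptance check: run a SMART extended self-test, wait
for it to finish, then do a read-only surface scan. Device name and
polling interval are illustrative placeholders."""
import subprocess
import sys
import time

def run(cmd):
    """Run a command and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout

def burn_in(device):
    run(["smartctl", "-t", "long", device])    # start the extended self-test

    # Poll the self-test log until no test is reported as in progress.
    while "in progress" in run(["smartctl", "-l", "selftest", device]).lower():
        time.sleep(600)                        # placeholder: check every 10 minutes

    # Loose check on the log; a real script would parse the newest entry only.
    if "Completed without error" not in run(["smartctl", "-l", "selftest", device]):
        return False

    # Read-only surface scan; badblocks prints any bad block numbers to stdout.
    scan = subprocess.run(["badblocks", "-sv", device],
                          capture_output=True, text=True, check=False)
    return scan.stdout.strip() == ""

if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdb"   # placeholder device
    print(f"{dev}: {'OK to deploy' if burn_in(dev) else 'failed burn-in'}")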
5= The major conclusion of the CMU paper is !NOT! that we should buy
the cheapest HDs we can because HD quality doesn't make a difference.
The important conclusion is that a very large segment of the industry
operates its equipment far enough outside manufacturers'
specifications that we need a new error rate model for end use. I agree.
Regardless of what Seagate et al can do in their QA labs, we need
reliability numbers that are actually valid in the real world (ITRW)
of HD usage.
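To put a number on the gap: a datasheet MTTF translates into a
surprisingly small nominal annualized failure rate if you take the
constant-failure-rate assumption at face value. The MTTF-to-AFR
conversion below is standard; the 3% field replacement rate is just
an illustrative assumption for comparison, not a figure from either
study:

import math

def nominal_afr(mttf_hours):
    """Annualized failure rate implied by a datasheet MTTF, assuming a
    constant (exponential) failure rate: AFR = 1 - exp(-8760/MTTF)."""
    return 1.0 - math.exp(-8760.0 / mttf_hours)

DATASHEET_MTTF = 1_000_000.0   # hours; a common datasheet figure
OBSERVED_ARR = 0.03            # assumption: 3% annual replacement rate in the field

afr = nominal_afr(DATASHEET_MTTF)
print(f"datasheet-implied AFR: {afr:.2%}")              # roughly 0.9%
print(f"field rate vs datasheet: {OBSERVED_ARR / afr:.1f}x higher")

A multiple like that is exactly why a field-derived error rate model
matters more than the QA-lab number.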
The other take-away is that organizational policy and procedure with
regard to HD maintenance and use in most organizations could use
improvement.
I strongly agree with that as well.
Cheers,
Ron Peacetree
At 01:53 AM 4/6/2007, [EMAIL PROTECTED] wrote:
On Fri, 6 Apr 2007, Ron wrote:
Bear in mind that Google was and is notorious for pushing their
environmental factors to the limit while using the cheapest "PoS"
HW they can get their hands on.
Let's just say I'm fairly sure every piece of HW they were using
for those studies was operating outside of manufacturer's suggested
specifications.
Ron, please go read both the studies, unless you want to say that
every organization CMU picked to study abused their hardware as
well....
Under such conditions the environmental factors are so deleterious
that they swamp any other effect.
OTOH, I've spent my career being as careful as possible to run HW
within manufacturer's suggested specifications as much as possible.
I've been chided for it over the years... ...usually by folks who
"save" money by buying commodity HDs for big RAID farms in NOCs or
push their environmental envelope or push their usage envelope or
... ...and then act surprised when they have so much more down time
and HW replacements than I do.
All I can tell you is that I've gotten to eat my holiday dinner far
more often than my counterparts who push it in that fashion.
OTOH, there are crises like the Power Outage of 2003 in the NE USA
where some places had such Bad Things happen that it simply doesn't
matter what you bought.
(Power dies, the generator cuts in, power comes back on, but the AC
units crash; temperatures shoot up so fast that by the time
everything is shut down again it's in the 100F range in the NOC.
Lots 'O Stuff dies on the spot, plus you spend the next 6 months
having HW failures at +considerably+ higher rates than historical
norms. Ick.)
IME, it really does make a difference =if you pay attention to the
difference in the first place=.
If you treat everything equally poorly, then you should not be
surprised when everything acts equally poorly.
But hey, YMMV.
Cheers,
Ron Peacetree