I hadn't looked at MIL-HDBK-217 since, well, I was designing spaceflight hardware... This is a very nice set of references; I'm especially fond of perusing the Weibull data. I'd not looked at klabs before.

I'll echo the rule of thumb that failure rates double for every 10 deg C of temperature rise. And I hear all the time, from bean counters and room monitors, that we should run our machine rooms hotter. I've got two rooms with ambient setpoints at 80F right now, and in our 300-node cluster we see an average of one DIMM failure and one hard drive failure per week. It's a really good thing the hardware is all still under maintenance, or we'd be out of systems already. Over the winter, when the building's thermal sink was lower, we also saw fewer failures.
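A minimal Python sketch of the arithmetic above, assuming the "doubles every 10 deg C" rule of thumb holds exactly (it is only a heuristic). The function names are hypothetical. It also turns the observed one-failure-per-week figure into a per-node annual rate for the 300-node cluster.

```python
def doubling_rule_factor(t_ref_c: float, t_op_c: float, step_c: float = 10.0) -> float:
    """Failure-rate multiplier going from t_ref_c to t_op_c, assuming the
    rate doubles every step_c degrees C (rule of thumb, not a physical law)."""
    return 2.0 ** ((t_op_c - t_ref_c) / step_c)


def f_to_c(t_f: float) -> float:
    """Convert Fahrenheit to Celsius (an 80F setpoint is about 26.7 C)."""
    return (t_f - 32.0) * 5.0 / 9.0


def per_node_annual_rate(failures_per_week: float, nodes: int,
                         weeks_per_year: float = 52.0) -> float:
    """Average failures per node per year for one component type."""
    return failures_per_week * weeks_per_year / nodes


if __name__ == "__main__":
    # 55 C -> 65 C under the rule of thumb: a factor of 2.
    print(doubling_rule_factor(55.0, 65.0))
    # One DIMM failure per week across 300 nodes: about 0.17 per node-year.
    print(round(per_node_annual_rate(1.0, 300), 3))
```

The rule of thumb ignores the failure mechanism entirely; it is only a quick way to size the effect of a setpoint change.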

gerry

Lux, Jim (337C) wrote:
Try this
http://rel.intersil.com/docs/rel/calculation_of_semiconductor_failure_rates.pdf

You might also look for MIL-HDBK-217

Of course, a paper by H.S. Blanks makes the following statement:
Although the temperature dependence of failure rate can be very high, in most 
situations it is much less than that of the Arrhenius acceleration factor. It 
is very improbable that the temperature dependence of component failure rate 
can be meaningfully modelled for reliability prediction purposes or for the 
purpose of optimizing thermal design component layout.
(from the abstract of "Arrhenius and the temperature dependence of non-constant
failure rate", Quality and Reliability Engineering International, Vol. 6, No. 4,
pp. 259-265, 20 Mar 2007)
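For comparison with the Blanks abstract, here is a sketch of the standard Arrhenius acceleration factor, AF = exp((Ea/k)(1/T_low - 1/T_high)) with temperatures in kelvin. The activation energy of 0.7 eV is an assumed, illustrative value. Real values depend on the failure mechanism, which is part of Blanks's point.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(t_low_c: float, t_high_c: float, ea_ev: float = 0.7) -> float:
    """Ratio of failure rates at t_high_c vs t_low_c (Celsius inputs).
    ea_ev = 0.7 eV is an illustrative assumption, not a measured value."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k))


if __name__ == "__main__":
    # With Ea = 0.7 eV, going from 55 C to 65 C gives a factor of roughly 2.1,
    # close to the "2x per 10 degrees" rule of thumb.
    print(round(arrhenius_af(55.0, 65.0), 2))
```

The result is very sensitive to Ea. At an assumed 0.3 eV, the same 10 degree step gives only about 1.4.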

You might also browse around http://www.weibull.com/ or http://www.klabs.org/

Jim


From: [email protected] [mailto:[email protected]] On 
Behalf Of Jon Tegner
Sent: Wednesday, April 14, 2010 1:12 AM
To: Mark Hahn
Cc: [email protected]
Subject: Re: Re: [Beowulf] 96 cores in silent and small enclosure


the max temp spec is not some arbitrary knob that the chip vendors
choose out of spiteful anti-green-ness. I wouldn't be surprised to see some

****************************************************************

The issue is not the temperature spec of current CPUs; the problem is that it is
hard to get relevant information. I haven't found any source stating that the
failure rate in year 5 should be significantly higher if you operate the CPU at
65 C instead of 55 C. I'm just saying this kind of information would be valuable
(and I would be glad to find it).
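One way to frame the year-5 question is to assume, hypothetically, that component lifetimes follow a Weibull distribution and that temperature acts only as a time-scale acceleration factor AF. In that case, the characteristic life at the hotter temperature is eta / AF. Under that assumption, the hazard (instantaneous failure rate) at any age t is multiplied by AF**beta. The shape parameter beta therefore decides whether the year-5 penalty is smaller (beta < 1) or larger (beta > 1) than AF. The beta, eta and AF values below are made up for illustration.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def year_n_hazard_ratio(t_years: float, beta: float,
                        eta_cool_years: float, af: float) -> float:
    """Hazard at age t_years when hot vs cool, assuming temperature only
    rescales time (hot characteristic life = eta_cool_years / af)."""
    eta_hot_years = eta_cool_years / af
    return (weibull_hazard(t_years, beta, eta_hot_years)
            / weibull_hazard(t_years, beta, eta_cool_years))


if __name__ == "__main__":
    # Hypothetical parameters: eta = 20 years, AF = 2 for 55 C -> 65 C.
    for beta in (0.8, 1.0, 1.5):
        print(beta, round(year_n_hazard_ratio(5.0, beta, 20.0, 2.0), 2))
```

Whether the pure time-scaling assumption holds for real CPUs is exactly the kind of data the question is asking for.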


_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
