I hadn't looked at -217 since, well, I was designing spaceflight
hardware... This is a very nice set of references; I'm especially fond
of perusing the Weibull data. I'd not looked at klabs before.
I'll echo the rule of thumb that failure rate doubles with every 10 deg of
temperature rise. And I hear, all the time, from bean counters and room
monitors, that we should run our machine rooms hotter. I've got 2 with
ambient setpoints at 80F right now, and in our 300 node cluster we see an
average of one DIMM and one hard drive failure per week. It's a real good
thing the hardware's all still under maintenance, else we'd be out of
systems already. Over the winter, when the building thermal sink was lower,
we also saw fewer failures.
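
For anyone who wants to play with the 2x rule, here is a minimal sketch. It
assumes the "10 deg" means 10 deg C and that the doubling holds across the
whole range, which is a rule of thumb and not a measured law. The function
names are made up for illustration.

```python
def failure_rate_multiplier(delta_t_c, doubling_step_c=10.0):
    """Rule-of-thumb multiplier on failure rate for a temperature change.

    delta_t_c: temperature change in deg C (negative means cooler).
    doubling_step_c: rise assumed to double the failure rate
                     (assumption: 10 deg C, per the rule of thumb).
    """
    return 2.0 ** (delta_t_c / doubling_step_c)


def fahrenheit_delta_to_celsius(delta_f):
    """Convert a temperature *difference* from deg F to deg C."""
    return delta_f * 5.0 / 9.0


if __name__ == "__main__":
    # Hypothetical scenario: drop an 80F setpoint by 10F.
    delta_c = fahrenheit_delta_to_celsius(-10.0)
    print("failure rate multiplier:", failure_rate_multiplier(delta_c))
```

Multiply an observed failure rate by the result to get a rough guess at the
rate after the change. Treat it as an order-of-magnitude estimate only.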
gerry
Lux, Jim (337C) wrote:
Try this
http://rel.intersil.com/docs/rel/calculation_of_semiconductor_failure_rates.pdf
You might also look for MIL-HDBK-217
Of course, a paper by H.S. Blanks makes the following statement:
Although the temperature dependence of failure rate can be very high, in most
situations it is much less than that of the Arrhenius acceleration factor. It
is very improbable that the temperature dependence of component failure rate
can be meaningfully modelled for reliability prediction purposes or for the
purpose of optimizing thermal design component layout.
(from the abstract of "Arrhenius and the temperature dependence of non-constant
failure rate", Quality and Reliability Engineering International, Vol. 6, No. 4,
pp. 259-265, 20 Mar 2007)
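
For reference, the Arrhenius acceleration factor that the abstract refers to is
usually written AF = exp((Ea/k) * (1/T1 - 1/T2)), with temperatures in kelvin.
A minimal sketch follows. The activation energy of 0.7 eV is only an assumed
illustrative value (real values depend on the failure mechanism), and the
function name is made up. Per the Blanks abstract, the real temperature
dependence is in most situations much less than this factor suggests.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_low_c, t_high_c, activation_energy_ev=0.7):
    """Arrhenius acceleration factor going from t_low_c to t_high_c.

    Temperatures are in deg C and converted to kelvin internally.
    activation_energy_ev is an ASSUMED value, not taken from any datasheet.
    """
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_low_k - 1.0 / t_high_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    # The 55 C versus 65 C case raised elsewhere in this thread.
    print("acceleration factor:", arrhenius_acceleration(55.0, 65.0))
```

With the assumed 0.7 eV, going from 55 C to 65 C gives a factor a little above
2, which is where the 2x per 10 deg rule of thumb comes from. A different
activation energy moves that number a lot.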
You might also browse around http://www.weibull.com/ or http://www.klabs.org/
Jim
From: [email protected] [mailto:[email protected]] On
Behalf Of Jon Tegner
Sent: Wednesday, April 14, 2010 1:12 AM
To: Mark Hahn
Cc: [email protected]
Subject: Re: Re: [Beowulf] 96 cores in silent and small enclosure
the max temp spec is not some arbitrary knob that the chip vendors
choose out of spiteful anti-green-ness. I wouldn't be surprised to see some
****************************************************************
The issue is not the temp spec of current CPUs; the problem is that it is hard
to get relevant information. I haven't found any source stating that the failure
rate in year 5 would be significantly higher if you operate the CPU at 65 C
instead of 55 C. I'm just saying this kind of information would be valuable (and
I would be glad to find it).
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf