On Fri, 13 Jul 2007, Bruce Allen wrote:
I have had power-related problems in the past, and have found that the lower maintenance needs and higher reliability of UPS-backed systems are worth it.
There are actually two "kinds" of power distribution issues that one needs to think about for really large sites -- what goes on outside the room and what goes on inside it.

On the outside it is smart to get harmonic distortion correcting transformers that can do things like protect your primary transformers from overload caused by non-PFC power supplies, should any of your nodes use them, and which can often provide capacitive buffering of short outages, surge protection, and so on at the same time.

On the inside there is power distribution and control -- avoidance of ground loops within racks, 120V vs 240V vs 208V supply to the racks, and balancing the use of circuits on multiphase power (to minimize the neutral load at the common ground, which ideally goes to building steel).

UPS can be either inside or outside, but in either case should be configured with a room kill switch (both thermal and manual) to avoid cooking systems in the event of an air conditioning failure or electrocuting firemen in the event of a fire. Truthfully, in the case of a really professional large scale cluster room I'd be inclined to put the UPS on the outside in the primary distribution system and make it easy to kill the room power.

I also agree with Mark -- it is "better" to avoid UPS on compute nodes unless you really, really need it and can't find any other way of smoothing over short power glitches. UPS batteries are toxic and wear out quickly. I find that it is actually quite rare for a UPS to last three whole years before there is a really significant probability that the battery won't actually work when there is a power glitch. They're expensive, and have to be regularly tested, maintained, and refitted with batteries to be reliable. They're dangerous unless you spend time and money on kill switches (see the discussion of same in the list archives).

I think they're a good idea in a server room with an HA cluster, where failover is key and money is on the line.
I think that they are rarely worth it when one is just cranking away on compute nodes and all that one "risks" by not having them is the hassle of a reboot and a bit of lost computation time in the event of a power glitch. This DOES depend on the quality of local power, of course, but there are a variety of ways to deal with power quality issues short of a full UPS.
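To put a number on the point above about balancing circuits on multiphase power, here is a minimal sketch of the neutral current in a 4-wire wye feed. It assumes linear loads at unity power factor and the fundamental frequency only; the function name and the example currents are made up for illustration. Non-PFC supplies add triplen harmonics that sum in the neutral rather than cancel, so real neutral loads can be worse than this estimate.

```python
import cmath
import math


def neutral_current(ia, ib, ic):
    """Return the neutral current magnitude (A) for three phase currents (A).

    Assumes a 4-wire wye feed, linear loads, unity power factor, and
    fundamental-frequency currents only (no harmonics).
    """
    phasors = [
        cmath.rect(ia, 0.0),                  # phase A at 0 degrees
        cmath.rect(ib, -2.0 * math.pi / 3),   # phase B at -120 degrees
        cmath.rect(ic, 2.0 * math.pi / 3),    # phase C at +120 degrees
    ]
    # The neutral carries the vector sum of the three phase currents.
    return abs(sum(phasors))


# Hypothetical rack loads in amps, chosen only for illustration.
balanced = neutral_current(16, 16, 16)   # evenly spread across phases
lopsided = neutral_current(24, 16, 8)    # same total, badly spread

print(f"balanced: {balanced:.2f} A, lopsided: {lopsided:.2f} A")
```

With the same 48 A total, the evenly spread case leaves essentially nothing on the neutral, while the lopsided one puts roughly 14 A on it.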
That's interesting. Where does the PDU store 1 second of power?
I don't know about "PDU" per se -- but units that do significant power conditioning (up to the extreme of dual isolation transformers) usually have big capacitors to buffer surges and load variations. I wouldn't have expected them to make it through a whole second unless they were designed to do so, but at this point they may be so designed.

For small/personal clusters I change my mind again. I tend to buy cheap UPSs for my house because our power bobbles for 1-5 seconds in nearly every heavy rain or windstorm we have, which is why I know from direct experience that the batteries in these UPSs are lucky to last two whole years. I've got something like three of them where I'm plugged into the surge side because the UPS side is dead (no power at all) or goes down anyway when power bobbles, to the tune of a system crash AND the maddening beeping.

I'd love to find a 10 second PDU/conditioner that used a really large capacitor instead of a battery to buffer short outages, especially at mass market prices. Does anybody know of such a beast? No battery, no toxins, just a big cap and small price?

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
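For a rough sense of scale on the "big cap instead of a battery" question, here is a back-of-envelope sketch. It uses the usable energy of a capacitor discharged from one voltage to a lower one, E = C (V_start^2 - V_min^2) / 2, and solves for C. Every number below is an assumption chosen for illustration (a 300 W load, a DC bus charged to about 170 V, which is roughly the peak of rectified 120 V AC, and a supply that keeps working down to 100 V), and converter losses are ignored.

```python
def holdup_capacitance(load_watts, seconds, v_start, v_min):
    """Return the capacitance (farads) needed to carry a constant load.

    Uses E = 0.5 * C * (v_start**2 - v_min**2), i.e. only the energy
    released while the capacitor sags from v_start to v_min is usable.
    Ignores conversion losses and capacitor leakage.
    """
    if v_min >= v_start:
        raise ValueError("v_min must be below v_start")
    energy_joules = load_watts * seconds
    return 2.0 * energy_joules / (v_start**2 - v_min**2)


# Illustrative assumptions only: 300 W node, 170 V bus, usable down to 100 V.
one_second = holdup_capacitance(300, 1, 170, 100)
ten_seconds = holdup_capacitance(300, 10, 170, 100)

print(f"1 s:  {one_second * 1e6:,.0f} uF")
print(f"10 s: {ten_seconds:.2f} F")
```

Under these assumptions one second of hold-up needs on the order of 30,000 uF per node, and ten seconds needs about a third of a farad at 170 V, which suggests why a conditioner would have to be designed specifically for it.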
