Allen Wittenauer wrote:
> On 5/1/08 5:00 PM, "Bradford Stephens" <[EMAIL PROTECTED]> wrote:
>> *Very* cool information. As someone who's leading the transition to
>> open-source and cluster-orientation at a company of about 50 people,
>> finding good tools for the IT staff to use is essential. Thanks so much for
>> the continued feedback.
>
> Hmm. I should upload my slides.

That would be excellent! I was trying not to scare people with things
like PXE preboot or the challenge of bringing up a farm of 500+ servers
when the building has just suffered a power outage. I will let your
slides do that.
The key things people have to remember are:
-you can't do stuff by hand once you have more than one box; you need to
have some story for scaling things up. It could be hand-creating a
machine image that is then cloned, or it could be using configuration
management (CM) tools. If you find yourself trying to ssh into boxes to
configure them by hand, you are in trouble.
-once you have enough racks in your cluster, you can abandon any notion
of 100% availability. You have to be prepared to deal with failures as
an everyday event. The worst failures are not the machines that drop off
the net; they are the ones that start misbehaving, with memory
corruption or a network card that starts flooding the network.