Useful information indeed, though a bit complicated for my level, I must say.

I think it would be more than useful to post this online, say in Hadoop's wiki or as an article on cluster resource sites. How about it? I can volunteer for this if you wish: a central information place on the Hadoop wiki for pre-install cluster admin, covering:

- OS image install
- ssh setup
- dsh ant tools setup
- rpm automation
- this.next( ? )

2008/5/2 Steve Loughran <[EMAIL PROTECTED]>:
> Allen Wittenauer wrote:
>
> > On 5/1/08 5:00 PM, "Bradford Stephens" <[EMAIL PROTECTED]> wrote:
> >
> > > *Very* cool information. As someone who's leading the transition to
> > > open-source and cluster-orientation at a company of about 50 people,
> > > finding good tools for the IT staff to use is essential. Thanks so
> > > much for the continued feedback.
> > >
> >
> > Hmm. I should upload my slides.
> >
>
> That would be excellent! I was trying not to scare people with things like
> PXE preboot or the challenge of bringing up a farm of 500+ servers when the
> building has just suffered a power outage. I will let your slides do that.
>
> The key things people have to remember are
> -you can't do stuff by hand once you have more than one box; you need to
> have some story for scaling things up. It could be hand creating some
> machine image that is cloned, it could be using CM tools. If you find
> yourself trying to ssh in to boxes to configure them by hand, you are in
> trouble.
>
> -once you have enough racks in your cluster, you can abandon any notion of
> 100% availability. You have to be prepared to deal with the failures as
> an everyday event. The worst failures are not the machines that drop off the
> net, it's the ones that start misbehaving with memory corruption or a network
> card that starts flooding the network.
>
> --