Thanks for the insights into this stuff so far. I think we are doing some things right with automating everything and such. An additional question: I have heard claims that ZooKeeper can help with Hadoop configuration. Is anyone using ZooKeeper in a way that helps with deploying their Hadoop cluster?
Cheers,
James.

On 2010-04-08, at 4:18 AM, Steve Loughran wrote:

> James Seigel wrote:
>> I am new to this group, and relatively new to hadoop. I am looking at
>> building a large cluster. I was wondering if anyone has any best practices
>> for a cluster in the hundreds of nodes? As well, has anyone had experience
>> with a cluster spanning multiple data centers? Is this a bad practice?
>> moderately bad practice? insane?
>
> got some stuff here
> http://wiki.smartfrog.org/wiki/display/sf/Patterns+of+Hadoop+Deployment
>
> though my clusters are of short life span and smaller. At that kind of scale
> you need to know how to manage datacenters yourself or talk to people who do
> (I deny all knowledge, though I will note that in HP consulting and EDS we do
> have people who can handle this)
>
>> Is it better to build the 1000 node cluster in a single data center?
>
> yes.
>
>> Do you back one of these things up to a second data center or a different
>> 1000 node cluster?
>
> depends on your concerns and where the building is.
>
> -If your facility is in the Bay Area then you want a separate datacentre on a
> different fault line. If it's in Eastern WA or OR then you worry more about
> volcanic activity and spec the roof to take 1-2m of volcanic ash. Power comes
> off the big dams which again may go down if there's an earthquake, but
> otherwise pretty reliable.
>
> -if your worry is about continuous availability, you need different sites
> with different (multiple) power suppliers and multiple data feeds, and more
> to worry about in terms of keeping things in sync. Data transfer will cost
> time and money, and for a big enough cluster (1000 servers can hold 6-12
> PB of storage) that takes time to sync. Even at the CERN LHC experiments'
> data rate of 1 PB/month off the LHC, it would take 6 months to get the data
> into your cluster using a good protocol like GridFTP.
>
> -single site would make sync easier; 10Gb Ethernet will still take a while
> but won't cost you
>
>> Sorry, I am asking crazy questions...I am just wanting to learn the meta
>> issues and opportunities with making clusters.
>
> Start small, automate everything, worry about scaling up the management
> problems. Hadoop filestore and JT (JobTracker) scale well, but you have to get
> your ops right. That's everything from BIOS upgrades to log file management.

James Seigel
ja...@tynt.com
http://www.tynt.com
Captain Hammer
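For a rough sense of why syncing 6-12 PB "will still take a while" even over 10Gb Ethernet, here is a back-of-envelope calculation. It is a sketch under stated assumptions, not a measurement: decimal petabytes, a single link sustaining some fraction of its nominal rate, and no other overhead. The `efficiency` parameter and the 70% figure are hypothetical illustrations, not numbers from the thread.

```python
# Back-of-envelope transfer time for copying a large data store between sites.
# Assumptions (illustrative only): decimal units, one link sustaining a fixed
# fraction ("efficiency", a hypothetical parameter) of its nominal bit rate.

PB = 10**15  # bytes in a decimal petabyte


def transfer_days(bytes_to_move: float, link_gbit_per_s: float,
                  efficiency: float = 1.0) -> float:
    """Return the days needed to move bytes_to_move over the given link."""
    bits = bytes_to_move * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 86400  # seconds per day


if __name__ == "__main__":
    for size_pb in (6, 12):
        for eff in (1.0, 0.7):
            days = transfer_days(size_pb * PB, 10, eff)
            print(f"{size_pb} PB over 10 Gb/s at {eff:.0%} efficiency: "
                  f"{days:.1f} days")
```

At full line rate, 6 PB works out to about 55.6 days and 12 PB to about 111 days; any real-world overhead or link contention only makes it longer.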