Thanks for the insights so far.  I think we are doing some things right by 
automating everything.  One more question: I have heard talk of ZooKeeper 
being able to help with Hadoop configuration.  Is anyone actually using 
ZooKeeper as part of deploying and configuring their Hadoop cluster?
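
To make the question concrete, below is the kind of thing I am imagining: a 
small sketch using the plain ZooKeeper Java client to publish one config 
value that every node reads at startup, instead of shipping hadoop-site.xml 
around.  The ensemble address, znode paths, and value are all made up for 
illustration; I have no idea whether anyone actually wires Hadoop config up 
this way, which is really what I am asking.

  import java.util.concurrent.CountDownLatch;

  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.WatchedEvent;
  import org.apache.zookeeper.Watcher;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  // Hypothetical sketch: keep fs.default.name in ZooKeeper so every node can
  // look it up at startup.  Hostnames and znode paths are invented.
  public class ClusterConfigSketch {
      public static void main(String[] args) throws Exception {
          final CountDownLatch connected = new CountDownLatch(1);

          // Connect to the ZooKeeper ensemble and wait for the session.
          ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 30000,
                  new Watcher() {
                      public void process(WatchedEvent event) {
                          if (event.getState() == Event.KeeperState.SyncConnected) {
                              connected.countDown();
                          }
                      }
                  });
          connected.await();

          // Publish the value once, e.g. from the deployment script.
          // ZooKeeper does not create parent znodes, so make them explicitly.
          if (zk.exists("/hadoop-conf", false) == null) {
              zk.create("/hadoop-conf", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          }
          String path = "/hadoop-conf/fs.default.name";
          byte[] value = "hdfs://namenode.example.com:8020".getBytes("UTF-8");
          if (zk.exists(path, false) == null) {
              zk.create(path, value, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                      CreateMode.PERSISTENT);
          }

          // ...then every worker reads it back when it starts up.
          byte[] fetched = zk.getData(path, false, null);
          System.out.println("fs.default.name = " + new String(fetched, "UTF-8"));

          zk.close();
      }
  }

If people are doing this, or something smarter, I would love to hear how it 
has worked out at scale.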

Cheers
James.


On 2010-04-08, at 4:18 AM, Steve Loughran wrote:

> James Seigel wrote:
>> I am new to this group, and relatively new to hadoop. I am looking at 
>> building a large cluster.  I was wondering if anyone has any best practices 
>> for a cluster in the hundreds of nodes?  As well, has anyone had experience 
>> with a cluster spanning multiple data centers.  Is this a bad practice? 
>> moderately bad practice?  insane?
> 
> got some stuff here
> http://wiki.smartfrog.org/wiki/display/sf/Patterns+of+Hadoop+Deployment
> 
> though my clusters are short-lived and smaller. At that kind of scale 
> you need to know how to manage datacenters yourself or talk to people who do 
> (I deny all knowledge, though I will note that in HP consulting and EDS we do 
> have people who can handle this)
> 
>> Is it better to build the 1000 node cluster in a single data center?  
> 
> yes.
> 
>> Do you back one of these things up to a second data center or a different 
>> 1000 node cluster?
> 
> depends on your concerns and where the building is.
> 
> -If your facility is in the Bay Area then you want a separate datacentre on a 
> different fault line. If it's in Eastern WA or OR then you worry more about 
> volcanic activity and spec the roof to take 1-2m of volcanic ash. Power comes 
> off the big dams which again may go down if there's an earthquake, but 
> otherwise pretty reliable.
> 
> -If your worry is continuous availability, you need different sites with 
> different (multiple) power suppliers and multiple data feeds, and more to 
> worry about in terms of keeping things in sync. Data transfer will cost 
> time and money, and a big enough cluster (1000 servers can hold 6-12 PB of 
> storage) takes a long time to sync. At the CERN LHC experiments' data rate 
> of 1 PB/month, it would take 6-12 months to get that much data into your 
> cluster, even with a good protocol like GridFTP.
> 
> -A single site makes sync easier; 10 Gigabit Ethernet transfers will still 
> take a while, but they won't cost you the way inter-site transfers do.
> 
>> Sorry, I am asking crazy questions...I am just wanting to learn the meta 
>> issues and opportunities with making clusters.
> 
> Start small, automate everything, worry about scaling up the management 
> problems. The Hadoop filestore and JT scale well, but you have to get your ops 
> right. That's everything from BIOS upgrades to log file management.
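
If I am doing the arithmetic right on the transfer numbers above (rough 
figures, and assuming you could actually sustain line rate):

  10 Gbit/s  ~= 1.25 GB/s  ~= 108 TB/day  ~= roughly 3 PB/month
  6-12 PB at 1 PB/month over the WAN (GridFTP)   ~= 6-12 months
  6-12 PB at a sustained 10GbE link              ~= 2-4 months

so even in the single-site case the initial load gets measured in months, 
not days.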

James Seigel
ja...@tynt.com
http://www.tynt.com
Captain Hammer
