Hello James,

I am new to this group, and relatively new to Hadoop.
Welcome to the group!!

I am looking at building a large cluster.  I was wondering if anyone has any 
best practices for a cluster in the hundreds of nodes?  Also, has anyone had 
experience with a cluster spanning multiple data centers?  Is this a bad 
practice? A moderately bad practice?  Insane?

You can find answers to most of the questions here - 
http://wiki.apache.org/hadoop/
I am not sure whether any clusters span multiple data centers. Either way, 
I am very confident that Hadoop will work on a cluster spanning multiple 
data centers.

Is it better to build the 1000 node cluster in a single data center?  Do you 
back one of these things up to a second data center or a different 1000 node 
cluster?

If you are completely new to Hadoop, it is better to start with a 100-200 
node cluster and learn how it works. You can scale to 1000 or more nodes 
later.

Regards,
Ravi
--
Hadoop @ Yahoo!
