Hello James,

> I am new to this group, and relatively new to Hadoop.

Welcome to the group!!

> I am looking at building a large cluster. I was wondering if anyone has any best practices for a cluster in the hundreds of nodes. Also, has anyone had experience with a cluster spanning multiple data centers? Is this a bad practice? A moderately bad practice? Insane?

You can find answers to most of these questions on the Hadoop wiki: http://wiki.apache.org/hadoop/

I am not sure whether there are any clusters spanning multiple data centers. Even if there are such clusters, I am very confident that Hadoop will work on a cluster spanning multiple data centers.

> Is it better to build the 1000-node cluster in a single data center? Do you back one of these things up to a second data center, or to a different 1000-node cluster?

If you are completely new to Hadoop, it is better to start with a 100-200 node cluster and learn how it works. You can always scale to 1000 or more nodes later.

Regards,
Ravi
--
Hadoop @ Yahoo!