On 06/07/2011 06:07 AM, sanjeev.ta...@us.pwc.com wrote:
Hello,

I wanted to know if anyone has any tips or tutorials on how to install a
Hadoop cluster across multiple datacenters.

Nobody has come out and said they've built a single HDFS filesystem spanning multiple sites, primarily because the inter-site bandwidth and latency would be awful, and because Hadoop's topology model doesn't support it (though there are some placeholders).
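
To make the topology point concrete, here is a rough sketch of the kind of rack-awareness script Hadoop can call (configured through topology.script.file.name in the releases current at the time of writing). Hadoop passes node addresses as arguments and reads one network path per address from stdout. The subnets and the /dcN/rackN names are made up for illustration, and encoding a datacentre level in the path does not by itself make block placement datacentre-aware.

```python
#!/usr/bin/env python3
"""Hypothetical topology script: maps node addresses to /datacentre/rack paths."""
import ipaddress
import sys

# Assumed subnet layout, invented for this example; replace with your own.
SUBNET_TO_RACK = {
    "10.1.1.0/24": "/dc1/rack1",
    "10.1.2.0/24": "/dc1/rack2",
    "10.2.1.0/24": "/dc2/rack1",
}

# Fallback path, kept at the same depth as the entries above because
# mixed-depth topologies may be rejected by Hadoop.
DEFAULT_RACK = "/default-dc/default-rack"


def resolve(address):
    """Return the network path for one address, or the fallback if unknown."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Not an IP literal (for example a hostname): use the fallback.
        return DEFAULT_RACK
    for subnet, rack in SUBNET_TO_RACK.items():
        if ip in ipaddress.ip_network(subnet):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One path per argument, whitespace-separated, in the same order.
    print(" ".join(resolve(arg) for arg in sys.argv[1:]))
```

Even with a script like this, replicas and tasks would still be scheduled as if the inter-site link were just another rack boundary, which is why a single stretched filesystem is not recommended.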

You could set up an HDFS filesystem in each datacentre, and use symbolic links (or the forthcoming federation) to pull data in. There's no reason why you can't start up a job on Datacentre-1 that starts reading some of its data from DC-2, after which all the work will be datacentre-local.
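
A job can name its input with a fully qualified hdfs:// URI that points at the other site's namenode. If you would rather stage the remote data first, which is a variation on the above and not a requirement, something like the following sketch would build a `hadoop distcp` call that copies a path from DC-2 into the DC-1 filesystem. The hostnames, the path and the 8020 port are assumptions for illustration only.

```python
import subprocess


def distcp_command(src_namenode, dst_namenode, path, port=8020):
    """Build the argument list for copying `path` between two HDFS clusters.

    The namenode hosts and port are placeholders supplied by the caller.
    """
    src = "hdfs://%s:%d%s" % (src_namenode, port, path)
    dst = "hdfs://%s:%d%s" % (dst_namenode, port, path)
    return ["hadoop", "distcp", src, dst]


def pull(src_namenode, dst_namenode, path):
    """Run the copy; requires the hadoop command on the PATH."""
    subprocess.check_call(distcp_command(src_namenode, dst_namenode, path))


# Hypothetical usage, run from a node in DC-1:
# pull("nn.dc2.example.com", "nn.dc1.example.com", "/data/logs")
```

Once the copy has finished, every later job in DC-1 reads from the local filesystem and no longer touches the inter-site link.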

Do you need ssh connectivity between the nodes across these data centers?

Depends on how you deploy Hadoop. You only need SSH if you use the built-in start/stop scripts, which ssh into each host listed in conf/slaves; if you use large-scale cluster management tools then it's a non-issue.
