On 06/07/2011 06:07 AM, sanjeev.ta...@us.pwc.com wrote:
> Hello,
> I wanted to know if anyone has any tips or tutorials on how to install a
> Hadoop cluster across multiple datacenters.

Nobody has come out and said they've built a single HDFS filesystem spanning
multiple sites, primarily because the inter-site bandwidth and latency would
be awful, and there isn't any support for this in Hadoop's topology model
(there are some placeholders, though).
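For context on the topology model: Hadoop's network topology treats the cluster as a tree of paths (e.g. /rack/host) and defines the distance between two nodes as the sum of their hops up to their closest common ancestor. The sketch below is a stdlib-only illustration of that counting rule, assuming a hypothetical three-level /datacenter/rack/host layout; the class and paths are made up for illustration and are not Hadoop's NetworkTopology API.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration of hop-count distance in a /datacenter/rack/host tree.
 * This is not Hadoop's NetworkTopology class, only the counting idea behind it.
 */
public class TopologyDistance {

    /** Splits "/dc1/rack1/host1" into ["dc1", "rack1", "host1"]. */
    static List<String> levels(String path) {
        // Drop the leading "/" so split() does not yield an empty first element.
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return Arrays.asList(trimmed.split("/"));
    }

    /**
     * Distance = hops from a up to the closest common ancestor
     * plus hops from b up to that same ancestor.
     */
    static int distance(String a, String b) {
        List<String> pa = levels(a);
        List<String> pb = levels(b);
        int common = 0;
        // Count how many leading path components the two nodes share.
        while (common < pa.size() && common < pb.size()
                && pa.get(common).equals(pb.get(common))) {
            common++;
        }
        return (pa.size() - common) + (pb.size() - common);
    }

    public static void main(String[] args) {
        String n1 = "/dc1/rack1/host1";
        System.out.println(distance(n1, "/dc1/rack1/host1")); // same node: 0
        System.out.println(distance(n1, "/dc1/rack1/host2")); // same rack: 2
        System.out.println(distance(n1, "/dc1/rack2/host3")); // same DC, other rack: 4
        System.out.println(distance(n1, "/dc2/rack1/host1")); // other DC: 6
    }
}
```

The numbers show the gap: a cross-site read scores only two hops worse than an off-rack read, while the real WAN link can be far slower. A plain hop count says nothing about inter-site bandwidth or latency.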
You could set up an HDFS filesystem in each datacentre and use symbolic
links (or the forthcoming federation) to pull data in. There's no reason
why you can't start a job in Datacentre-1 that begins by reading some of
its data from DC-2, after which all the work will be datacentre-local.
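HDFS paths can be fully qualified with a namenode's host and port (hdfs://host:port/path), which is how a job in one datacentre can name data held by another cluster. The stdlib-only sketch below groups a job's input paths by the namenode authority in each URI, treating unqualified paths as belonging to the local cluster (an assumption mirroring how paths resolve against the default filesystem). The hostnames dc1-nn and dc2-nn and the class name are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InputLocality {

    /**
     * Groups input paths by the namenode authority (host:port) in their URI.
     * Paths with no authority are assumed to resolve against the local cluster.
     */
    static Map<String, List<String>> groupByCluster(String localAuthority, List<String> inputs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String input : inputs) {
            String authority = URI.create(input).getAuthority();
            String key = (authority == null) ? localAuthority : authority;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(input);
        }
        return groups;
    }

    public static void main(String[] args) {
        String local = "dc1-nn:8020"; // hypothetical namenode in Datacentre-1
        List<String> inputs = List.of(
                "/logs/2011/06/07",                     // unqualified: local cluster
                "hdfs://dc1-nn:8020/logs/2011/06/06",   // explicitly local
                "hdfs://dc2-nn:8020/logs/2011/06/07");  // hypothetical cluster in DC-2
        Map<String, List<String>> groups = groupByCluster(local, inputs);
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            String where = e.getKey().equals(local) ? "local" : "remote";
            System.out.println(where + " " + e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Only the "remote" group crosses the inter-site link; once that data has been read in, the rest of the job can run datacentre-local as described above.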

> Do you need ssh connectivity between the nodes across these data centers?

Depends on how you deploy Hadoop. You only need SSH if you use the
built-in start/stop scripts; if you use large-scale cluster management
tools, it's a non-issue.