Tom White wrote:
Hi Joydeep,
The problem you are hitting may be that port 50001 isn't open to the outside, whereas from within the cluster any node can talk to any other node (because the security groups are set up to allow this).
However, I'm not sure this is a good approach. Configuring Hadoop to use public IP addresses everywhere should work, but you then pay for all data transfer between nodes (see http://aws.amazon.com/ec2/, "Public and Elastic IP Data Transfer"). This is going to get expensive fast!
So to get this to work well, we would have to change Hadoop so that it was aware of both the public and private addresses and used the appropriate one: clients would use the public address, while the daemons would use the private address. I haven't looked at what it would take to do this or how invasive it would be.
I thought that AWS had stopped you from being able to talk to things within the cluster using the public IP addresses, and so stopped you using DynDNS as your way of bootstrapping discovery.
Here's what may work:
-bring up the EC2 cluster using the local (private) names
-open up the relevant ports in the security group
-have the clients talk to the cluster using the public IP addresses, along the lines of the sketch below
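
To make the client side concrete, here is a minimal, untested Java sketch of an external client pointing fs.default.name at the namenode's public address. The hostname is a placeholder for the master's public DNS name, the class name is made up, and I'm assuming port 50001 and a Hadoop version that uses hdfs:// URIs; adjust for whatever your cluster actually runs.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ExternalHdfsClient {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Point the client at the namenode's *public* address; the daemons
      // inside the cluster keep using the local names.
      conf.set("fs.default.name",
          "hdfs://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:50001/");
      FileSystem fs = FileSystem.get(conf);
      // Simple sanity check: list the root of the remote filesystem.
      for (FileStatus status : fs.listStatus(new Path("/"))) {
        System.out.println(status.getPath());
      }
      fs.close();
    }
  }

Whether the namenode accepts a request addressed this way is exactly the filesystem-name check issue below.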
The problem will arise when the namenode checks the filesystem name the caller used and it doesn't match its expectations. There were some recent patches in the code to handle the case where someone talks to the namenode using the IP address instead of the hostname; they may work for this situation too.
Personally, I wouldn't trust the NN in the EC2 datacentres to be secure against external callers, but that problem already exists within their datacentres anyway.