(raking up a really old thread)

After struggling with this issue for some time now, it seems that accessing
HDFS on EC2 from outside EC2 is not possible.

This is primarily because of https://issues.apache.org/jira/browse/HADOOP-985.
Even if the datanode ports are authorized in EC2 and we set the public hostname
via slave.host.name, the namenode reports the internal IP addresses of the
datanodes as block locations. DFS clients outside EC2 cannot reach those
addresses and fail when reading/writing data blocks.
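
For reference, this is the datanode-side setting I mean (a sketch against the
0.18-era config key; the hostname is just an example from my setup):

    <!-- hadoop-site.xml on each datanode; the value is an example public DNS name -->
    <property>
      <name>slave.host.name</name>
      <value>ec2-203-0-113-10.compute-1.amazonaws.com</value>
    </property>

Even with this in place, the block locations handed to clients still carry the
internal addresses.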

HDFS/EC2 gurus - would it be reasonable to ask for an option to not use IP
addresses (and instead use datanode hostnames, as before HADOOP-985)?

I really like the idea of being able to use an external node (my personal
workstation) for job submission (which typically requires interacting with
HDFS to push files into the job cache etc.). That way I don't need custom
AMIs - I can use stock Hadoop AMIs, with all the custom software on the
external node. Without the above option this is currently not possible; a
sketch of the client-side config I have in mind is below.
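
Concretely, the hadoop-site.xml on my workstation would look roughly like this
(a sketch; the master hostname and ports are examples, and the corresponding
ports would have to be opened to my IP):

    <!-- hadoop-site.xml on the external workstation; hostname/ports are examples -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://ec2-203-0-113-5.compute-1.amazonaws.com:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>ec2-203-0-113-5.compute-1.amazonaws.com:9001</value>
    </property>

Namenode and jobtracker RPC are reachable this way; it is the block transfers
to/from the datanodes that fail because of the internal addresses.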


-----Original Message-----
From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com] 
Sent: Tuesday, September 09, 2008 7:04 AM
To: core-user@hadoop.apache.org
Subject: Re: public IP for datanode on EC2

> I think most people try to avoid allowing remote access for security 
> reasons. If you can add a file, I can mount your filesystem too, maybe 
> even delete things. Whereas with EC2-only filesystems, your files are 
> *only* exposed to everyone else that knows or can scan for your IPAddr and 
> ports.
>

I imagine that access to the ports used by HDFS could be restricted to
specific IPs using EC2 security groups (ec2-authorize) or any other firewall
mechanism if necessary - something along the lines of the sketch below.
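
    # sketch with the EC2 API tools; group name, ports and source CIDR are
    # examples (namenode RPC on 9000, datanode data transfer port on 50010)
    ec2-authorize my-hadoop-group -P tcp -p 9000  -s 198.51.100.7/32
    ec2-authorize my-hadoop-group -P tcp -p 50010 -s 198.51.100.7/32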

Could anyone confirm that there is no conf parameter I could use to force the 
address of my DataNodes?

Thanks

Julien

--
DigitalPebble Ltd
http://www.digitalpebble.com
