On Tue, May 12, 2009 at 9:11 PM, Joydeep Sen Sarma <jssa...@facebook.com> wrote:
> (raking up real old thread)
>
> After struggling with this issue for some time now, it seems that accessing 
> HDFS on EC2 from outside EC2 is not possible.
>
> This is primarily because of 
> https://issues.apache.org/jira/browse/HADOOP-985. Even if the datanode ports are 
> authorized in EC2 and we set the public hostname via slave.host.name, the 
> namenode still reports the internal IP addresses of the datanodes as block locations. 
> DFS clients outside EC2 cannot reach these addresses and fail when 
> reading/writing data blocks.
>
> HDFS/EC2 gurus - would it be reasonable to ask for an option to not use IP 
> addresses (and use datanode hostnames instead, as in the pre-985 behavior)?
>
> I really like the idea of being able to use an external node (my personal 
> workstation) to do job submission (which typically requires interacting with 
> HDFS in order to push files into the jobcache, etc.). That way I don't need 
> custom AMIs - I can use stock Hadoop AMIs (all the custom software lives on the 
> external node). Without the above option, this is not currently possible.

You could use ssh to set up a SOCKS proxy between your machine and
EC2, and set org.apache.hadoop.net.SocksSocketFactory as the
socket factory.
http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
has more information.
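
For what it's worth, here is a minimal sketch of what that looks like (the
port, username, and hostname below are placeholders, and the config keys are
the ones I believe the socket factory reads, so double-check them against
your Hadoop version). Open the tunnel with something like:

  ssh -D 1080 user@ec2-public-hostname

and then, in the client-side hadoop-site.xml on the machine you submit from,
point Hadoop at the local SOCKS port:

  <property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.SocksSocketFactory</value>
  </property>
  <property>
    <name>hadoop.socks.server</name>
    <value>localhost:1080</value>
  </property>

The Cloudera post above walks through the same setup in more detail.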

-- Philip
