I have tried using *slave.host.name* and giving it the public address of my data node. I can now see the node listed with its public address on the dfshealth.jsp page.
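For reference, this is the kind of entry I mean in hadoop-site.xml on the datanode (the hostname below is just a placeholder, not my real instance):

  <property>
    <name>slave.host.name</name>
    <value>ec2-xx-xxx-xx-xxx.compute-1.amazonaws.com</value>
  </property>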
However, when I try to send a file to HDFS from my external server, I still get:

08/09/08 15:58:41 INFO dfs.DFSClient: Waiting to find target node: 10.251.75.177:50010
08/09/08 15:59:50 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException
08/09/08 15:59:50 INFO dfs.DFSClient: Abandoning block blk_-8257572465338588575
08/09/08 15:59:50 INFO dfs.DFSClient: Waiting to find target node: 10.251.75.177:50010
08/09/08 15:59:56 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2246)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1700(DFSClient.java:1702)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1842)
08/09/08 15:59:56 WARN dfs.DFSClient: Error Recovery for block blk_-8257572465338588575 bad datanode[0]

Is there another parameter I could specify to force the address of my datanode? I have been searching the EC2 forums and documentation, and apparently there is no way I can use *dfs.datanode.dns.interface* or *dfs.datanode.dns.nameserver* to specify the public IP of my instance.

Has anyone else managed to send/retrieve files to/from HDFS on an EC2 cluster from an external machine?

Thanks

Julien

2008/9/5 Julien Nioche <[EMAIL PROTECTED]>

> Hi guys,
>
> I am using Hadoop on an EC2 cluster and am trying to send files onto the
> HDFS from an external machine. It works up to the point where I get this
> error message:
>
> *Waiting to find target node: 10.250.7.148:50010*
>
> I've seen a discussion about a similar issue at
> http://thread.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/2446/focus=2449
> but there are no details on how to fix the problem.
>
> Any idea how I can set up my EC2 instances so that they return their
> public IPs and not the internal Amazon ones? Is there anything I can
> specify for the parameters *dfs.datanode.dns.interface* and
> *dfs.datanode.dns.nameserver*?
>
> What I am trying to do is put my input onto the HDFS for processing and
> retrieve the output from there. What I am not entirely sure of is whether
> I can launch my job from the external machine. Most people seem to SSH to
> the master to do that.
>
> Thanks
>
> Julien
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com

--
DigitalPebble Ltd
http://www.digitalpebble.com
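P.S. To be clear about the two DNS parameters, what I tried was along these lines in hadoop-site.xml (illustrative values only — on EC2 neither setting seems to make the datanode report its public address):

  <property>
    <name>dfs.datanode.dns.interface</name>
    <value>eth0</value>
  </property>
  <property>
    <name>dfs.datanode.dns.nameserver</name>
    <value>default</value>
  </property>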