I want to copy HDFS files over HTTP using distcp, but I can't; it seems to be a configuration problem that I can't track down. How can I run distcp over HTTP in Hadoop 2.0.4?
First I set up HttpFS (Hadoop 2.0.4 over HTTP) on port 3888, and it is running. Here is the proof:

```
$ curl -i http://zk1.host.com:3888?user.name=babu&op=homedir
[1] 32129
[myuser@zk1 hadoop]$ HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Accept-Ranges: bytes
ETag: W/"674-1365802990000"
Last-Modified: Fri, 12 Apr 2013 21:43:10 GMT
Content-Type: text/html
Content-Length: 674
Date: Sat, 01 Jun 2013 15:48:04 GMT

<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
<b>HttpFs service</b>, service base URL at /webhdfs/v1.
</body>
</html>
```

But when I run distcp, the copy fails. With the `http` scheme:

```
$ hadoop distcp http://zk1.host:3888/gutenberg/a.txt http://zk1.host:3888/
Warning: $HADOOP_HOME is deprecated.

Copy failed: java.io.IOException: No FileSystem for scheme: http
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1434)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1455)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:635)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
```

With the `httpfs` scheme:

```
$ hadoop distcp httpfs://zk1.host:3888/gutenberg/a.txt httpfs://zk1.host:3888/
Copy failed: java.io.IOException: No FileSystem for scheme: httpfs
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1434)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1455)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:635)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
```

And with the `hdfs` scheme against the HttpFS port:

```
$ hadoop distcp hdfs://zk1.host:3888/gutenberg/a.txt hdfs://zk1.host:3888/
Copy failed: java.io.IOException: Call to zk1.host/127.0.0.1:3888 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
        at org.apache.hadoop.ipc.Client.call(Client.java:1112)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
```

Here are my core-site.xml and httpfs-env.sh, where I configured HDFS and HttpFS:

```
$ cat etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://zk1.host:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.myuser.hosts</name>
    <value>zk1.host</value>
  </property>
  <property>
    <name>hadoop.proxyuser.myuser.groups</name>
    <value>*</value>
  </property>
</configuration>

$ cat etc/hadoop/httpfs-env.sh
#!/bin/bash
export HTTPFS_HTTP_PORT=3888
export HTTPFS_HTTP_HOSTNAME=`hostname -f`
```

--
Best regards,
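One more thing I noticed while reviewing my configuration: in Hadoop 2.x the `fs.default.name` key is deprecated in favor of `fs.defaultFS`, so I may switch my core-site.xml to the newer key (same `hdfs://zk1.host:9000` value as above):

```
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://zk1.host:9000</value>
</property>
```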
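One caveat about my curl check above: the `[1] 32129` line suggests the shell treated the unquoted `&` as a background operator, so the `op=homedir` parameter may never have reached curl, and the `text/html` reply looks like the server's static landing page rather than a WebHDFS response. A small sketch of the quoting issue (using the same hypothetical URL from the transcript):

```shell
# Unquoted, the shell splits the command at '&' and backgrounds the part
# before it, so only 'user.name=babu' would be sent. Quoting keeps the
# whole URL, '&' included, in a single argument:
url='http://zk1.host.com:3888?user.name=babu&op=homedir'
echo "$url"
```

A quoted `curl -i "$url"` would then send both query parameters; the reply itself notes that the REST endpoints live under the `/webhdfs/v1` base path.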