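Hi Derek,

The "http" in "http://core:7274/logs/log.20090121" should be "hftp". hftp is the scheme name of HftpFileSystem, which uses HTTP to access HDFS. Hadoop has no FileSystem registered for the plain "http" scheme, which is why the copy fails with "No FileSystem for scheme: http".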
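As a minimal sketch of that scheme change, the shell snippet below rewrites the URL from the failing command. The variable names are only illustrative, and it assumes that core:7274 is the HTTP address of an HDFS namenode, since HftpFileSystem reads from an HDFS cluster over HTTP rather than from an arbitrary web server.

```shell
# Source URL taken from the failing distcp command.
src="http://core:7274/logs/log.20090121"

# Strip a leading "http://" and prepend "hftp://" instead.
# ${src#http://} is plain POSIX parameter expansion, so this works in sh and bash.
hftp_src="hftp://${src#http://}"

# Print the rewritten URL so it can be checked before passing it to distcp.
echo "$hftp_src"
```

The rewritten URL would then take the place of the http URL in the original distcp command.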
Hope this helps.
Nicholas Sze

----- Original Message ----
> From: Derek Young <dyo...@kayak.com>
> To: core-user@hadoop.apache.org
> Sent: Wednesday, January 21, 2009 1:23:56 PM
> Subject: using distcp for http source files
>
> I plan to use hadoop to do some log processing and I'm working on a method to
> load the files (probably nightly) into hdfs. My plan is to have a web server on
> each machine with logs that serves up the log directories. Then I would give
> distcp a list of http URLs of the log files and have it copy the files in.
>
> Reading http://issues.apache.org/jira/browse/HADOOP-341 it sounds like this
> should be supported, but the http URLs are not working for me. Are http source
> URLs still supported?
>
> I tried a simple test with an http source URL (using Hadoop 0.19):
>
> hadoop distcp -f http://core:7274/logs/log.20090121 /user/dyoung/mylogs
>
> This fails:
>
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.IOException: No FileSystem for scheme: http
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1364)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>     at org.apache.hadoop.tools.DistCp.fetchFileList(DistCp.java:578)
>     at org.apache.hadoop.tools.DistCp.access$300(DistCp.java:74)
>     at org.apache.hadoop.tools.DistCp$Arguments.valueOf(DistCp.java:775)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:844)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:871)