Hi,

I am running HDFS on Amazon EC2.

Say I have an FTP server that stores some data.

I just want to copy this data directly to HDFS in parallel, which should be more efficient.

I think hadoop distcp is what I need.

But

$ bin/hadoop distcp ftp://username:passwd@hostname/some/path/ hdfs://namenode/some/path

doesn't work. It fails with:

    13/07/05 16:13:46 INFO tools.DistCp: srcPaths=[ftp://username:passwd@hostname/some/path/]
    13/07/05 16:13:46 INFO tools.DistCp: destPath=hdfs://namenode/some/path
    Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source ftp://username:passwd@hostname/some/path/ does not exist.
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

I checked the path by pasting the FTP URL into Chrome, and the file really does exist; I can even download it.
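
One thing I am not sure about: if the password contained characters that are special in a URI (e.g. '@' or '/'), I assume it would have to be percent-encoded, something like the following (pass%40word being a hypothetical password containing '@'):

    $ bin/hadoop distcp "ftp://username:pass%40word@hostname/some/path/" hdfs://namenode/some/path

Could that be the problem here?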

Then I tried to list the files under that path with:

    $ bin/hadoop dfs -ls ftp://username:passwd@hostname/some/path/

It ends with:

ls: Cannot access ftp://username:passwd@hostname/some/path/: No such file or directory.

That seems to be the same problem.
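
Do I perhaps need to declare the FTP filesystem in core-site.xml as well? If I understand FTPFileSystem correctly, the relevant properties would look something like this (hostname, username and passwd standing in for my real values; the user/password keys are suffixed with the FTP host name):

    <property>
      <name>fs.ftp.host</name>
      <value>hostname</value>
    </property>
    <property>
      <name>fs.ftp.user.hostname</name>
      <value>username</value>
    </property>
    <property>
      <name>fs.ftp.password.hostname</name>
      <value>passwd</value>
    </property>

I am not sure whether this is required when the credentials are already in the URI.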

Any workaround here?

Thank you in advance.

Hao.

--
Hao Ren
ClaraVista
www.claravista.fr
