On 11 July 2013 06:27, Hao Ren <h....@claravista.fr> wrote:

> Hi,
>
> I am running HDFS on Amazon EC2.
>
> Say I have an FTP server that stores some data.
>
> I just want to copy this data directly to HDFS in parallel (which
> may be more efficient).
>
> I think hadoop distcp is what I need.
>

http://hadoop.apache.org/docs/stable/distcp.html

DistCp (distributed copy) is a tool used for large inter/intra-cluster
copying. It uses MapReduce to effect its distribution, error handling and
recovery, and reporting.


I doubt this is going to help. Are there a lot of files? If yes, how about
running multiple copy jobs to HDFS?
-balaji
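
A minimal sketch of the "multiple copy jobs" idea, in Python. It splits a
list of file names into batches and runs one `hadoop fs -cp` command per
batch at the same time. Everything specific here is an assumption for
illustration: the helper names, the FTP base URL, the HDFS target directory,
and that the cluster's Hadoop build can read `ftp://` URIs directly. If it
cannot, the same batching could drive a download-then-put step instead.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(paths, n_jobs):
    """Distribute paths round-robin into at most n_jobs non-empty batches."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    batches = [paths[i::n_jobs] for i in range(n_jobs)]
    return [batch for batch in batches if batch]


def build_copy_command(batch, ftp_base, hdfs_dir):
    """Build one `hadoop fs -cp` command that copies a whole batch.

    ftp_base and hdfs_dir are hypothetical values supplied by the caller.
    """
    sources = [f"{ftp_base}/{name}" for name in batch]
    return ["hadoop", "fs", "-cp", *sources, hdfs_dir]


def run_copy_jobs(paths, ftp_base, hdfs_dir, n_jobs=4):
    """Run one copy command per batch concurrently; return the exit codes."""
    commands = [
        build_copy_command(batch, ftp_base, hdfs_dir)
        for batch in split_into_batches(paths, n_jobs)
    ]
    if not commands:
        return []
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

Calling `run_copy_jobs(names, "ftp://user:secret@ftp.example.com/data",
"hdfs:///incoming")` would start up to four copies side by side. A non-zero
exit code in the returned list marks a batch that needs to be retried.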
