Re: copy files from ftp to hdfs in parallel, distcp failed

2014-05-22 Thread Shlash
Hi,
Can you help me to solve this problem please, if you have solved it?
Best regards

Shlash



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-23 Thread Hao Ren

Hi,

I am just wondering whether I can move data from FTP to HDFS via hadoop distcp.

Can someone give me an example?

In my case, I always encounter the "can not access ftp" error.

I am quite sure that the link, login and passwd are correct; actually, I have just copied and pasted the ftp address into Firefox, and it does work. However, it doesn't work with:

bin/hadoop fs -ls ftp://<user>:<passwd>@<host>/<path>/

Any workaround here?

Thank you.

Hao
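
For reference (a sketch, not taken from this thread; every bracketed value is a placeholder), the command being asked about has this general shape, where -m caps the number of simultaneous map tasks doing the copy:

   hadoop distcp -m <num-maps> ftp://<user>:<passwd>@<ftp-host>/<src-dir>/ \
       hdfs://<namenode>:<port>/<dst-dir>/

If hadoop fs -ls cannot list the ftp:// URI, distcp will fail the same way, so the -ls tests discussed below are a reasonable first check.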




Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-16 Thread Hao Ren

Hi,

Actually, I tested with my own ftp host at first, but it didn't work.

Then I changed it to 0.0.0.0.

But I always get the "can not access ftp" msg.

Thank you.

Hao.




Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-16 Thread Ram
Hi,
Please replace 0.0.0.0 with your ftp host IP address and try it.

From,
Ramesh.
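
As a sketch of this suggestion (host, credentials and paths are placeholders, not values from the thread), the two ftp properties can also be passed per command with the generic -D options, since distcp runs through ToolRunner:

   hadoop distcp -D fs.ftp.host=<ftp-host-ip> -D fs.ftp.host.port=21 \
       ftp://<user>:<passwd>@<ftp-host-ip>/<src-dir>/ \
       hdfs://<namenode>/<dst-dir>/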





Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-15 Thread Hao Ren

Thank you, Ram

I have configured core-site.xml as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/vol/persistent-hdfs</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010</value>
</property>

<property>
  <name>io.file.buffer.size</name>
  <value>65536</value>
</property>

<property>
  <name>fs.ftp.host</name>
  <value>0.0.0.0</value>
</property>

<property>
  <name>fs.ftp.host.port</name>
  <value>21</value>
</property>

</configuration>

Then I tried hadoop fs -ls file:/// , and it works.
But hadoop fs -ls ftp://<user>:<passwd>@<server-ip>/<home-dir>/ doesn't work as usual:

ls: Cannot access ftp://<user>:<passwd>@<server-ip>/<home-dir>/: No such file or directory.

When omitting <home-dir>, as in:

hadoop fs -ls ftp://<user>:<passwd>@<server-ip>/

there are no error msgs, but it lists nothing.
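
One way to check the same listing outside Hadoop (a diagnostic sketch; the placeholders match the ones above) is curl, which speaks FTP directly:

   curl "ftp://<user>:<passwd>@<server-ip>/<home-dir>/"

If curl lists the directory but hadoop fs -ls does not, the problem is likely on the FTPFileSystem side (configuration or path handling) rather than on the server.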


I have also checked the rights on my /home/<user> directory:

drwxr-xr-x 11 <user> <group> 4096 Jul 11 16:30 <home-dir>

and all the files under /home/<user> have rights 755.

I can easily copy the link ftp://<user>:<passwd>@<server-ip>/<home-dir>/ into Firefox, and it lists all the files as expected.

Any workaround here?

Thank you.





--
Hao Ren
ClaraVista
www.claravista.fr



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-12 Thread Ram
Hi,
   Please configure the following in core-site.xml and try:
   Use hadoop fs -ls file:///           -- to display local file system files
   Use hadoop fs -ls ftp://<ftp-host>/  -- to display ftp files;
if it is listing files, go for distcp.

Reference, from
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml :

fs.ftp.host       0.0.0.0   FTP filesystem connects to this server
fs.ftp.host.port  21        FTP filesystem connects to fs.ftp.host on this port

and try to set the property also.

Reference, from the Hadoop Definitive Guide's chapter on Hadoop filesystems:

Filesystem   URI scheme   Java implementation (all under org.apache.hadoop)   Description
FTP          ftp          fs.ftp.FTPFileSystem                                A filesystem backed by an FTP server.
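
Putting the suggested checks together as one sketch (all bracketed values are placeholders):

   # 1. sanity check: the local file system is reachable
   hadoop fs -ls file:///

   # 2. check that the FTP filesystem can list files
   hadoop fs -ls ftp://<user>:<passwd>@<ftp-host>/<dir>/

   # 3. only if step 2 lists files, run the distributed copy
   hadoop distcp ftp://<user>:<passwd>@<ftp-host>/<dir>/ \
       hdfs://<namenode>/<dst-dir>/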


From,
Ramesh.






Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-12 Thread Hao Ren

On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote:

multiple copy jobs to hdfs


Thank you for your reply and the link.

I had read the link before, but I didn't find any examples about copying files from ftp to hdfs.

There are about 20-40 files in my directory. I just want to move or copy that directory to hdfs on Amazon EC2.

Actually, I am new to Hadoop. I would like to know how to do multiple copy jobs to hdfs without distcp.


Thank you again.
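
One plain way to do that without distcp (a sketch only; the staging path and URI pieces are placeholders, and it assumes the 20-40 files fit on the local disk of one node) is to mirror the FTP directory locally, then run several hadoop fs -put uploads in parallel:

   # mirror the FTP directory into a local staging area
   wget -q -r -nH --cut-dirs=1 -P /tmp/ftp-staging \
       "ftp://<user>:<passwd>@<ftp-host>/<dir>/"

   # upload to HDFS with one background put per file
   for f in /tmp/ftp-staging/*; do
       hadoop fs -put "$f" /<hdfs-dst-dir>/ &
   done
   wait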

--
Hao Ren
ClaraVista
www.claravista.fr


Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-11 Thread பாலாஜி நாராயணன்
On 11 July 2013 06:27, Hao Ren wrote:

> Hi,
>
> I am running an HDFS cluster on Amazon EC2.
>
> Say I have a ftp server which stores some data.
>

> I just want to copy these data directly to hdfs in a parallel way (which
> may be more efficient).
>
> I think hadoop distcp is what I need.
>

http://hadoop.apache.org/docs/stable/distcp.html

DistCp (distributed copy) is a tool used for large inter/intra-cluster
copying. It uses MapReduce to effect its distribution, error handling and
recovery, and reporting.


I doubt this is going to help. Are these a lot of files? If yes, how about
multiple copy jobs to hdfs?
-balaji


copy files from ftp to hdfs in parallel, distcp failed

2013-07-11 Thread Hao Ren

Hi,

I am running an HDFS cluster on Amazon EC2.

Say I have a ftp server which stores some data.

I just want to copy these data directly to hdfs in a parallel way (which
may be more efficient).


I think hadoop distcp is what I need.

But

$ bin/hadoop distcp ftp://username:passwd@hostname/some/path/ 
hdfs://namenode/some/path


doesn't work.

13/07/05 16:13:46 INFO tools.DistCp: 
srcPaths=[ftp://username:passwd@hostname/some/path/]

13/07/05 16:13:46 INFO tools.DistCp: destPath=hdfs://namenode/some/path
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input 
source ftp://username:passwd@hostname/some/path/ does not exist.

at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

I checked the path by copying the ftp path into Chrome, and the file really exists; I can even download it.


And then, I tried to list the files under the path with:

$ bin/hadoop dfs -ls ftp://username:passwd@hostname/some/path/

It ends with:

ls: Cannot access ftp://username:passwd@hostname/some/path/: No 
such file or directory.


That seems to be the same problem.

Any workaround here?
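
One thing worth ruling out (an assumption, not a confirmed diagnosis): if the real password contains characters that are special in URIs (@, /, : and so on), they must be percent-encoded in the ftp:// URI, even though a browser may cope without it. With a hypothetical password p@ss/word, for illustration only:

   # p@ss/word becomes p%40ss%2Fword after percent-encoding
   bin/hadoop distcp "ftp://username:p%40ss%2Fword@hostname/some/path/" \
       hdfs://namenode/some/path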

Thank you in advance.

Hao.

--
Hao Ren
ClaraVista
www.claravista.fr