Re: copy files from ftp to hdfs in parallel, distcp failed

2014-05-22 Thread Shlash
Hi,
Can you help me solve this problem, please, if you have solved it?
Best regards

Shlash



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-23 Thread Hao Ren

Hi,

I am just wondering whether I can move data from FTP to HDFS via Hadoop
distcp.


Can someone give me an example?

In my case, I always encounter a "cannot access ftp" error.

I am quite sure that the link, login, and password are correct; in fact, I
just copied and pasted the ftp address into Firefox, and it does work.
However, it doesn't work with:

bin/hadoop fs -ls ftp://my ftp location

Any workaround here?
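
For reference, a minimal sketch of the listing command, with placeholder
user, password, host, and path (none of these values are from this thread;
quote the URI so the shell does not split it, and percent-encode any
special characters in the password):

$ bin/hadoop fs -ls "ftp://user:pass@ftp.example.com/some/path/"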

Thank you.

Hao

On 16/07/2013 17:47, Hao Ren wrote:

Hi,

Actually, I tested with my own ftp host at first; however, it didn't work.

Then I changed it to 0.0.0.0.

But I always get the "cannot access ftp" message.

Thank you.

Hao.

On 16/07/2013 17:03, Ram wrote:

Hi,
Please replace 0.0.0.0 with your ftp host IP address and try it.




From,
Ramesh.




On Mon, Jul 15, 2013 at 3:22 PM, Hao Ren h@claravista.fr wrote:


Thank you, Ram

I have configured core-site.xml as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/vol/persistent-hdfs</value>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010</value>
</property>

<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>

<property>
<name>fs.ftp.host</name>
<value>0.0.0.0</value>
</property>

<property>
<name>fs.ftp.host.port</name>
<value>21</value>
</property>

</configuration>

Then I tried hadoop fs -ls file:///, and it works.
But hadoop fs -ls ftp://login:password@ftp server ip/directory/ doesn't
work, as usual:
ls: Cannot access ftp://user:password@ftp server ip/directory/: No such
file or directory.

When omitting the directory, as in:

hadoop fs -ls ftp://login:password@ftp server ip/

there are no error messages, but it lists nothing.


I have also checked the permissions on my /home/user directory:

drwxr-xr-x 11 user user  4096 Jul 11 16:30 user

and all the files under /home/user have permissions 755.

I can easily copy the link ftp://user:password@ftp server ip/directory/
into Firefox, and it lists all the files as expected.

Any workaround here?

Thank you.

On 12/07/2013 14:01, Ram wrote:

Please configure the following in core-site.xml and try:
   Use hadoop fs -ls file:///  -- to display local file system files
   Use hadoop fs -ls ftp://your ftp location  -- to display ftp files;
if it lists the files, go for distcp.

reference from

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml

fs.ftp.host       0.0.0.0   FTP filesystem connects to this server
fs.ftp.host.port  21        FTP filesystem connects to fs.ftp.host on this port




-- 
Hao Ren

ClaraVista
www.claravista.fr





--
Hao Ren
ClaraVista
www.claravista.fr



--
Hao Ren
ClaraVista
www.claravista.fr



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-16 Thread Ram
Hi,
Please replace 0.0.0.0 with your ftp host IP address and try it.
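
For example, a minimal sketch of those two properties with a placeholder
address (203.0.113.10 is a stand-in for your FTP server's IP, not a value
from this thread):

<property>
<name>fs.ftp.host</name>
<!-- placeholder: replace with your FTP server's IP address -->
<value>203.0.113.10</value>
</property>

<property>
<name>fs.ftp.host.port</name>
<value>21</value>
</property>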




From,
Ramesh.




On Mon, Jul 15, 2013 at 3:22 PM, Hao Ren h@claravista.fr wrote:

  Thank you, Ram

I have configured core-site.xml as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/vol/persistent-hdfs</value>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010</value>
</property>

<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>

<property>
<name>fs.ftp.host</name>
<value>0.0.0.0</value>
</property>

<property>
<name>fs.ftp.host.port</name>
<value>21</value>
</property>

</configuration>

Then I tried hadoop fs -ls file:///, and it works.
But hadoop fs -ls ftp://login:password@ftp server ip/directory/ doesn't
work, as usual:
ls: Cannot access ftp://user:password@ftp server ip/directory/: No such
file or directory.

When omitting the directory, as in:

hadoop fs -ls ftp://login:password@ftp server ip/

there are no error messages, but it lists nothing.


I have also checked the permissions on my /home/user directory:

drwxr-xr-x 11 user user  4096 Jul 11 16:30 user

and all the files under /home/user have permissions 755.

I can easily copy the link ftp://user:password@ftp server ip/directory/
into Firefox, and it lists all the files as expected.

Any workaround here?

Thank you.

On 12/07/2013 14:01, Ram wrote:

Please configure the following in core-site.xml and try:
   Use hadoop fs -ls file:///  -- to display local file system files
   Use hadoop fs -ls ftp://your ftp location  -- to display ftp files;
if it lists the files, go for distcp.

  reference from
 http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml


fs.ftp.host       0.0.0.0   FTP filesystem connects to this server
fs.ftp.host.port  21        FTP filesystem connects to fs.ftp.host on this port



 --
 Hao Ren
ClaraVista
www.claravista.fr




Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-16 Thread Hao Ren

Hi,

Actually, I tested with my own ftp host at first; however, it didn't work.

Then I changed it to 0.0.0.0.

But I always get the "cannot access ftp" message.

Thank you.

Hao.

On 16/07/2013 17:03, Ram wrote:

Hi,
Please replace 0.0.0.0 with your ftp host IP address and try it.




From,
Ramesh.




On Mon, Jul 15, 2013 at 3:22 PM, Hao Ren h@claravista.fr wrote:


Thank you, Ram

I have configured core-site.xml as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/vol/persistent-hdfs</value>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010</value>
</property>

<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>

<property>
<name>fs.ftp.host</name>
<value>0.0.0.0</value>
</property>

<property>
<name>fs.ftp.host.port</name>
<value>21</value>
</property>

</configuration>

Then I tried hadoop fs -ls file:///, and it works.
But hadoop fs -ls ftp://login:password@ftp server ip/directory/ doesn't
work, as usual:
ls: Cannot access ftp://user:password@ftp server ip/directory/: No such
file or directory.

When omitting the directory, as in:

hadoop fs -ls ftp://login:password@ftp server ip/

there are no error messages, but it lists nothing.


I have also checked the permissions on my /home/user directory:

drwxr-xr-x 11 user user  4096 Jul 11 16:30 user

and all the files under /home/user have permissions 755.

I can easily copy the link ftp://user:password@ftp server ip/directory/
into Firefox, and it lists all the files as expected.

Any workaround here?

Thank you.

On 12/07/2013 14:01, Ram wrote:

Please configure the following in core-site.xml and try:
   Use hadoop fs -ls file:///  -- to display local file system files
   Use hadoop fs -ls ftp://your ftp location  -- to display ftp files;
if it lists the files, go for distcp.

reference from

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml

fs.ftp.host       0.0.0.0   FTP filesystem connects to this server
fs.ftp.host.port  21        FTP filesystem connects to fs.ftp.host on this port




-- 
Hao Ren

ClaraVista
www.claravista.fr





--
Hao Ren
ClaraVista
www.claravista.fr



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-15 Thread Hao Ren

Thank you, Ram

I have configured core-site.xml as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/vol/persistent-hdfs</value>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010</value>
</property>

<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>

<property>
<name>fs.ftp.host</name>
<value>0.0.0.0</value>
</property>

<property>
<name>fs.ftp.host.port</name>
<value>21</value>
</property>

</configuration>

Then I tried hadoop fs -ls file:///, and it works.
But hadoop fs -ls ftp://login:password@ftp server ip/directory/ doesn't
work, as usual:
ls: Cannot access ftp://user:password@ftp server ip/directory/: No such
file or directory.

When omitting the directory, as in:

hadoop fs -ls ftp://login:password@ftp server ip/

there are no error messages, but it lists nothing.


I have also checked the permissions on my /home/user directory:

drwxr-xr-x 11 user user  4096 Jul 11 16:30 user

and all the files under /home/user have permissions 755.

I can easily copy the link ftp://user:password@ftp server ip/directory/
into Firefox, and it lists all the files as expected.

Any workaround here?

Thank you.

On 12/07/2013 14:01, Ram wrote:

Please configure the following in core-site.xml and try:
   Use hadoop fs -ls file:///  -- to display local file system files
   Use hadoop fs -ls ftp://your ftp location  -- to display ftp files;
if it lists the files, go for distcp.


reference from 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml


fs.ftp.host       0.0.0.0   FTP filesystem connects to this server
fs.ftp.host.port  21        FTP filesystem connects to fs.ftp.host on this port




--
Hao Ren
ClaraVista
www.claravista.fr



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-12 Thread Hao Ren

On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote:

multiple copy jobs to hdfs


Thank you for your reply and the link.

I read the link before, but I didn't find any examples of copying
files from ftp to hdfs.


There are about 20-40 files in my directory. I just want to move or copy
that directory to hdfs on Amazon EC2.


Actually, I am new to Hadoop. I would like to know how to do multiple
copy jobs to hdfs without distcp.
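
As an illustration of what such jobs could look like, here is a minimal
sketch (not from this thread): it assumes curl and the hadoop CLI are on
the PATH, and the host, credentials, paths, and files.txt listing are all
placeholders.

# Read file names from files.txt and run up to 4 ftp-to-hdfs
# transfers in parallel, streaming each file through stdin.
xargs -P 4 -I{} sh -c \
  'curl -s "ftp://user:password@ftp.example.com/some/path/{}" | hadoop fs -put - "/dest/path/{}"' \
  < files.txt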


Thank you again.

--
Hao Ren
ClaraVista
www.claravista.fr


Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-12 Thread Ram
Hi,
   Please configure the following in core-site.xml and try:
   Use hadoop fs -ls file:///  -- to display local file system files
   Use hadoop fs -ls ftp://your ftp location  -- to display ftp files;
if it lists the files, go for distcp.

reference from
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml


fs.ftp.host       0.0.0.0   FTP filesystem connects to this server
fs.ftp.host.port  21        FTP filesystem connects to fs.ftp.host on this port

and also try to set the property below.

reference from Hadoop: The Definitive Guide, "Hadoop Filesystems":

Filesystem  URI scheme  Java implementation (under org.apache.hadoop)  Description
FTP         ftp         fs.ftp.FTPFileSystem                            A filesystem backed by an FTP server.
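
Once the ftp listing works, the distcp step has the same shape as the
command tried earlier in this thread; as a sketch, with placeholder
credentials, host, and paths (none of these values are real):

$ bin/hadoop distcp "ftp://user:password@ftp.example.com/some/path/" "hdfs://namenode/some/path"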





From,
Ramesh.




On Fri, Jul 12, 2013 at 1:04 PM, Hao Ren h@claravista.fr wrote:

On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote:

 multiple copy jobs to hdfs


 Thank you for your reply and the link.

 I read the link before, but I didn't find any examples of copying files
 from ftp to hdfs.

 There are about 20-40 files in my directory. I just want to move or copy
 that directory to hdfs on Amazon EC2.

 Actually, I am new to Hadoop. I would like to know how to do multiple copy
 jobs to hdfs without distcp.

 Thank you again.


 --
 Hao Ren
 ClaraVista
 www.claravista.fr



copy files from ftp to hdfs in parallel, distcp failed

2013-07-11 Thread Hao Ren

Hi,

I am running HDFS on Amazon EC2.

Say I have an ftp server which stores some data.

I just want to copy this data directly to hdfs in a parallel way (which
may be more efficient).


I think hadoop distcp is what I need.

But

$ bin/hadoop distcp ftp://username:passwd@hostname/some/path/ 
hdfs://namenode/some/path


doesn't work.

13/07/05 16:13:46 INFO tools.DistCp: 
srcPaths=[ftp://username:passwd@hostname/some/path/]

13/07/05 16:13:46 INFO tools.DistCp: destPath=hdfs://namenode/some/path
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input 
source ftp://username:passwd@hostname/some/path/ does not exist.

at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

I checked the path by copying the ftp path into Chrome; the file
really exists, and I can even download it.


And then I tried to list the files under the path with:

$ bin/hadoop dfs -ls ftp://username:passwd@hostname/some/path/

It ends with:

ls: Cannot access ftp://username:passwd@hostname/some/path/: No 
such file or directory.


That seems to be the same problem.

Any workaround here?

Thank you in advance.

Hao.

--
Hao Ren
ClaraVista
www.claravista.fr


Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-11 Thread பாலாஜி நாராயணன்
On 11 July 2013 06:27, Hao Ren h@claravista.fr wrote:

 Hi,

 I am running HDFS on Amazon EC2.

 Say I have an ftp server which stores some data.

 I just want to copy this data directly to hdfs in a parallel way (which
 may be more efficient).

 I think hadoop distcp is what I need.


http://hadoop.apache.org/docs/stable/distcp.html

DistCp (distributed copy) is a tool used for large inter/intra-cluster
copying. It uses MapReduce to effect its distribution, error handling and
recovery, and reporting.


I doubt this is going to help. Are these a lot of files? If yes, how about
multiple copy jobs to hdfs?
-balaji