Re: Managed File Transfer

2014-07-09 Thread Stanley Shi
There's the DistCp utility for this kind of purpose.
There's also Spring XD, but I'm not sure whether you want to use it.
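For reference, DistCp runs the copy as a MapReduce job. It builds a listing of the source files and divides it among map tasks (`-m` caps the number of maps). With the default `-strategy uniformsize`, each map gets roughly the same total number of bytes, while `-strategy dynamic` lets faster maps claim more chunks. The Python below is only a rough illustration of the uniform-size idea, not DistCp's actual code. `uniform_size_splits` and the sample listing are hypothetical.

```python
from typing import List, Sequence, Tuple

def uniform_size_splits(listing: Sequence[Tuple[str, int]], num_maps: int) -> List[List[str]]:
    """Cut a (path, size_in_bytes) listing into at most `num_maps` contiguous
    chunks of roughly equal total size, one chunk per copy task."""
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    target = sum(size for _, size in listing) / num_maps  # ideal bytes per task
    splits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in listing:
        # Start a new chunk when this file would push the current one past the
        # target, unless the last allowed chunk is already open.
        if current and current_bytes + size > target and len(splits) < num_maps - 1:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        splits.append(current)  # whatever remains goes to the last task
    return splits
```

For example, `uniform_size_splits([("a", 10), ("b", 10), ("c", 10), ("d", 10)], 2)` gives `[["a", "b"], ["c", "d"]]`, so each of the two tasks copies 20 bytes.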

Regards,
*Stanley Shi,*



On Mon, Jul 7, 2014 at 10:02 PM, Mohan Radhakrishnan radhakrishnan.mo...@gmail.com wrote:

 Are Hadoop jobs used for this type of file transfer?

 There is also a requirement for a scheduler. What does the forum
 recommend?





Re: Managed File Transfer

2014-07-09 Thread Mohan Radhakrishnan
I am a beginner, but the following seems similar to what I intend. The data
source will be external FTP or S3 storage.

Spark Streaming can read data from HDFS
(http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html),
Flume (http://flume.apache.org/), Kafka (http://kafka.apache.org/),
Twitter (https://dev.twitter.com/) and ZeroMQ (http://zeromq.org/).
You can also define your own custom data sources.
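On the custom-source point: in Spark Streaming a custom source is written in Scala or Java by extending the `Receiver` class. Whatever the framework, a source for an external FTP or S3 drop location mostly has to decide which files are ready to ingest. Below is a minimal sketch of that decision in Python, assuming the caller supplies each poll's listing as a `{path: size}` dict. `NewFileDetector` is a hypothetical helper, and treating "size unchanged since the last poll" as "upload finished" is an assumption, not something stated in this thread.

```python
from typing import Dict, List, Set

class NewFileDetector:
    """Report files from successive listings once they look complete.

    A file is reported when its size is unchanged since the previous poll
    (a heuristic for "upload finished") and it has not been reported before.
    """

    def __init__(self) -> None:
        self._last_sizes: Dict[str, int] = {}  # listing from the previous poll
        self._emitted: Set[str] = set()        # paths already handed downstream

    def poll(self, listing: Dict[str, int]) -> List[str]:
        """Take the current {path: size} listing; return paths ready to ingest."""
        ready = [
            path
            for path, size in sorted(listing.items())
            if path not in self._emitted and self._last_sizes.get(path) == size
        ]
        self._emitted.update(ready)
        self._last_sizes = dict(listing)
        return ready
```

A file therefore needs two consecutive polls with the same size before it is reported, and it is reported only once.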

Thanks,
Mohan


On Wed, Jul 9, 2014 at 2:09 PM, Stanley Shi s...@gopivotal.com wrote:

 There's the DistCp utility for this kind of purpose.
 There's also Spring XD, but I'm not sure whether you want to use it.






Managed File Transfer

2014-07-07 Thread Mohan Radhakrishnan
Hi,
We used a commercial file transfer (FT) and scheduler tool in clustered mode.
It ran as a traditional active-active cluster and supported multiple
protocols such as FTPS.

Now I am interested in evaluating a distributed way of crawling FTP
sites and downloading files using Hadoop. Since we have to process
thousands of files, I thought Hadoop jobs could do it.
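A first step for crawling is producing the file listing that parallel download tasks would then divide up. Below is a minimal single-process sketch using Python's `ftplib`, assuming the server supports the MLSD command (RFC 3659). `crawl` and `mlsd_lister` are hypothetical helpers, and the listing function is passed in so the walk can be tested without a server.

```python
from ftplib import FTP
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# One directory listing: (name, facts) pairs, the shape ftplib's FTP.mlsd yields.
Listing = Iterable[Tuple[str, Dict[str, str]]]

def crawl(list_dir: Callable[[str], Listing], root: str = "/") -> Iterator[Tuple[str, int]]:
    """Walk a remote tree depth-first and yield (path, size) for every file."""
    stack: List[str] = [root.rstrip("/") or "/"]
    while stack:
        current = stack.pop()
        for name, facts in list_dir(current):
            kind = facts.get("type", "").lower()
            path = current.rstrip("/") + "/" + name
            if kind == "dir":
                stack.append(path)  # visit later; "cdir"/"pdir" (. and ..) are skipped
            elif kind == "file":
                yield path, int(facts.get("size", "0"))

def mlsd_lister(ftp: FTP) -> Callable[[str], Listing]:
    """Adapt a logged-in ftplib.FTP connection (the server must support MLSD)."""
    return lambda path: ftp.mlsd(path, facts=["type", "size"])
```

Against a real server one would open `FTP(host)`, call `login()`, then iterate `crawl(mlsd_lister(ftp), "/outgoing")`; the host and the `/outgoing` path are placeholders. The resulting `(path, size)` pairs are the kind of listing that could be divided among parallel download tasks.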

Are Hadoop jobs used for this type of file transfer?

There is also a requirement for a scheduler. What does the forum
recommend?


Thanks,
Mohan