Re: Managed File Transfer
There's a DistCp utility for this kind of purpose. There's also Spring XD, but I am not sure if you want to use it.

Regards,
Stanley Shi

On Mon, Jul 7, 2014 at 10:02 PM, Mohan Radhakrishnan radhakrishnan.mo...@gmail.com wrote:
> Hi,
>
> We used a commercial file transfer (FT) and scheduler tool in clustered mode. This was a traditional active-active cluster that supported multiple protocols such as FTPS.
>
> Now I am interested in evaluating a distributed way of crawling FTP sites and downloading files using Hadoop. Since we have to process thousands of files, I thought Hadoop jobs could do it. Are Hadoop jobs used for this type of file transfer? There is also a requirement for a scheduler.
>
> What is the recommendation of the forum?
>
> Thanks,
> Mohan
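As a rough illustration of the DistCp suggestion above, the sketch below builds a `hadoop distcp` command line from Python. The helper name `build_distcp_command`, the host names, and the paths are hypothetical. Whether an `ftp://` source URI works depends on the FTP filesystem connector available in your Hadoop distribution, so treat that part as an assumption to verify on your cluster. `-m` (maximum number of map tasks) and `-update` are standard DistCp options.

```python
import shlex


def build_distcp_command(source_uri, target_uri, max_maps=20, update=True):
    """Return the argv list for a DistCp run (hypothetical helper).

    source_uri / target_uri: any URIs the cluster's Hadoop filesystems can read
    and write; an ftp:// source is an assumption, not something this checks.
    """
    if max_maps < 1:
        raise ValueError("max_maps must be at least 1")
    # -m caps the number of simultaneous map tasks doing the copy.
    argv = ["hadoop", "distcp", "-m", str(max_maps)]
    if update:
        # -update copies only files that are missing or differ at the target.
        argv.append("-update")
    argv += [source_uri, target_uri]
    return argv


if __name__ == "__main__":
    # Placeholder endpoints for illustration only.
    cmd = build_distcp_command(
        "ftp://ftp.example.com/incoming",
        "hdfs://namenode.example.com:8020/landing/incoming",
        max_maps=10,
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where the `hadoop` client is installed. Scheduling the run (for example with cron or a workflow scheduler) is a separate concern that this sketch does not cover.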
Re: Managed File Transfer
I am a beginner, but this seems to be similar to what I intend. The data source will be external FTP or S3 storage.

Spark Streaming can read data from HDFS (http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html), Flume (http://flume.apache.org/), Kafka (http://kafka.apache.org/), Twitter (https://dev.twitter.com/) and ZeroMQ (http://zeromq.org/). You can also define your own custom data sources.

Thanks,
Mohan

On Wed, Jul 9, 2014 at 2:09 PM, Stanley Shi s...@gopivotal.com wrote:
> There's a DistCp utility for this kind of purpose. There's also Spring XD, but I am not sure if you want to use it.
>
> Regards,
> Stanley Shi
>
> On Mon, Jul 7, 2014 at 10:02 PM, Mohan Radhakrishnan radhakrishnan.mo...@gmail.com wrote:
>> Hi,
>>
>> We used a commercial file transfer (FT) and scheduler tool in clustered mode. This was a traditional active-active cluster that supported multiple protocols such as FTPS.
>>
>> Now I am interested in evaluating a distributed way of crawling FTP sites and downloading files using Hadoop. Since we have to process thousands of files, I thought Hadoop jobs could do it. Are Hadoop jobs used for this type of file transfer? There is also a requirement for a scheduler.
>>
>> What is the recommendation of the forum?
>>
>> Thanks,
>> Mohan
Managed File Transfer
Hi,

We used a commercial file transfer (FT) and scheduler tool in clustered mode. This was a traditional active-active cluster that supported multiple protocols such as FTPS.

Now I am interested in evaluating a distributed way of crawling FTP sites and downloading files using Hadoop. Since we have to process thousands of files, I thought Hadoop jobs could do it. Are Hadoop jobs used for this type of file transfer? There is also a requirement for a scheduler.

What is the recommendation of the forum?

Thanks,
Mohan
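To make the idea in the question concrete, here is a minimal single-machine sketch of the pattern a distributed job would follow: split the list of remote files into batches (roughly what input splits do for map tasks) and let each worker download its own batch over FTPS. It uses only Python's standard library (`ftplib`, `concurrent.futures`) rather than Hadoop, and the function names, host, and credentials are hypothetical placeholders. Retries, crawling of directory listings, and scheduling are deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP_TLS
from pathlib import Path


def partition(items, num_workers):
    """Split items into num_workers round-robin batches."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [items[i::num_workers] for i in range(num_workers)]


def download_batch(host, user, password, remote_paths, dest_dir):
    """Download one batch over a single FTPS connection; return local paths."""
    fetched = []
    if not remote_paths:
        return fetched  # nothing to do, so do not open a connection
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with FTP_TLS(host) as ftps:
        ftps.login(user, password)
        ftps.prot_p()  # encrypt the data channel as well as the control channel
        for remote in remote_paths:
            local = dest / Path(remote).name
            with open(local, "wb") as out:
                ftps.retrbinary(f"RETR {remote}", out.write)
            fetched.append(str(local))
    return fetched


def download_all(host, user, password, remote_paths, dest_dir, num_workers=4):
    """Fan the batches out to a thread pool and collect the local paths."""
    batches = partition(remote_paths, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(download_batch, host, user, password, batch, dest_dir)
            for batch in batches
        ]
        return [path for future in futures for path in future.result()]
```

In a Hadoop setting the batches would be handed to map tasks instead of threads, and each task would write to HDFS rather than a local directory; that mapping is an assumption about how one might structure such a job, not an established recipe from this thread.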