Re: Setting up Spark/flume/? to Ingest 10TB from FTP

2015-08-17 Thread Steve Loughran
luster.

From: Marcelo Vanzin <van...@cloudera.com>
To: "Varadhan, Jawahar" <varad...@yahoo.com>
Cc: "d...@spark.apache.org" <d...@spark.apache.org>
Sent: Friday, August 14, 2015 3:23 PM
Subject: Re

Re: Setting up Spark/flume/? to Ingest 10TB from FTP

2015-08-14 Thread Marcelo Vanzin
On Fri, Aug 14, 2015 at 2:11 PM, Varadhan, Jawahar <varad...@yahoo.com.invalid> wrote:
> And hence, I was planning to use Spark Streaming with Kafka or Flume with
> Kafka. But flume runs on a JVM and may not be the best option as the huge
> file will create memory issues. Please suggest someway t

Re: Setting up Spark/flume/? to Ingest 10TB from FTP

2015-08-14 Thread Varadhan, Jawahar
: "Varadhan, Jawahar"
Cc: "d...@spark.apache.org"
Sent: Friday, August 14, 2015 3:23 PM
Subject: Re: Setting up Spark/flume/? to Ingest 10TB from FTP

Why do you need to use Spark or Flume for this? You can just use curl and hdfs:

  curl ftp://blah | hdfs dfs -put - /bl
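
For anyone wanting to script the curl-into-hdfs idea above, here is a minimal bash sketch, not a tested production recipe. The function name `ftp_to_hdfs`, the URL, and the destination path are hypothetical placeholders. It relies on `hdfs dfs -put -` reading from stdin, so the file is streamed through the pipe rather than staged on local disk or held in memory.

```shell
#!/usr/bin/env bash
# Sketch: stream one FTP file straight into HDFS through a pipe.
# ftp_to_hdfs, the URL and the destination path are illustrative placeholders.

ftp_to_hdfs() {
  local src_url="$1"   # source URL, e.g. an ftp:// address (placeholder)
  local dest="$2"      # target path in HDFS (placeholder)
  (
    # Run in a subshell so pipefail does not leak into the caller.
    # With pipefail, a curl failure makes the whole pipeline return non-zero.
    set -o pipefail
    curl --fail --silent --show-error "$src_url" | hdfs dfs -put - "$dest"
  )
}
```

Usage would look something like `ftp_to_hdfs "ftp://host/path/file" "/data/incoming/file"`, with both arguments replaced by real values. One caveat: if the transfer dies partway, a partial file may already exist in HDFS, so checking the exit status and cleaning up the destination is probably worth adding.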