On Fri, Aug 14, 2015 at 2:11 PM, Varadhan, Jawahar <
varad...@yahoo.com.invalid> wrote:

> And hence, I was planning to use Spark Streaming with Kafka or Flume with
> Kafka. But Flume runs on a JVM and may not be the best option as the huge
> file will create memory issues. Please suggest some way to run it inside the
> cluster.
>

I'm not sure why you think memory would be a problem. You don't need to
read all 10GB into memory to transfer the file.
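
To illustrate, here is a minimal sketch of a chunked copy in Python. It is
not tied to Flume or to any particular transport; `copy_in_chunks` and the
1 MB default chunk size are names and values I made up for the example. The
point is only that memory use is bounded by the chunk size, not by the size
of the file.

```python
import io


def copy_in_chunks(src, dst, chunk_size=1024 * 1024):
    """Copy a readable binary stream to a writable one, one chunk at a time.

    Only `chunk_size` bytes are held in memory at once, regardless of how
    large the source is. Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Small in-memory demo; real use would pass open files or sockets.
    source = io.BytesIO(b"abcdefghij")
    target = io.BytesIO()
    copied = copy_in_chunks(source, target, chunk_size=3)
    print(copied, target.getvalue())
```

The same loop works with any pair of file-like objects, so the source could
be a network stream and the destination a file on the cluster.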

I'm far from the best person to give advice about Flume, but this seems
more in line with what Sqoop does, although a quick search suggests that
Sqoop cannot yet read from FTP.

But writing your own code to read from an FTP server when a message arrives
from Kafka shouldn't really be hard.
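
Something along these lines might be a starting point. It is an untested
sketch using only Python's standard `ftplib`, and it makes several
assumptions that are mine, not yours: each Kafka message value is an FTP URL
string, the server allows anonymous login, and the function names and local
file names are hypothetical. The Kafka consumer itself is left out;
`messages` can be any iterable of URL strings, such as decoded message
values from whatever Kafka client you use.

```python
from ftplib import FTP
from urllib.parse import urlparse


def parse_ftp_url(url):
    """Split an ftp:// URL into (host, port, remote_path)."""
    parts = urlparse(url)
    return parts.hostname, parts.port or 21, parts.path


def fetch_ftp_file(url, dest_path, block_size=1024 * 1024):
    """Stream one remote file to local disk, block by block.

    Assumes anonymous login. retrbinary hands each block to out.write,
    so the whole file is never held in memory.
    """
    host, port, remote_path = parse_ftp_url(url)
    with FTP() as ftp:
        ftp.connect(host, port)
        ftp.login()  # anonymous; pass user and passwd here if needed
        with open(dest_path, "wb") as out:
            ftp.retrbinary("RETR " + remote_path, out.write,
                           blocksize=block_size)


def handle_messages(messages, fetch=fetch_ftp_file):
    """Download one file per incoming message.

    `messages` is any iterable of FTP URL strings (assumed to be the
    decoded Kafka message values). Output names are placeholders.
    """
    for n, url in enumerate(messages):
        fetch(url, "download-%d.bin" % n)
```

Passing `fetch` as a parameter is only there so the loop can be exercised
without a real FTP server; in practice you would probably write to HDFS or
another shared location instead of a local file.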

-- 
Marcelo
