SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver

2015-09-11 Thread Varadhan, Jawahar
Hi all, I have coded a custom receiver which receives Kafka messages. These Kafka messages have FTP server credentials in them. The receiver then opens the message and uses the FTP credentials in it to connect to the FTP server. It then streams this huge text file (3.3 GB). Finally this
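A minimal sketch of such a receiver, assuming the Kafka message has already been parsed into host, user, password, and path; the FtpCredentials and FtpFileReceiver names below are hypothetical, not the poster's actual classes. The point of the shape is that the file is handed to Spark line by line, so the 3.3 GB payload never has to sit in memory at once.

import java.io.{BufferedReader, InputStreamReader}

import org.apache.commons.net.ftp.FTPClient
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Assumed message format; the real fields come from the Kafka topic in the post.
case class FtpCredentials(host: String, user: String, password: String, path: String)

class FtpFileReceiver(creds: FtpCredentials)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {

  override def onStart(): Unit = {
    // Receive on a separate thread so onStart() returns immediately.
    new Thread("FTP File Receiver") {
      override def run(): Unit = receiveFile()
    }.start()
  }

  override def onStop(): Unit = ()  // reader thread exits once the stream is drained or isStopped()

  private def receiveFile(): Unit = {
    val ftp = new FTPClient()
    try {
      ftp.connect(creds.host)
      ftp.login(creds.user, creds.password)
      ftp.enterLocalPassiveMode()
      val reader = new BufferedReader(new InputStreamReader(ftp.retrieveFileStream(creds.path)))
      var line = reader.readLine()
      while (line != null && !isStopped()) {
        store(line)               // hand each line to Spark instead of buffering the whole file
        line = reader.readLine()
      }
      reader.close()
      ftp.completePendingCommand()
    } catch {
      case e: Exception => restart("Error streaming file over FTP", e)
    } finally {
      if (ftp.isConnected) ftp.disconnect()
    }
  }
}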

Spark (1.2.0) submit fails with exception saying log directory already exists

2015-08-25 Thread Varadhan, Jawahar
Here is the error: yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Log directory hdfs://Sandbox/user/spark/applicationHistory/application_1438113296105_0302 already exists!) I am using Cloudera 5.3.2 with Spark 1.2.0. Any help is appreciated.
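This message typically means the Spark event-log directory for that application is left over from an earlier attempt. One workaround is to remove the stale directory before re-submitting; a minimal sketch using the Hadoop FileSystem API is below, with the path copied from the error above (adjust it for the failing run). Setting spark.eventLog.overwrite=true may also be an option, depending on the Spark build.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CleanStaleEventLog {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()   // picks up HDFS settings from the cluster classpath
    val fs   = FileSystem.get(conf)
    val staleLogDir = new Path(
      "hdfs://Sandbox/user/spark/applicationHistory/application_1438113296105_0302")
    if (fs.exists(staleLogDir)) {
      fs.delete(staleLogDir, true)   // recursive delete of the leftover log directory
    }
  }
}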

Setting up Spark/flume/? to Ingest 10TB from FTP

2015-08-14 Thread Varadhan, Jawahar
What is the best way to bring such a huge file from an FTP server into Hadoop to persist in HDFS? Since a single JVM process might run out of memory, I was wondering if I can use Spark or Flume to do this. Any help on this matter is appreciated. I prefer an application/process running inside
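One approach that sidesteps the single-JVM memory concern is to stream the file from the FTP server straight into HDFS in fixed-size chunks, so memory use stays constant regardless of file size. A minimal sketch under that assumption follows; the host, credentials, and paths are placeholders, not values from the thread.

import org.apache.commons.net.ftp.{FTP, FTPClient}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils

object FtpToHdfsCopy {
  def main(args: Array[String]): Unit = {
    val ftp = new FTPClient()
    ftp.connect("ftp.example.com")
    ftp.login("user", "password")
    ftp.enterLocalPassiveMode()
    ftp.setFileType(FTP.BINARY_FILE_TYPE)

    val fs  = FileSystem.get(new Configuration())
    val in  = ftp.retrieveFileStream("/exports/huge-file.dat")
    val out = fs.create(new Path("/data/landing/huge-file.dat"))

    // Copy in 4 KB buffers; the last argument closes both streams when done.
    IOUtils.copyBytes(in, out, 4096, true)
    ftp.completePendingCommand()
    ftp.logout()
    ftp.disconnect()
  }
}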