YARN worker out of disk memory

2015-06-26 Thread Tarun Garg
Hi, I am running a Spark job over YARN. After 2-3 hours of execution the workers start dying, and I found a lot of files named temp_shuffle at /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1435184713615_0008/blockmgr-333f0ade-2474-43a6-9960-f08a15bcc7b7/3f. My job is
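The temp_shuffle files under a blockmgr-* directory are shuffle scratch data, so a long-running job can exhaust the node-local disks that hold them. A minimal sketch of where that location is controlled, assuming a Spark 1.x on YARN setup (the class name and the /data/spark-scratch path are just for illustration):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ShuffleDiskSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("shuffle-disk-sketch")
            // Standalone/local mode: put shuffle scratch space on a volume
            // with enough room. On YARN this setting is ignored and the
            // blockmgr-*/temp_shuffle files land under the directories in
            // yarn.nodemanager.local-dirs instead, so those must point at
            // disks large enough for the job's shuffle output.
            .set("spark.local.dir", "/data/spark-scratch");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job body elided ...
        sc.stop();  // the appcache directory is removed when the application ends
    }
}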

RE: Not Serializable exception when integrating SQL and Spark Streaming

2014-12-25 Thread Tarun Garg
serializable just because it gets pulled into scope more often due to the implicit conversions it contains. You should try marking the variable that holds the context with the annotation @transient. On Wed, Dec 24, 2014 at 7:04 PM, Tarun Garg bigdat...@live.com wrote: Thanks I debug
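A minimal sketch of that @transient suggestion for a Java job, assuming the context lives in a field of the job class (class and field names here are hypothetical): in Java the equivalent is the transient keyword, which keeps the field out of serialization when a closure drags the enclosing object along.

import java.io.Serializable;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;

public class NumberCountJob implements Serializable {
    // Not written out when the enclosing object is serialized into a
    // closure, so the non-serializable SQL context no longer travels with
    // it. It has to be re-created wherever it is actually needed.
    private transient JavaSQLContext sqlContext;

    public NumberCountJob(JavaSparkContext sc) {
        this.sqlContext = new JavaSQLContext(sc);
    }
}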

RE: Not Serializable exception when integrating SQL and Spark Streaming

2014-12-24 Thread Tarun Garg
Thanks for the reply. I am testing this with a small amount of data, and what is happening is that whenever there is data in the Kafka topic Spark does not throw the exception; otherwise it does. Thanks, Tarun Date: Wed, 24 Dec 2014 16:23:30 +0800 From: lian.cs@gmail.com To: bigdat...@live.com;

RE: Not Serializable exception when integrating SQL and Spark Streaming

2014-12-24 Thread Tarun Garg
Thanks, I debugged this further and below is the cause: Caused by: java.io.NotSerializableException: org.apache.spark.sql.api.java.JavaSQLContext - field (class com.basic.spark.NumberCount$2, name: val$sqlContext, type: class org.apache.spark.sql.api.java.JavaSQLContext) - object
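The val$sqlContext field in this trace is the anonymous inner class (NumberCount$2) capturing the driver-side JavaSQLContext, so the whole context gets pulled into the serialized closure. One common workaround, sketched under the assumption of a Spark 1.2-era Java streaming job (the singleton class name is made up), is to stop capturing it and instead look the context up lazily inside foreachRDD:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;

public final class JavaSQLContextSingleton {
    private static JavaSQLContext instance = null;

    // Called on the driver from inside foreachRDD for each batch, so the
    // context is never a field of a serialized anonymous Function.
    public static JavaSQLContext getInstance(JavaSparkContext sc) {
        if (instance == null) {
            instance = new JavaSQLContext(sc);
        }
        return instance;
    }
}

Inside foreachRDD the context is then obtained from the batch RDD's own SparkContext, e.g. JavaSQLContextSingleton.getInstance(new JavaSparkContext(rdd.context())), rather than from a variable captured outside the function.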

Spark Streaming is slower than Spark

2014-10-15 Thread Tarun Garg
Hi, I am evaluating Spark Streaming with Kafka and I found that Spark Streaming is slower than Spark. It took more time to process the same amount of data; as per the Spark console it can process 2300 records per second. Is my assumption correct? Spark Streaming has to do a lot of this

RE: Spark Cluster health check

2014-10-14 Thread Tarun Garg
after Nagios. Thanks Best Regards On Tue, Oct 14, 2014 at 3:31 AM, Tarun Garg bigdat...@live.com wrote: Hi All, I am doing a POC and have written a job in Java, so the architecture has Kafka and Spark. Now I want a process to notify me whenever system performance is degrading or in a crunch

RE: Spark Cluster health check

2014-10-14 Thread Tarun Garg
at 10:16 PM, Tarun Garg bigdat...@live.com wrote: Thanks for your response, it is not about infrastructure, because I am using EC2 machines and Amazon CloudWatch can provide the EC2 nodes' CPU usage and memory usage details, but I need to send notifications in situations like processing delay, total delay

Spark Cluster health check

2014-10-13 Thread Tarun Garg
Hi All, I am doing a POC and have written a job in Java, so the architecture has Kafka and Spark. Now I want a process to notify me whenever system performance is degrading or in a crunch of resources, like CPU or RAM. I understand org.apache.spark.streaming.scheduler.StreamingListener, but it has
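For the processing-delay and total-delay notifications discussed in this thread, a rough sketch of the StreamingListener route, assuming a Spark 1.1-era Java job (the threshold and the alert hook are placeholders): register a listener on the streaming context and inspect every completed batch.

import org.apache.spark.streaming.scheduler.BatchInfo;
import org.apache.spark.streaming.scheduler.StreamingListener;
import org.apache.spark.streaming.scheduler.StreamingListenerBatchCompleted;
import org.apache.spark.streaming.scheduler.StreamingListenerBatchStarted;
import org.apache.spark.streaming.scheduler.StreamingListenerBatchSubmitted;
import org.apache.spark.streaming.scheduler.StreamingListenerReceiverError;
import org.apache.spark.streaming.scheduler.StreamingListenerReceiverStarted;
import org.apache.spark.streaming.scheduler.StreamingListenerReceiverStopped;

public class DelayAlertListener implements StreamingListener {
    private static final long MAX_PROCESSING_DELAY_MS = 10000L; // assumed threshold

    @Override
    public void onBatchCompleted(StreamingListenerBatchCompleted batchCompleted) {
        BatchInfo info = batchCompleted.batchInfo();
        // processingDelay (and totalDelay) are Scala Option[Long] values.
        if (info.processingDelay().isDefined()) {
            long delayMs = (Long) info.processingDelay().get();
            if (delayMs > MAX_PROCESSING_DELAY_MS) {
                alert("Batch processing delay " + delayMs + " ms exceeds threshold");
            }
        }
    }

    private void alert(String message) {
        // Placeholder: wire this up to email, Nagios, CloudWatch alarms, etc.
        System.err.println("[ALERT] " + message);
    }

    // A Java implementation of this Scala trait must supply every callback;
    // the rest are no-ops for this sketch.
    @Override public void onBatchSubmitted(StreamingListenerBatchSubmitted batchSubmitted) { }
    @Override public void onBatchStarted(StreamingListenerBatchStarted batchStarted) { }
    @Override public void onReceiverStarted(StreamingListenerReceiverStarted receiverStarted) { }
    @Override public void onReceiverError(StreamingListenerReceiverError receiverError) { }
    @Override public void onReceiverStopped(StreamingListenerReceiverStopped receiverStopped) { }
}

The listener is registered with jssc.addStreamingListener(new DelayAlertListener()) on the JavaStreamingContext; CPU and RAM alerts would still have to come from an external monitor such as CloudWatch or Nagios, since the listener only sees batch and receiver events.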