Hi,
I am running a Spark job over YARN. After 2-3 hours of execution the workers
start dying, and I found a lot of files named temp_shuffle in
/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1435184713615_0008/blockmgr-333f0ade-2474-43a6-9960-f08a15bcc7b7/3f
My job is
serializable just because it
gets pulled into scope more often due to the implicit conversions it contains.
You should try marking the variable that holds the context with the annotation
@transient.
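To illustrate the idea in plain Java (the job in this thread is written in
Java, where the equivalent of Scala's @transient annotation is the `transient`
keyword), here is a minimal sketch. It does not use Spark at all: FakeContext,
BadTask, GoodTask and canSerialize are hypothetical names, with FakeContext
standing in for a non-serializable context object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical stand-in for a non-serializable context object.
    static class FakeContext {}

    // Holds the context in a regular field, so serializing the task fails.
    static class BadTask implements Serializable {
        private final FakeContext context;
        BadTask(FakeContext context) { this.context = context; }
    }

    // Marks the context transient, so Java serialization skips the field.
    static class GoodTask implements Serializable {
        private final transient FakeContext context;
        GoodTask(FakeContext context) { this.context = context; }
    }

    // Returns true if the object can be written with Java serialization.
    static boolean canSerialize(Object task) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(task);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FakeContext context = new FakeContext();
        System.out.println("BadTask serializable:  " + canSerialize(new BadTask(context)));
        System.out.println("GoodTask serializable: " + canSerialize(new GoodTask(context)));
    }
}
```

Note that a transient field is null after deserialization, so this only helps
if the code that runs on the workers never touches the context.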
On Wed, Dec 24, 2014 at 7:04 PM, Tarun Garg bigdat...@live.com wrote:
Thanks
I debugged
Thanks for the reply.
I am testing this with a small amount of data. Whenever there is data in the
Kafka topic, Spark does not throw an exception; otherwise it does.
Thanks,
Tarun
Date: Wed, 24 Dec 2014 16:23:30 +0800
From: lian.cs@gmail.com
To: bigdat...@live.com;
Thanks
I debugged this further, and the cause is below:
Caused by: java.io.NotSerializableException: org.apache.spark.sql.api.java.JavaSQLContext
  - field (class com.basic.spark.NumberCount$2, name: val$sqlContext, type: class org.apache.spark.sql.api.java.JavaSQLContext)
  - object
Hi,
I am evaluating Spark Streaming with Kafka, and I found that Spark Streaming
is slower than Spark. It took more time to process the same amount of data;
according to the Spark console it can process 2300 records per second.
Is my assumption correct? Spark Streaming has to do a lot of this
after
Nagios.
Thanks
Best Regards
On Tue, Oct 14, 2014 at 3:31 AM, Tarun Garg bigdat...@live.com wrote:
Hi All,
I am doing a POC and have written a job in Java. The architecture has Kafka
and Spark. Now I want a process to notify me whenever system performance is
degrading or in a crunch
at 10:16 PM, Tarun Garg bigdat...@live.com wrote:
Thanks for your response. This is not about infrastructure: I am using EC2
machines, and Amazon CloudWatch can provide CPU and memory usage details for
the EC2 nodes. What I need is to send notifications in situations like
processing delay, total delay
Hi All,
I am doing a POC and have written a job in Java. The architecture has Kafka
and Spark. Now I want a process to notify me whenever system performance is
degrading or in a crunch of resources, like CPU or RAM. I understand
org.apache.spark.streaming.scheduler.StreamingListener, but it has