NoClassDefFound exception after setting spark.eventLog.enabled=true

2016-09-02 Thread C. Josephson
I use Spark 1.6.2 with Java, and after I set spark.eventLog.enabled=true Spark crashes with this exception: Exception in thread "main" java.lang.NoClassDefFoundError: org/json4s/jackson/JsonMethods$ at org.apache.spark.scheduler.EventLoggingListener$.initEventLog(EventLoggingListener.scala:257)
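A NoClassDefFoundError of this kind usually means the named class is not on the runtime classpath. As a rough diagnostic sketch (not taken from the thread), the following scans a directory of jars for the class named in the stack trace. The helper name find_class_in_jars and the directory passed to it are hypothetical, chosen only for illustration.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(jar_dir, class_name):
    """Return the paths of jars under jar_dir that contain class_name.

    class_name uses the JVM internal form shown in the stack trace,
    e.g. "org/json4s/jackson/JsonMethods$".
    """
    entry = class_name + ".class"
    matches = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives.
            continue
    return matches
```

For example, find_class_in_jars("lib", "org/json4s/jackson/JsonMethods$"), where "lib" stands in for whatever directory holds the application's jars. An empty result would suggest that no jar in that directory provides the class.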

Best way to share state in a streaming cluster

2016-08-30 Thread C. Josephson
We have a timestamped input stream and we need to share the latest processed timestamp across the Spark master and slaves. This will be monotonically increasing over time. What is the easiest way to share state across Spark machines? An accumulator is very close to what we need, but since only the
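Because the shared value only ever increases, partial results can be merged with max, which gives the same answer in any order. The sketch below shows that merge in plain Python with no Spark dependency. The names merge_latest and latest_processed are hypothetical, and it does not cover how the result would be distributed back to the workers.

```python
from functools import reduce


def merge_latest(a, b):
    """Combine two 'latest timestamp' values; max is associative and commutative."""
    return max(a, b)


def latest_processed(partition_timestamps, previous=0.0):
    """Fold each partition's local maximum into the previously known latest timestamp."""
    # Empty partitions contribute nothing to the result.
    local_maxima = [max(ts) for ts in partition_timestamps if ts]
    return reduce(merge_latest, local_maxima, previous)
```

Starting the fold from the previous value keeps the result monotonic: a batch that contains only older timestamps leaves the stored value unchanged.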

Re: Understanding Spark UI DAGs

2016-07-21 Thread C. Josephson
> [1] > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala > > Pozdrawiam, > Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https

Re: Understanding Spark UI DAGs

2016-07-21 Thread C. Josephson
> > It's called a CallSite that shows where the line comes from. You can see > the code yourself given the python file and the line number. > But that's what I don't understand. Which Python file? We spark-submit one file called ctr_parsing.py, but it only has 150 lines. So what is MapPartitions

Re: Batch details are missing

2016-07-11 Thread C. Josephson
The solution ended up being to upgrade from Spark 1.5 to Spark 1.6.1+. On Fri, Jun 24, 2016 at 2:57 PM, C. Josephson <cjos...@uhana.io> wrote: > We're trying to resolve some performance issues with spark streaming using > the application UI, but the batch details page doesn't seem t

Batch details are missing

2016-06-24 Thread C. Josephson
We're trying to resolve some performance issues with Spark Streaming using the application UI, but the batch details page doesn't seem to be working. When I click on a batch in the streaming application UI, I expect to see something like this: http://i.stack.imgur.com/ApF8z.png But instead we see

Recovery techniques for Spark Streaming scheduling delay

2016-06-22 Thread C. Josephson
We have a Spark Streaming application that has basically zero scheduling delay for hours, but then suddenly it jumps up to multiple minutes and spirals out of control (see screenshot of job manager here: http://i.stack.imgur.com/kSftN.png). This happens after a while even if we double the batch