We have a Spark Streaming application that has essentially zero scheduling
delay for hours, but then it suddenly jumps up to multiple minutes and
spirals out of control (see screenshot of the job manager here:
http://i.stack.imgur.com/kSftN.png).
This happens after a while even if we double the batch interval.
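One knob that sometimes helps with this kind of runaway delay is Spark Streaming's backpressure, which throttles the ingestion rate when batches start falling behind. This is only a sketch of the relevant configuration (the max-rate value is a placeholder, not a recommendation for your workload):

```shell
spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.receiver.maxRate=10000 \
  ctr_parsing.py
```

With backpressure enabled, Spark adjusts the receive rate dynamically based on recent batch processing times, so a temporary slowdown is less likely to snowball into ever-growing scheduling delay.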
We're trying to resolve some performance issues with spark streaming using
the application UI, but the batch details page doesn't seem to be working.
When I click on a batch in the streaming application UI, I expect to see
something like this: http://i.stack.imgur.com/ApF8z.png
But instead we see this:
> [1]
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
>
> Best regards,
> Jacek Laskowski
>
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https
>
> It's called a CallSite that shows where the line comes from. You can see
> the code yourself given the python file and the line number.
>
But that's what I don't understand. Which Python file? We spark-submit one
file called ctr_parsing.py, but it only has 150 lines. So what is
MapPartitions?
The solution ended up being to upgrade from Spark 1.5 to Spark 1.6.1+.
On Fri, Jun 24, 2016 at 2:57 PM, C. Josephson <cjos...@uhana.io> wrote:
> We're trying to resolve some performance issues with spark streaming using
> the application UI, but the batch details page doesn't seem to be working.
I use Spark 1.6.2 with Java, and after I set spark.eventLog.enabled=true,
Spark crashes with this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/json4s/jackson/JsonMethods$
    at org.apache.spark.scheduler.EventLoggingListener$.initEventLog(EventLoggingListener.scala:257)
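A NoClassDefFoundError for org.json4s.jackson.JsonMethods$ usually points to a json4s version conflict on the classpath rather than a Spark bug: Spark 1.6 bundles its own json4s, and a different json4s version pulled in transitively by the application jar can shadow it. A sketch of the usual fix, assuming a Maven build (the com.example coordinates are purely illustrative; check your own dependency tree with `mvn dependency:tree` to find the real offender):

```xml
<!-- Hypothetical example: exclude a conflicting json4s that some
     dependency drags in, so Spark's own json4s is the one on the
     classpath at runtime. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>some-library</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.json4s</groupId>
      <artifactId>json4s-jackson_2.10</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```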
We have a timestamped input stream, and we need to share the latest
processed timestamp across the Spark master and workers. This value will be
monotonically increasing over time. What is the easiest way to share state
across Spark machines?
An accumulator is very close to what we need, but since only the driver can
read an accumulator's value, the workers cannot see it.
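For the monotonic-maximum part, PySpark lets you define a custom AccumulatorParam whose merge operation is max rather than sum. Below is a minimal sketch of that merge contract in plain Python (runnable without Spark); the class name MaxTimestampParam is my own, but the zero/addInPlace method names mirror pyspark.accumulators.AccumulatorParam:

```python
# Sketch of PySpark's AccumulatorParam contract (zero + addInPlace),
# specialised to "keep the largest timestamp seen so far".
class MaxTimestampParam:
    def zero(self, initial):
        # Identity element for max over non-negative timestamps.
        return initial

    def addInPlace(self, v1, v2):
        # Merging two partial results keeps the larger timestamp,
        # so the accumulated value is monotonically non-decreasing.
        return max(v1, v2)

param = MaxTimestampParam()
acc = param.zero(0)
for ts in [1625000000, 1624000000, 1626000000]:  # out-of-order arrivals
    acc = param.addInPlace(acc, ts)
print(acc)  # -> 1626000000, the latest timestamp
```

In a real job you would pass an instance to sc.accumulator(0, MaxTimestampParam()), but note the limitation discussed above: executors can only add to an accumulator, and only the driver can read it, so to make the latest timestamp visible on the workers you would have to push it back out (e.g. re-broadcast it each batch).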