NoClassDefFoundError after setting spark.eventLog.enabled=true
I use Spark 1.6.2 with Java, and after I set spark.eventLog.enabled=true Spark crashes with this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/json4s/jackson/JsonMethods$
        at org.apache.spark.scheduler.EventLoggingListener$.initEventLog(EventLoggingListener.scala:257)
        at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:124)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:519)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at com.uhana.stream.UhanaStreamingContext.<init>(UhanaStreamingContext.java:165)
        at com.uhana.stream.UhanaStreamingContext.<init>(UhanaStreamingContext.java:23)
        at com.uhana.stream.UhanaStreamingContext$Builder.build(UhanaStreamingContext.java:159)
        at com.uhana.stream.app.StreamingMain.main(StreamingMain.java:41)
Caused by: java.lang.ClassNotFoundException: org.json4s.jackson.JsonMethods$
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 8 more

I downloaded some jars for json4s, but I'm not sure where they should go.

Thanks,
-cjosephson
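The missing class lives in json4s-jackson, which normally ships inside the Spark assembly, so the downloaded jars need to end up on the driver's classpath. One way to do that is spark-submit's --jars flag; a minimal sketch, where the jar paths and the json4s version (chosen to match a Scala 2.10 Spark 1.6.x build) are assumptions to adjust:

```shell
# Hypothetical paths and versions -- match the json4s version to your Spark build.
spark-submit \
  --class com.uhana.stream.app.StreamingMain \
  --jars /path/to/json4s-jackson_2.10-3.2.10.jar,/path/to/json4s-core_2.10-3.2.10.jar \
  --conf spark.eventLog.enabled=true \
  your-app.jar
```

Jars passed via --jars are placed on both the driver and executor classpaths, which is what this stack trace (a failure inside the driver's SparkContext constructor) needs.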
Best way to share state in a streaming cluster
We have a timestamped input stream, and we need to share the latest processed timestamp across the Spark master and slaves. It will be monotonically increasing over time. What is the easiest way to share state across Spark machines? An accumulator is very close to what we need, but since only the driver program can read an accumulator's value, it won't work. Any suggestions?

Thanks,
-C
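Whatever shared storage ends up holding the timestamp (an external key-value store reachable from every node is one common pattern, given that accumulators are effectively write-only from the workers' side), the operation each machine performs is just a monotonic max. A minimal sketch of that update logic in plain Python, where the `store` dict is a hypothetical stand-in for the real shared store:

```python
def advance_timestamp(store, key, new_ts):
    """Advance the shared timestamp, but never move it backwards.

    `store` is a stand-in for whatever shared storage the cluster uses
    (e.g. an external key-value store); here it is just a dict.
    Returns the timestamp actually recorded after this update.
    """
    current = store.get(key)
    if current is None or new_ts > current:
        store[key] = new_ts
    return store[key]

store = {}
advance_timestamp(store, "latest", 100)
advance_timestamp(store, "latest", 90)   # late/out-of-order update is ignored
advance_timestamp(store, "latest", 120)
```

With a real store, the get-compare-set step should be done atomically (a compare-and-set loop or an atomic max operation) so concurrent slaves can't move the value backwards.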
Re: Understanding Spark UI DAGs
Ok, so those line numbers in our DAG don't refer to our code. Is there any way to display (or calculate) line numbers that refer to code we actually wrote, or is that only possible in Scala Spark?

On Thu, Jul 21, 2016 at 12:24 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> My little understanding of Python-Spark bridge is that at some point
> the python code communicates over the wire with Spark's backbone that
> includes PythonRDD [1].
>
> When the CallSite can't be computed, it's null:-1 to denote "nothing
> could be referred to".
>
> [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
>
> Pozdrawiam,
> Jacek Laskowski
>
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> On Thu, Jul 21, 2016 at 8:36 PM, C. Josephson <cjos...@uhana.io> wrote:
> >> It's called a CallSite that shows where the line comes from. You can see
> >> the code yourself given the python file and the line number.
> >
> > But that's what I don't understand. Which python file? We spark submit one
> > file called ctr_parsing.py, but it only has 150 lines. So what is
> > MapPartitions at PythonRDD.scala:374 referring to? ctr_parsing.py imports a
> > number of support functions we wrote, but how do we know which python file
> > to look at?
> >
> > Furthermore, what on earth is null:-1 referring to?

--
Colleen Josephson
Engineering Researcher
Uhana, Inc.
Re: Understanding Spark UI DAGs
> It's called a CallSite that shows where the line comes from. You can see
> the code yourself given the python file and the line number.

But that's what I don't understand. Which python file? We spark-submit one file called ctr_parsing.py, but it only has 150 lines. So what is MapPartitions at PythonRDD.scala:374 referring to? ctr_parsing.py imports a number of support functions we wrote, but how do we know which python file to look at?

Furthermore, what on earth is null:-1 referring to?
Re: Batch details are missing
The solution ended up being upgrading from Spark 1.5 to Spark 1.6.1+.

On Fri, Jun 24, 2016 at 2:57 PM, C. Josephson <cjos...@uhana.io> wrote:
> We're trying to resolve some performance issues with Spark Streaming using
> the application UI, but the batch details page doesn't seem to be working.
> When I click on a batch in the streaming application UI, I expect to see
> something like this: http://i.stack.imgur.com/ApF8z.png
>
> But instead we see this:
> [image: Inline image 1]
>
> Any ideas why we aren't getting any job details? We are running pySpark
> 1.5.0.
>
> Thanks,
> -cjoseph

--
Colleen Josephson
Engineering Researcher
Uhana, Inc.
Batch details are missing
We're trying to resolve some performance issues with Spark Streaming using the application UI, but the batch details page doesn't seem to be working. When I click on a batch in the streaming application UI, I expect to see something like this: http://i.stack.imgur.com/ApF8z.png

But instead we see this:
[image: Inline image 1]

Any ideas why we aren't getting any job details? We are running pySpark 1.5.0.

Thanks,
-cjoseph
Recovery techniques for Spark Streaming scheduling delay
We have a Spark Streaming application that has essentially zero scheduling delay for hours, but then it suddenly jumps to multiple minutes and spirals out of control (see this screenshot of the job manager: http://i.stack.imgur.com/kSftN.png).

This happens after a while even if we double the batch interval. We are not sure what causes the delay (theories include garbage collection). The cluster has generally low CPU utilization regardless of whether we use 3, 5, or 10 slaves. We are reluctant to increase the batch interval further, since the delay is zero for such long periods.

Are there any techniques to improve recovery time from a sudden spike in scheduling delay? We've tried waiting to see if it recovers on its own, but that takes hours, if it recovers at all.

Thanks,
-cjoseph
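One setting that may be worth trying before touching the batch interval again (whether it applies here is an assumption) is streaming backpressure, available from Spark 1.5 on, which throttles receiver ingestion when batches start to fall behind so the delay cannot spiral. A sketch, where the receiver rate cap is a made-up number to tune for the workload:

```shell
# Hypothetical flags; maxRate (records/sec per receiver) is an assumed value to tune.
spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.receiver.maxRate=10000 \
  your-streaming-app.py
```

Backpressure adapts the ingestion rate dynamically based on batch processing times, while the explicit maxRate acts as a hard ceiling during the initial batches before the feedback loop kicks in.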