Hi Baoxu, thanks for sharing.
2014-06-26 22:51 GMT+08:00 Baoxu Shi(Dash) b...@nd.edu:
Hi Pei-Lun,
I have the same problem. The issue is SPARK-2228; someone also posted a pull
request for it, but it only eliminates the exception without fixing the side
effects.
I think the problem may be due to the hard-coded
private val EVENT_QUEUE_CAPACITY = 1
in core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala.
When the event queue is full, the system starts dropping events, which can
cause "key not found" errors because the dropped events are never delivered
to the listeners.
Don’t know if that helps.
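To illustrate the drop-on-full behavior described above, here is a minimal standalone sketch using a bounded queue (this is not Spark's actual implementation; the stage IDs and event strings are made up for the example). A listener that only sees events that survived the queue will later fail a map lookup for the stage whose "submitted" event was dropped:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;

public class DroppedEventDemo {
    public static void main(String[] args) {
        // Bounded queue standing in for the listener bus's event queue.
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1);

        // First event fits; the second is silently dropped (offer returns false).
        queue.offer("stageSubmitted:6375");
        boolean accepted = queue.offer("stageSubmitted:6376");
        System.out.println("second event accepted: " + accepted); // false

        // The listener only records stages whose events made it through.
        Map<Integer, String> activeStages = new HashMap<>();
        String event;
        while ((event = queue.poll()) != null) {
            activeStages.put(Integer.parseInt(event.split(":")[1]), event);
        }

        // A later "stageCompleted" for 6376 would now fail: key not found.
        System.out.println("stage 6376 known: " + activeStages.containsKey(6376)); // false
    }
}
```

The real JobProgressListener does an unconditional HashMap apply on the stage ID, which is why a dropped submission event surfaces later as a NoSuchElementException rather than at the moment of the drop.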
On Jun 26, 2014, at 6:41 AM, Pei-Lun Lee pl...@appier.com wrote:
Hi,
We have a long-running Spark application on a Spark 1.0 standalone
server, and after it runs for several hours the following exception shows up:
14/06/25 23:13:08 ERROR LiveListenerBus: Listener JobProgressListener
threw an exception
java.util.NoSuchElementException: key not found: 6375
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
at org.apache.spark.ui.jobs.JobProgressListener.onStageCompleted(JobProgressListener.scala:78)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$2.apply(SparkListenerBus.scala:48)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$2.apply(SparkListenerBus.scala:48)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:79)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:79)
at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:48)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
And then the web UI (driver:4040) starts showing weird results like:
(see attached screenshots)
1. negative active task counts
2. completed stages still listed in the active section, or shown with incomplete tasks
3. unpersisted RDDs still appearing on the storage page with "Fraction Cached" at 100%
Eventually the application crashes, but this is usually the first
exception that shows up.
Any idea how to fix it?
--
Pei-Lun Lee