Hi Pei-Lun,

I'm hitting the same problem. The issue is SPARK-2228; someone also posted a pull 
request for it, but it only suppresses the exception without addressing the side 
effects.

I think the problem may be due to the hard-coded

    private val EVENT_QUEUE_CAPACITY = 10000

in core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala. When the 
event queue is full, the system starts dropping events, and the "key not found" 
error occurs because the listeners never see the dropped events.
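
A minimal sketch of the failure mode (not Spark's actual code; names and the tiny 
capacity are just for illustration, standing in for EVENT_QUEUE_CAPACITY = 10000):

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

object DroppedEventSketch extends App {
  // Illustrative bounded event queue; Spark's real capacity is 10000.
  val EVENT_QUEUE_CAPACITY = 4
  val eventQueue = new LinkedBlockingQueue[Int](EVENT_QUEUE_CAPACITY)

  // Posting stage-start events: offer() returns false once the queue is
  // full, which is the point where events get silently dropped.
  val dropped = (1 to 6).filterNot(stageId => eventQueue.offer(stageId))

  // The listener only learns about stages whose start events arrived.
  val stageIdToData = mutable.HashMap.empty[Int, String]
  while (!eventQueue.isEmpty) stageIdToData(eventQueue.poll()) = "active"

  // A later completion event for a dropped stage id then hits a missing
  // key, e.g. stageIdToData(5) throws
  // java.util.NoSuchElementException: key not found: 5
  println(s"dropped stage ids: $dropped")
}
```

Once a start event is lost this way, every listener keyed on that stage id 
(JobProgressListener included) is permanently out of sync, which would also 
explain the weird UI counts below.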

Not sure if that helps.

On Jun 26, 2014, at 6:41 AM, Pei-Lun Lee <pl...@appier.com> wrote:

> 
> Hi,
> 
> We have a long-running Spark application on a Spark 1.0 standalone server, and 
> after it runs for several hours the following exception shows up:
> 
> 
> 14/06/25 23:13:08 ERROR LiveListenerBus: Listener JobProgressListener threw 
> an exception
> java.util.NoSuchElementException: key not found: 6375
>         at scala.collection.MapLike$class.default(MapLike.scala:228)
>         at scala.collection.AbstractMap.default(Map.scala:58)
>         at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
>         at 
> org.apache.spark.ui.jobs.JobProgressListener.onStageCompleted(JobProgressListener.scala:78)
>         at 
> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$2.apply(SparkListenerBus.scala:48)
>         at 
> org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$2.apply(SparkListenerBus.scala:48)
>         at 
> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
>         at 
> org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:79)
>         at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at 
> org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:79)
>         at 
> org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:48)
>         at 
> org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
>         at 
> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
>         at 
> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
>         at scala.Option.foreach(Option.scala:236)
>         at 
> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
>         at 
> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
>         at 
> org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
>         at 
> org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
> 
> 
> And then the web UI (driver:4040) starts showing weird results like: (see 
> attached screenshots)
> 1. negative active task counts
> 2. completed stages still listed in the active section or shown with incomplete tasks
> 3. unpersisted RDDs still on the storage page with fraction cached < 100%
> 
> Eventually the application crashes, and this is usually the first exception 
> that shows up.
> Any idea how to fix it?
> 
> --
> Pei-Lun Lee
> 
> 
> <Screen Shot 2014-06-26 at 12.52.38 PM.png><Screen Shot 2014-06-26 at 
> 12.52.21 PM.png><Screen Shot 2014-06-26 at 12.52.07 PM.png><Screen Shot 
> 2014-06-26 at 12.51.15 PM.png>
