Hi Baoxu, thanks for sharing.

2014-06-26 22:51 GMT+08:00 Baoxu Shi(Dash) <b...@nd.edu>:

> Hi Pei-Lun,
>
> I have the same problem here. The issue is SPARK-2228; someone also posted
> a pull request for it, but it only eliminates this exception and does not
> fix the side effects.
>
> I think the problem may be due to the hard-coded
>
>   private val EVENT_QUEUE_CAPACITY = 10000
>
> in core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala.
> There is a chance that when the event queue is full, the system starts
> dropping events, causing "key not found" because those events are never
> delivered to the listeners.
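
As a side note, here is a minimal sketch of that failure mode, assuming a
fixed-capacity queue between the scheduler and the listeners (the names are
illustrative, not the actual LiveListenerBus code):

    import java.util.concurrent.LinkedBlockingQueue

    // Illustrative sketch only: a bounded buffer feeding the listeners.
    // offer() returns false when the queue is full, and the event is
    // silently dropped instead of being delivered.
    object BoundedBusSketch {
      private val EVENT_QUEUE_CAPACITY = 10000
      private val eventQueue =
        new LinkedBlockingQueue[String](EVENT_QUEUE_CAPACITY)

      def post(event: String): Unit = {
        if (!eventQueue.offer(event)) {
          // If the dropped event is a stage-submitted event, listeners never
          // create their map entry, and the later stage-completed lookup
          // fails with "key not found".
          System.err.println(s"Dropping $event: listener queue is full")
        }
      }
    }
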
>
> I don’t know if that helps.
>
> On Jun 26, 2014, at 6:41 AM, Pei-Lun Lee <pl...@appier.com> wrote:
>
> >
> > Hi,
> >
> > We have a long-running Spark application on a Spark 1.0 standalone
> > server, and after it has run for several hours the following exception
> > shows up:
> >
> >
> > 14/06/25 23:13:08 ERROR LiveListenerBus: Listener JobProgressListener threw an exception
> > java.util.NoSuchElementException: key not found: 6375
> >         at scala.collection.MapLike$class.default(MapLike.scala:228)
> >         at scala.collection.AbstractMap.default(Map.scala:58)
> >         at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
> >         at org.apache.spark.ui.jobs.JobProgressListener.onStageCompleted(JobProgressListener.scala:78)
> >         at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$2.apply(SparkListenerBus.scala:48)
> >         at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$2.apply(SparkListenerBus.scala:48)
> >         at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
> >         at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:79)
> >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >         at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:79)
> >         at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:48)
> >         at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
> >         at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
> >         at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
> >         at scala.Option.foreach(Option.scala:236)
> >         at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
> >         at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
> >         at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
> >         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
> >         at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
> >
> >
> > And then the web UI (driver:4040) starts showing weird results (see
> > attached screenshots), such as:
> > 1. a negative active task count
> > 2. completed stages still in the active section, or showing incomplete
> >    tasks
> > 3. unpersisted RDDs still on the storage page, with fraction cached <
> >    100%
> >
> > Eventually the application crashes, but this is usually the first
> > exception that shows up.
> > Any idea how to fix it?
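
Until the dropping itself is fixed, one mitigation on the listener side is to
guard the map lookup instead of indexing the map directly. A rough sketch,
assuming a custom SparkListener (the bookkeeping map and handler body are
illustrative, not the real JobProgressListener code):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}
    import scala.collection.mutable

    // Illustrative listener that tolerates a completed stage whose
    // corresponding submit event was dropped from the bus.
    class TolerantProgressListener extends SparkListener {
      private val stageIdToTaskCount = mutable.HashMap[Int, Int]()

      override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
        val stageId = stageCompleted.stageInfo.stageId
        stageIdToTaskCount.get(stageId) match {
          case Some(_) =>
            stageIdToTaskCount -= stageId  // normal bookkeeping would go here
          case None =>
            // The matching stage-submitted event was dropped, so there is no
            // entry to update; warn instead of throwing NoSuchElementException.
            System.err.println(s"Stage $stageId completed without a submit event")
        }
      }
    }

This does not fix the stale numbers in the UI, of course; it only keeps the
listener from throwing.
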
> >
> > --
> > Pei-Lun Lee
> >
> >
> > [four screenshots attached]
>
>
