Re: LiveListenerBus throws exception and weird web UI bug

2014-07-21 Thread mrm
I have the same error! Did you manage to fix it?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/LiveListenerBus-throws-exception-and-weird-web-UI-bug-tp8330p10324.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: LiveListenerBus throws exception and weird web UI bug

2014-07-21 Thread Andrew Or
Hi all,

This error happens because we receive a completed event for a stage that
we don't know about, i.e. a stage we never received a submitted event for.
The root cause, as Baoxu explained, is usually that the event queue is
full and the listener bus begins to drop events. In this case we are
dropping the submitted event. This particular exception should be fixed in
the latest master, as we now check whether the key exists before indexing
directly into the map. Unfortunately, the fix is not in Spark 1.0.1, but it
will be in Spark 1.1. There is currently no bullet-proof workaround for this
issue, but you might try reducing the number of concurrently running tasks
(partitions) to avoid emitting too many events. The underlying problem,
that the listener takes too long to process events from the queue, is
tracked in SPARK-2316, which we also intend to fix by Spark 1.1.

Andrew
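
For readers following along, here is a minimal sketch of the kind of guard
described above: look the key up safely instead of indexing straight into the
map. It is not the actual Spark patch. The object, map, and method names are
hypothetical stand-ins for the listener's per-stage bookkeeping.

```scala
import scala.collection.mutable

object StageLookupSketch {
  // Hypothetical bookkeeping: stage ID -> stage name (not Spark's real field).
  val activeStages = mutable.HashMap[Int, String]()

  // Unguarded: HashMap.apply throws NoSuchElementException("key not found: ...")
  // when no submitted event was ever recorded for this stage.
  def completeUnguarded(stageId: Int): String = {
    val name = activeStages(stageId)
    activeStages.remove(stageId)
    name
  }

  // Guarded: remove returns an Option, so an unknown stage yields None
  // and the caller can log and skip it instead of throwing.
  def completeGuarded(stageId: Int): Option[String] =
    activeStages.remove(stageId)
}
```

Note that a guard like this only suppresses the exception. Events that were
dropped are still missing, so the UI counters can remain wrong.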


2014-07-21 10:23 GMT-07:00 mrm ma...@skimlinks.com:

 I have the same error! Did you manage to fix it?






Re: LiveListenerBus throws exception and weird web UI bug

2014-07-21 Thread 余根茂(木艮)
Hi all, 

 Here is my fix: https://github.com/apache/spark/pull/1356. It is not
elegant, but it works well. Any suggestions?

 


 



Re: LiveListenerBus throws exception and weird web UI bug

2014-06-26 Thread Pei-Lun Lee
Hi Baoxu, thanks for sharing.


2014-06-26 22:51 GMT+08:00 Baoxu Shi(Dash) b...@nd.edu:

 Hi Pei-Lun,

 I have the same problem. The issue is SPARK-2228. Someone has also posted
 a pull request for it, but it only eliminates this exception, not the side
 effects.

 I think the problem may be due to the hard-coded   private val
 EVENT_QUEUE_CAPACITY = 1

 in core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala.
 It is possible that when the event queue is full, the system starts
 dropping events, which causes the "key not found" error because the
 dropped events never reached the listener.

 I don't know if that helps.
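
 The failure mode described above can be reproduced in isolation. The sketch
 below is not Spark code: the event classes, the queue capacity of 1, and the
 method names are all hypothetical, chosen only to make the overflow easy to
 trigger. It shows a submitted event being dropped by a full bounded queue,
 after which the matching completed event arrives for a stage the listener
 has never seen.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

object DroppedEventSketch {
  sealed trait Event
  final case class StageSubmitted(stageId: Int) extends Event
  final case class StageCompleted(stageId: Int) extends Event

  // Hypothetical tiny capacity so that the second post overflows.
  val queue = new LinkedBlockingQueue[Event](1)

  // Stages the listener has seen a submitted event for.
  val knownStages = mutable.HashSet[Int]()

  // Non-blocking post: offer returns false when the queue is full,
  // and the event is silently lost.
  def post(event: Event): Boolean = queue.offer(event)

  // Processes one queued event. Returns false when a completed event
  // arrives for a stage that was never recorded as submitted.
  def processOne(): Boolean = Option(queue.poll()) match {
    case Some(StageSubmitted(id)) => knownStages += id; true
    case Some(StageCompleted(id)) => knownStages.remove(id)
    case None                     => true
  }
}
```

 Posting StageSubmitted(6374) and then StageSubmitted(6375) drops the second
 event. Once the first is processed and StageCompleted(6375) is posted,
 processOne() returns false, which is the point where an unguarded map
 lookup would throw.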

 On Jun 26, 2014, at 6:41 AM, Pei-Lun Lee pl...@appier.com wrote:

 
  Hi,
 
  We have a long-running Spark application that runs on a Spark 1.0
 standalone server. After it has run for several hours, the following
 exception shows up:
 
 
  14/06/25 23:13:08 ERROR LiveListenerBus: Listener JobProgressListener threw an exception
  java.util.NoSuchElementException: key not found: 6375
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
    at org.apache.spark.ui.jobs.JobProgressListener.onStageCompleted(JobProgressListener.scala:78)
    at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$2.apply(SparkListenerBus.scala:48)
    at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$2.apply(SparkListenerBus.scala:48)
    at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
    at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:79)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:79)
    at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:48)
    at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
 
 
  After that, the web UI (driver:4040) starts showing weird results
 (see attached screenshots):
  1. a negative number of active tasks
  2. completed stages still in the active section, or showing tasks as
 incomplete
  3. an unpersisted RDD still on the storage page, with fraction cached at
 100%
 
  Eventually the application crashes, but this is usually the first
 exception that shows up.
  Any idea how to fix it?
 
  --
  Pei-Lun Lee
 
 
  Attachments:
  Screen Shot 2014-06-26 at 12.52.38 PM.png
  Screen Shot 2014-06-26 at 12.52.21 PM.png
  Screen Shot 2014-06-26 at 12.52.07 PM.png
  Screen Shot 2014-06-26 at 12.51.15 PM.png