[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046940#comment-14046940 ]

Patrick Wendell commented on SPARK-2228:
----------------------------------------

So I dug into this more and profiled it to confirm. The issue is that we do a 
bunch of inefficient operations in the storage listener. For instance, I noticed 
we spend almost all the time doing a big Scala groupBy on the entire list of 
persisted blocks:

{code}
        at java.lang.Integer.valueOf(Integer.java:642)
        at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70)
        at org.apache.spark.storage.StorageUtils$$anonfun$9.apply(StorageUtils.scala:82)
        at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:328)
        at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:327)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at scala.collection.TraversableLike$class.groupBy(TraversableLike.scala:327)
        at scala.collection.AbstractTraversable.groupBy(Traversable.scala:105)
        at org.apache.spark.storage.StorageUtils$.rddInfoFromStorageStatus(StorageUtils.scala:82)
        at org.apache.spark.ui.storage.StorageListener.updateRDDInfo(StorageTab.scala:56)
        at org.apache.spark.ui.storage.StorageListener.onTaskEnd(StorageTab.scala:67)
        - locked <0x00000000a27ebe30> (a org.apache.spark.ui.storage.StorageListener)
{code}

Resizing this buffer won't address the underlying issue at all; it will just 
defer the failure until later.
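For illustration, here is a hypothetical sketch of the pattern the profile shows (`BlockStatus` and the method names here are invented for this example; this is not the actual Spark code): regrouping every persisted block on each event costs O(n) per event, while an incrementally maintained per-RDD total costs O(1):

```scala
import scala.collection.mutable

// Hypothetical types for illustration only.
case class BlockStatus(rddId: Int, memSize: Long)

object BlockAggregation {
  // What the stack trace shows: on every onTaskEnd, regroup *all* persisted
  // blocks. Each call walks (and boxes keys for) the whole collection.
  def memPerRddFull(blocks: Seq[BlockStatus]): Map[Int, Long] =
    blocks.groupBy(_.rddId).map { case (id, bs) => (id, bs.map(_.memSize).sum) }

  // Incremental alternative: keep a running total per RDD and update only
  // the entry for the block that changed.
  val memPerRdd = mutable.HashMap.empty[Int, Long].withDefaultValue(0L)
  def onBlockAdded(b: BlockStatus): Unit =
    memPerRdd(b.rddId) += b.memSize
}
```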

> onStageSubmitted is not properly called so NoSuchElement will be thrown in 
> onStageCompleted
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2228
>                 URL: https://issues.apache.org/jira/browse/SPARK-2228
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Baoxu Shi
>
> We are using `saveAsObjectFile` and `objectFile` to cut off the lineage during 
> iterative computation, but after several hundred iterations a 
> `NoSuchElementException` is thrown. We checked the code and traced the problem 
> to `org.apache.spark.ui.jobs.JobProgressListener`: when `onStageCompleted` is 
> called, the `stageId` cannot be found in `stageIdToPool`, although it does 
> exist in other HashMaps. So we think `onStageSubmitted` is not called 
> properly: Spark adds the stage but fails to send the message to listeners, 
> and the error occurs when the `finish` message is sent to them. 
> This problem causes a huge number of `active stages` to show up in the 
> `SparkUI`, which is really annoying, but according to my test code it may not 
> affect the final result.
> I'm willing to help solve this problem; any idea which part I should change? 
> I assume `org.apache.spark.scheduler.SparkListenerBus` has something to do 
> with it, but it looks fine to me.
> FYI, here is test code that reproduces the problem. I do not know how to post 
> highlighted code here, so I put it on gist to keep the issue clean:
> https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd
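The failure mode the report describes can be sketched as follows. This is a hypothetical minimal listener; the names mirror `JobProgressListener` and `stageIdToPool`, but it is not the actual Spark code. If the submitted event is dropped, the map lookup in the completion handler throws:

```scala
import scala.collection.mutable

// Hypothetical minimal listener for illustration only.
class PoolListener {
  val stageIdToPool = mutable.HashMap.empty[Int, String]

  def onStageSubmitted(stageId: Int, pool: String): Unit =
    stageIdToPool(stageId) = pool

  // Assumes onStageSubmitted already ran for this stageId; if that event
  // was never delivered, apply() on a mutable.HashMap throws
  // NoSuchElementException.
  def onStageCompleted(stageId: Int): String =
    stageIdToPool(stageId)
}
```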



--
This message was sent by Atlassian JIRA
(v6.2#6252)
