[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046940#comment-14046940 ]
Patrick Wendell commented on SPARK-2228:
----------------------------------------

So I dug into this more and profiled it to confirm. The issue is that we do a number of inefficient operations in the storage listener. For instance, I noticed that we spend almost all the time doing a big Scala groupBy on the entire list of persisted blocks:

{code}
at java.lang.Integer.valueOf(Integer.java:642)
at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70)
at org.apache.spark.storage.StorageUtils$$anonfun$9.apply(StorageUtils.scala:82)
at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:328)
at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:327)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at scala.collection.TraversableLike$class.groupBy(TraversableLike.scala:327)
at scala.collection.AbstractTraversable.groupBy(Traversable.scala:105)
at org.apache.spark.storage.StorageUtils$.rddInfoFromStorageStatus(StorageUtils.scala:82)
at org.apache.spark.ui.storage.StorageListener.updateRDDInfo(StorageTab.scala:56)
at org.apache.spark.ui.storage.StorageListener.onTaskEnd(StorageTab.scala:67)
- locked <0x00000000a27ebe30> (a org.apache.spark.ui.storage.StorageListener)
{code}

Resizing this buffer won't help the underlying issue at all; it will just defer the time until failure.
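To make the cost concrete, here is a minimal sketch (the `Block` case class and both helper functions are hypothetical stand-ins for Spark's internals, not the actual code): regrouping all N persisted blocks with `groupBy` on every task-end event costs O(N) per event and boxes every rddId, whereas folding each new block into a running per-RDD total costs O(1) per event.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a persisted block: (rddId, sizeInBytes).
case class Block(rddId: Int, size: Long)

object GroupByCost {
  // What the profile above shows the listener effectively doing:
  // regroup every block on each event, boxing each rddId into the key.
  def perRddSizesFull(blocks: Seq[Block]): Map[Int, Long] =
    blocks.groupBy(_.rddId).map { case (id, bs) => id -> bs.map(_.size).sum }

  // Incremental alternative: fold one new block into a running map,
  // so each event costs O(1) instead of O(total blocks).
  def perRddSizesIncremental(totals: mutable.Map[Int, Long], b: Block): Unit =
    totals(b.rddId) = totals.getOrElse(b.rddId, 0L) + b.size

  def main(args: Array[String]): Unit = {
    val blocks = Seq(Block(1, 10L), Block(1, 30L), Block(2, 5L))
    val full = perRddSizesFull(blocks)
    val incr = mutable.Map.empty[Int, Long]
    blocks.foreach(perRddSizesIncremental(incr, _))
    // Both strategies agree on the totals; only the per-event cost differs.
    assert(full == incr.toMap)
    println(full.toList.sortBy(_._1))
  }
}
```

Both versions produce the same per-RDD totals; the point is that the incremental form keeps the listener's work proportional to the event, not to the total number of cached blocks.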
> onStageSubmitted is not properly called, so NoSuchElementException will be thrown in
> onStageCompleted
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2228
>                 URL: https://issues.apache.org/jira/browse/SPARK-2228
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Baoxu Shi
>
> We are using `saveAsObjectFile` and `objectFile` to cut off the lineage during
> iterative computation, but after several hundred iterations a
> `NoSuchElementException` is thrown. We checked the code and traced the problem to
> `org.apache.spark.ui.jobs.JobProgressListener`: when `onStageCompleted` is
> called, the `stageId` cannot be found in `stageIdToPool`, but it does exist
> in the other HashMaps. So we think `onStageSubmitted` is not being called
> properly. Spark adds the stage but fails to send the message to the listeners,
> and the error occurs when the `finish` message is sent to them.
> This problem causes a huge number of `active stages` to show up in the
> Spark UI, which is really annoying, but it may not affect the final result,
> according to my test code.
> I'm willing to help solve this problem; any idea which part I should
> change? I assume `org.apache.spark.scheduler.SparkListenerBus` has something
> to do with it, but it looks fine to me.
> FYI, here is the test code that reproduces the problem. I don't know
> how to post code here with highlighting, so I put it in a gist to keep the
> issue clean:
> https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd

--
This message was sent by Atlassian JIRA
(v6.2#6252)
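The failure mode the reporter describes boils down to an unguarded `apply` on a mutable HashMap whose key was never registered. A minimal sketch (the `stageIdToPool` map and the two helpers below are illustrative stand-ins, not the JobProgressListener code) shows both the crash and the defensive alternative:

```scala
import scala.collection.mutable

object ListenerLookup {
  // Hypothetical stand-in for the listener's per-stage bookkeeping.
  val stageIdToPool = mutable.HashMap[Int, String]()

  // Unguarded apply: throws java.util.NoSuchElementException when
  // onStageSubmitted never registered the stage (the bug above).
  def poolUnsafe(stageId: Int): String = stageIdToPool(stageId)

  // Defensive lookup: fall back to a default pool instead of crashing.
  def poolSafe(stageId: Int): String = stageIdToPool.getOrElse(stageId, "default")

  def main(args: Array[String]): Unit = {
    stageIdToPool(1) = "fair"
    println(poolSafe(1))   // registered stage resolves normally
    println(poolSafe(42))  // unregistered stage falls back to "default"
    try poolUnsafe(42)
    catch { case e: NoSuchElementException => println("missing stage: " + e.getMessage) }
  }
}
```

A `getOrElse` only hides the symptom, of course; per Patrick's comment above, the real fix is in how the listener's state is built and kept in sync, not in the lookup itself.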