[ https://issues.apache.org/jira/browse/SPARK-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-1538:
-----------------------------

    Summary: SparkUI forgets about all persisted RDD's not directly associated with the Stage  (was: SparkUI forgets about all persisted RDD's not directly associated with stages)

> SparkUI forgets about all persisted RDD's not directly associated with the Stage
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-1538
>                 URL: https://issues.apache.org/jira/browse/SPARK-1538
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.1
>            Reporter: Andrew Or
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> The following command creates two RDDs in one Stage:
>
>     sc.parallelize(1 to 1000, 4).persist.map(_ + 1).count
>
> More specifically, parallelize creates one, and map creates another. If we
> persist only the first one, it does not actually show up on the StorageTab of
> the SparkUI.
>
> This is because StageInfo only keeps around information for the last RDD
> associated with the stage, but forgets about all of its parents. The proposal
> here is to have StageInfo climb the RDD dependency ladder to keep a list of
> all associated RDDInfos.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
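A minimal, Spark-free sketch of the proposed "climb the RDD dependency ladder" idea, assuming that a stage contains its final RDD plus every ancestor reachable through narrow dependencies (a shuffle dependency marks a stage boundary, so the walk stops there). All names here (`Dep`, `Narrow`, `Shuffle`, `Node`, `RDDInfo`, `collectRDDInfos`) are hypothetical stand-ins, not Spark's actual `StageInfo`/`RDDInfo` API; the RDD class names in the demo are only labels.

```scala
import scala.collection.mutable

// Hypothetical model of RDD lineage (not Spark classes).
sealed trait Dep { def parent: Node }
final case class Narrow(parent: Node) extends Dep  // same stage
final case class Shuffle(parent: Node) extends Dep // parent lives in an earlier stage

final case class Node(id: Int, name: String, deps: Seq[Dep], persisted: Boolean = false)

// Hypothetical per-RDD record kept by the stage.
final case class RDDInfo(id: Int, name: String, persisted: Boolean)

object StageRDDs {
  /** Returns info for the stage's last RDD followed by every ancestor reachable
    * through narrow dependencies, each RDD listed once (depth-first order). */
  def collectRDDInfos(last: Node): List[RDDInfo] = {
    val seen = mutable.HashSet.empty[Int]
    val out = mutable.ArrayBuffer.empty[RDDInfo]
    def visit(n: Node): Unit =
      if (seen.add(n.id)) { // add returns false if this RDD was already visited
        out += RDDInfo(n.id, n.name, n.persisted)
        n.deps.foreach {
          case Narrow(p)  => visit(p) // keep climbing inside the stage
          case Shuffle(_) => ()       // stage boundary: stop here
        }
      }
    visit(last)
    out.toList
  }
}

object StageRDDsDemo {
  def main(args: Array[String]): Unit = {
    // Models sc.parallelize(1 to 1000, 4).persist.map(_ + 1): only the first RDD is persisted.
    val parallelized = Node(0, "ParallelCollectionRDD", Nil, persisted = true)
    val mapped = Node(1, "MappedRDD", Seq(Narrow(parallelized)))
    println(StageRDDs.collectRDDInfos(mapped))
    // prints List(RDDInfo(1,MappedRDD,false), RDDInfo(0,ParallelCollectionRDD,true))
  }
}
```

Recording only the last RDD would keep just `RDDInfo(1,MappedRDD,false)`, which is why the persisted parent never reaches the Storage tab; walking the narrow dependencies keeps it in the list.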