[ https://issues.apache.org/jira/browse/SPARK-24011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-24011:
------------------------------------

    Assignee:     (was: Apache Spark)

> Cache rdd's immediate parent ShuffleDependencies to accelerate getShuffleDependencies()
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-24011
>                 URL: https://issues.apache.org/jira/browse/SPARK-24011
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: wuyi
>            Priority: Minor
>
> When creating stages for a job, we need to find the immediate parent
> ShuffleDependencies of each RDD (except the final RDD) by calling
> getShuffleDependencies() at least twice: first in
> getMissingAncestorShuffleDependencies(), and again in
> getOrCreateParentStages().
> So we can cache the result the first time we call getShuffleDependencies().
> This saves time when there are many NarrowDependencies between the RDD and
> its immediate parent ShuffleDependencies, or when the RDD has many immediate
> parent ShuffleDependencies.
>
> There is one exception, for checkpointed RDDs: if an RDD is checkpointed, its
> immediate parent ShuffleDependencies should be adjusted to empty.
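
The caching idea described in this issue can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual DAGScheduler code or the submitted patch: Node, Dep, NarrowDep, ShuffleDep, ShuffleDepCache and the traversals counter are hypothetical stand-ins invented for the example, and only the method name getShuffleDependencies() comes from the issue text. It assumes the traversal stops at the first shuffle boundary on each path and that a checkpointed RDD reports no parent ShuffleDependencies.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark's RDD and Dependency classes.
sealed trait Dep { def parent: Node }
final case class NarrowDep(parent: Node) extends Dep
final case class ShuffleDep(shuffleId: Int, parent: Node) extends Dep

final class Node(val id: Int, val deps: Seq[Dep], var checkpointed: Boolean = false)

object ShuffleDepCache {
  // Immediate parent shuffle dependencies, keyed by node id.
  private val cache = mutable.HashMap.empty[Int, Set[ShuffleDep]]

  // Counts uncached graph walks; present only to make the saving visible.
  var traversals = 0

  def getShuffleDependencies(rdd: Node): Set[ShuffleDep] =
    if (rdd.checkpointed) {
      // The exception from the issue: a checkpointed node has no parent
      // shuffle dependencies, even if an older result is still cached.
      Set.empty
    } else {
      cache.getOrElseUpdate(rdd.id, traverse(rdd))
    }

  // Walks narrow dependencies only and stops at each shuffle boundary.
  // Note: this sketch does not invalidate cached entries when an
  // intermediate narrow parent is checkpointed after the first call.
  private def traverse(rdd: Node): Set[ShuffleDep] = {
    traversals += 1
    val parents = mutable.HashSet.empty[ShuffleDep]
    val visited = mutable.HashSet.empty[Int]
    val waiting = mutable.ArrayBuffer[Node](rdd) // used as a stack
    while (waiting.nonEmpty) {
      val toVisit = waiting.remove(waiting.length - 1)
      if (visited.add(toVisit.id) && !toVisit.checkpointed) {
        toVisit.deps.foreach {
          case s: ShuffleDep => parents += s
          case NarrowDep(p)  => waiting += p
        }
      }
    }
    parents.toSet
  }
}
```

Calling ShuffleDepCache.getShuffleDependencies twice on the same node walks the narrow-dependency chain only once; the second call is a map lookup. The saving grows with the length of that chain, which is the case the issue is targeting.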