Github user markhamstra commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/641#discussion_r9997152
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -373,25 +375,26 @@ class DAGScheduler(
} else {
def removeStage(stageId: Int) {
// data structures based on Stage
- stageIdToStage.get(stageId).foreach { s =>
- if (running.contains(s)) {
+ for (stage <- stageIdToStage.get(stageId)) {
+ if (running.contains(stage)) {
logDebug("Removing running stage %d".format(stageId))
- running -= s
+ running -= stage
+ }
+ stageToInfos -= stage
+ for (shuffleDep <- stage.shuffleDep) {
--- End diff --
At this point, the need is to clean up the DAGScheduler's data structures
that reference the removed stage. If you are 100% certain that the stage's
notion of shuffleDeps contains every shuffleId that shuffleToMapStage is
tracking for the stage, then you can take the simpler route that you have of
working from the stage's understanding. I wasn't 100% certain of that (which
is not the same thing as saying that I have a good reason to believe that the
stage's understanding of shuffleDeps will diverge from shuffleToMapStage's
understanding), so I took the safer route of working from what
shuffleToMapStage's knows instead of from what the stage knows.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---