[ https://issues.apache.org/jira/browse/SPARK-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kay Ousterhout updated SPARK-19538:
-----------------------------------
    Priority: Minor  (was: Major)

> DAGScheduler and TaskSetManager can have an inconsistent view of whether a
> stage is complete.
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19538
>                 URL: https://issues.apache.org/jira/browse/SPARK-19538
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>            Priority: Minor
>
> The pendingPartitions in Stage tracks partitions that still need to be
> computed, and is used by the DAGScheduler to determine when to mark the stage
> as complete. In most cases, this variable is exactly consistent with the
> tasks in the TaskSetManager (for the current version of the stage) that are
> still pending. However, as discussed in SPARK-19263, the two can become
> inconsistent when a ShuffleMapTask for an earlier attempt of the stage
> completes, in which case the DAGScheduler may think the stage has finished
> while the TaskSetManager is still waiting for some tasks to complete (see the
> description in this pull request:
> https://github.com/apache/spark/pull/16620). This leads to bugs like
> SPARK-19263. Another problem with this behavior is that listeners can get
> two StageCompleted messages: one when the DAGScheduler thinks the stage is
> complete, and a second when the TaskSetManager later decides the stage is
> complete. We should fix this.
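
The inconsistency described in this issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's scheduler code: ToyStage, ToyTaskSet, on_task_success, scheduler_thinks_complete and simulate are hypothetical names invented for the example. The model assumes that a successful task from any attempt removes its partition from the stage-level pending set, while only the task set of that task's own attempt is updated, and that the stage is resubmitted with all partitions pending again.

```python
class ToyStage:
    """Hypothetical stand-in for the stage-level state (not Spark's Stage)."""

    def __init__(self, num_partitions):
        # Partitions the scheduler still believes need to be computed.
        self.pending_partitions = set(range(num_partitions))


class ToyTaskSet:
    """Hypothetical stand-in for one attempt's task set (not TaskSetManager)."""

    def __init__(self, attempt, partitions):
        self.attempt = attempt
        # Tasks of this attempt that have not finished yet.
        self.pending_tasks = set(partitions)

    def is_finished(self):
        return not self.pending_tasks


def on_task_success(stage, task_set, partition):
    # Assumption of this sketch: a success from ANY attempt clears the
    # stage-level pending partition, but only the task's own attempt
    # records the task as done.
    stage.pending_partitions.discard(partition)
    task_set.pending_tasks.discard(partition)


def scheduler_thinks_complete(stage):
    # The scheduler-side view: the stage is done when nothing is pending.
    return not stage.pending_partitions


def simulate():
    stage = ToyStage(num_partitions=2)
    old_attempt = ToyTaskSet(attempt=0, partitions={0, 1})

    # Attempt 0 finishes partition 0; its task for partition 1 keeps running.
    on_task_success(stage, old_attempt, 0)

    # Assumed for illustration: the stage is resubmitted and every
    # partition becomes pending again under a new attempt.
    stage.pending_partitions = {0, 1}
    new_attempt = ToyTaskSet(attempt=1, partitions={0, 1})

    # Attempt 1 finishes partition 0.
    on_task_success(stage, new_attempt, 0)

    # The straggler from attempt 0 now finishes partition 1.
    on_task_success(stage, old_attempt, 1)

    return stage, old_attempt, new_attempt


if __name__ == "__main__":
    stage, old_attempt, new_attempt = simulate()
    print("scheduler view complete:", scheduler_thinks_complete(stage))
    print("current attempt finished:", new_attempt.is_finished())
```

In this model the stage-level view reports the stage as complete while the current attempt still has the task for partition 1 outstanding. When that task later finishes, the task-set side reaches completion as well, which corresponds to the second StageCompleted message mentioned in the description.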