[
https://issues.apache.org/jira/browse/SPARK-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-19538.
----------------------------------
Resolution: Incomplete
> DAGScheduler and TaskSetManager can have an inconsistent view of whether a
> stage is complete.
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-19538
> URL: https://issues.apache.org/jira/browse/SPARK-19538
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 2.1.0
> Reporter: Kay Ousterhout
> Assignee: Kay Ousterhout
> Priority: Minor
> Labels: bulk-closed
>
> The pendingPartitions in Stage tracks partitions that still need to be
> computed, and is used by the DAGScheduler to determine when to mark the stage
> as complete. In most cases, this variable is exactly consistent with the
> tasks in the TaskSetManager (for the current version of the stage) that are
> still pending. However, as discussed in SPARK-19263, these can become
> inconsistent when a ShuffleMapTask for an earlier attempt of the stage
> completes, in which case the DAGScheduler may think the stage has finished,
> while the TaskSetManager is still waiting for some tasks to complete (see the
> description in this pull request:
> https://github.com/apache/spark/pull/16620). This leads to bugs like
> SPARK-19263. Another problem with this behavior is that listeners can get
> two StageCompleted messages: one when the DAGScheduler thinks the stage is
> complete, and a second when the TaskSetManager later decides the stage is
> complete. We should fix this.
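
The divergence described above can be illustrated with a small toy model. This
is only a sketch under simplifying assumptions, not Spark's actual scheduler
code: the function simulate, the two-partition stage, and the event strings are
all hypothetical. It assumes the DAGScheduler side removes a partition from its
pending set whenever any attempt's task for that partition succeeds, while the
TaskSetManager side only counts tasks that belong to the current attempt.

```python
def simulate():
    """Toy model of the two views of stage completion (not Spark code)."""
    events = []                  # StageCompleted notifications a listener would see
    pending_partitions = {0, 1}  # DAGScheduler-side view, like Stage.pendingPartitions
    tsm_running = {0, 1}         # TaskSetManager-side view for the current attempt
    current_attempt = 1          # attempt 0 was resubmitted, but one of its tasks still runs

    # Successful ShuffleMapTask completions as (attempt, partition), in arrival order.
    # The middle one is a late task from the earlier attempt.
    completions = [(1, 0), (0, 1), (1, 1)]

    snapshots = []
    for attempt, partition in completions:
        # DAGScheduler side: the partition is no longer pending, whichever
        # attempt produced the output.
        had_pending = bool(pending_partitions)
        pending_partitions.discard(partition)
        if had_pending and not pending_partitions:
            events.append("StageCompleted (DAGScheduler view)")

        # TaskSetManager side: only tasks of the current attempt count.
        if attempt == current_attempt:
            tsm_running.discard(partition)
            if not tsm_running:
                events.append("StageCompleted (TaskSetManager view)")

        snapshots.append((set(pending_partitions), set(tsm_running)))
    return events, snapshots


if __name__ == "__main__":
    events, snapshots = simulate()
    for step, (pending, running) in enumerate(snapshots):
        print(step, "pending:", pending, "tsm running:", running)
    print(events)
```

In this model, after the late completion from attempt 0 arrives, the pending
set is empty while the current attempt still has a task running, and the
listener ends up with two completion events for the same stage.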