Github user Ngone51 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21131#discussion_r183619704
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -689,6 +689,20 @@ private[spark] class TaskSchedulerImpl(
         }
       }
     
    +  /**
    +   * Marks the task as completed in all TaskSetManagers for the given stage.
    +   *
    +   * After stage failure and retry, there may be multiple active TaskSetManagers for the stage.
    +   * If an earlier attempt of a stage completes a task, we should ensure that the later attempts
    +   * do not also submit those same tasks. That also means that a task completion from an earlier
    +   * attempt can lead to the entire stage getting marked as successful.
    +   */
    +  private[scheduler] def markPartitionCompletedInAllTaskSets(stageId: Int, partitionId: Int) = {
    +    taskSetsByStageIdAndAttempt.getOrElse(stageId, Map()).values.foreach { tsm =>
    --- End diff --
    
    Generally, it seems impossible for an unfinished `TaskSet` to end up with an empty `Map()` in `taskSetsByStageIdAndAttempt`. But if it does, perhaps we can tell the caller that the stage has already finished.
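    That suggestion could look like the following simplified sketch. This is a standalone model with hypothetical types (a plain map of completed partition sets stands in for the real per-attempt `TaskSetManager`s), not the actual `TaskSchedulerImpl`; the idea is just to return a flag instead of silently iterating over nothing:

    ```scala
    import scala.collection.mutable

    object MarkPartitionSketch {
      // stageId -> (stageAttemptId -> completed partition ids); a stand-in for
      // the real taskSetsByStageIdAndAttempt map of TaskSetManagers
      val taskSetsByStageIdAndAttempt =
        mutable.HashMap[Int, mutable.HashMap[Int, mutable.Set[Int]]]()

      /** Returns false when the stage has no active task sets, i.e. it has
        * already finished from the scheduler's point of view. */
      def markPartitionCompletedInAllTaskSets(stageId: Int, partitionId: Int): Boolean =
        taskSetsByStageIdAndAttempt.get(stageId) match {
          case Some(attempts) if attempts.nonEmpty =>
            attempts.values.foreach(_ += partitionId) // mark in every active attempt
            true
          case _ =>
            false // empty or missing map: tell the caller the stage already finished
        }

      def main(args: Array[String]): Unit = {
        taskSetsByStageIdAndAttempt(0) =
          mutable.HashMap(0 -> mutable.Set[Int](), 1 -> mutable.Set[Int]())
        assert(markPartitionCompletedInAllTaskSets(0, 7))
        assert(taskSetsByStageIdAndAttempt(0).values.forall(_.contains(7)))
        assert(!markPartitionCompletedInAllTaskSets(99, 0)) // unknown stage
      }
    }
    ```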


---
