Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/16620
  
    @kayousterhout @squito @markhamstra 
    Thanks a lot for reviewing this PR so far. I do think the approach of throwing away task results from earlier attempts that ran on executors that have since failed, and taking `stage.pendingPartitions` as an exact mirror (in reverse) of the stage's output locations, can really fix this bug and make the code much clearer.
    But my previous understanding of `stage.pendingPartitions` is a little different, as described in the comment in `Stage` below:
    ```
      /**
       * Partitions the [[DAGScheduler]] is waiting on before it tries to mark the stage / job as
       * completed and continue. Tasks' successes in both the active taskset or earlier attempts
       * for this stage can cause partition ids get removed from pendingPartitions. Finally, note
       * that when this is empty, it does not necessarily mean that stage is completed -- Some of
       * the map output from that stage may have been lost. But the [[DAGScheduler]] will check for
       * this condition and resubmit the stage if necessary.
       */
    ```
    Any task's success can result in its partition being removed from `pendingPartitions`, no matter whether it ran on a healthy executor or a failed one. Thus when `pendingPartitions` becomes empty, we check whether the stage's output locations are all available, and if not, we resubmit the stage.
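    To make this concrete, below is a minimal, self-contained sketch of the bookkeeping I have in mind (not the actual `DAGScheduler` code; `StageState`, `onTaskSuccess`, `onExecutorLost` and `resubmitMissingPartitions` are made-up names for illustration):
    ```scala
    import scala.collection.mutable

    // Hypothetical, simplified bookkeeping, not the real DAGScheduler implementation.
    class StageState(numPartitions: Int) {
      // Partitions we are still waiting on from the running TaskSetManager.
      val pendingPartitions = mutable.HashSet(0 until numPartitions: _*)
      // partitionId -> executorId holding the map output, if any.
      val outputLocs = mutable.HashMap.empty[Int, String]

      // My previous understanding: every success drains pendingPartitions,
      // even if the executor that produced the output has since been lost.
      def onTaskSuccess(partition: Int, execId: String, execAlive: Boolean): Unit = {
        pendingPartitions -= partition
        if (execAlive) outputLocs(partition) = execId  // only register usable output locations

        // Once no more results are expected, verify the outputs and resubmit if some are missing.
        if (pendingPartitions.isEmpty && outputLocs.size < numPartitions) {
          resubmitMissingPartitions()
        }
      }

      def onExecutorLost(execId: String): Unit = {
        outputLocs.retain { case (_, e) => e != execId }  // drop outputs on the lost executor
      }

      def resubmitMissingPartitions(): Unit = {
        pendingPartitions ++= (0 until numPartitions).filterNot(outputLocs.contains)
        // ...submit a new TaskSet for these partitions...
      }
    }
    ```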
    
    If we take `stage.pendingPartitions` as an exact mirror (in reverse) of the output locations, some unit tests in `DAGSchedulerSuite` cannot pass (e.g. `"run trivial shuffle with out-of-band failure and retry"`). Consider the scenario below (a toy sketch follows the list):
    1. A stage has ShuffleMapTask1 and ShuffleMapTask2, `pendingPartitions` = (0, 1);
    2. ShuffleMapTask1 succeeds on executorA and its result is returned to the driver, `pendingPartitions` = (1);
    3. ShuffleMapTask2 succeeds on executorA;
    4. The driver hears that executorA is lost;
    5. ShuffleMapTask2's success is returned to the driver, but its result is thrown away because executorA has failed; `pendingPartitions` is still (1) and the stage cannot get rescheduled.
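    Below is the same kind of toy sketch for the "mirror" interpretation (again hypothetical, not Spark code): a result arriving from an executor that is already known to be lost is simply thrown away, so the partition is never removed from `pendingPartitions`.
    ```scala
    import scala.collection.mutable

    // Hypothetical "mirror" variant: pendingPartitions is drained only when a usable
    // output location gets registered; results from lost executors are discarded.
    class MirrorStageState(numPartitions: Int) {
      val pendingPartitions = mutable.HashSet(0 until numPartitions: _*)
      val outputLocs = mutable.HashMap.empty[Int, String]
      val lostExecutors = mutable.HashSet.empty[String]

      def onTaskSuccess(partition: Int, execId: String): Unit = {
        if (!lostExecutors.contains(execId)) {  // result from a failed executor is dropped
          pendingPartitions -= partition
          outputLocs(partition) = execId
        }
      }

      def onExecutorLost(execId: String): Unit = lostExecutors += execId
    }

    object ScenarioTrace extends App {
      val stage = new MirrorStageState(2)
      stage.onTaskSuccess(0, "executorA")  // step 2: pendingPartitions = Set(1)
      // step 3: ShuffleMapTask2 finishes on executorA, its result is still in flight
      stage.onExecutorLost("executorA")    // step 4
      stage.onTaskSuccess(1, "executorA")  // step 5: dropped, pendingPartitions stays Set(1)
      println(stage.pendingPartitions)     // Set(1): it never empties, so nothing ever
                                           // triggers the "check outputs and resubmit" path
    }
    ```
    Both tasks of the TaskSet have already finished, so no further completion event will arrive to empty `pendingPartitions`, and the stage hangs instead of being resubmitted.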
    
    In my understanding, `pendingPartitions` helps us track the running `TaskSetManager`, know whether there are still task results on the way that are worth waiting for, and decide when to check whether the output locations are all available and whether the stage needs to be resubmitted.
    
    


