Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21698

Actually I have been thinking about the issue, and I feel we can still go further with the current patch to fix the issue with child stages that @mridulm mentioned above. That requires tracking, for each active/finished stage, all the preceding stages that can generate non-deterministic output, and also tracking the stages that have only partially finished (some tasks in the stage have finished, while the rest are still running or pending).

In case of a FetchFailure (or ExecutorLost/WorkerLost), if the affected stage can generate non-deterministic output, then clear all the shuffle files from that stage (to make sure we'll retry all tasks for that stage), and also check all partially finished stages that are succeeding stages of the affected stage and clear all of their shuffle files as well. (If a stage is a succeeding stage of the affected stage but it has already finished successfully and none of its shuffle files are lost, then it's fine; we don't need to retry it.)

The above fix proposal requires more refactoring of DAGScheduler, and it will consume some memory to store the additional information (assuming you have M active/finished stages and N stages that may generate non-deterministic output, it takes at most M * N Int values to store the stage ids). Normally N is far smaller than the total number of stages, so I think the extra cost should be acceptable. Also, you always have to pay a lot of extra effort on a FetchFailure; this may increase that effort by several times, but it should be statistically fine considering that most Spark jobs are short-running and don't hit FetchFailure often. (The major advantage of this approach is that you don't pay any penalty if you don't hit a FetchFailure.)
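The rollback rule described above can be sketched roughly as follows. This is a minimal toy model, not Spark's actual DAGScheduler code; `StageInfo`, `ancestors`, and `stagesToRollBack` are hypothetical names invented for illustration:

```scala
// Minimal model (assumption, not Spark's real API): each stage records its id,
// whether it may generate non-deterministic output, its preceding stage ids,
// and whether it has only partially finished.
case class StageInfo(
    id: Int,
    indeterminate: Boolean,      // may generate non-deterministic output
    parents: Set[Int],           // preceding (parent) stages
    partiallyFinished: Boolean)  // some tasks done, the rest running/pending

// All preceding stages of `id`, computed transitively over the DAG.
def ancestors(stages: Map[Int, StageInfo], id: Int): Set[Int] = {
  val ps = stages(id).parents
  ps ++ ps.flatMap(p => ancestors(stages, p))
}

// On a FetchFailure affecting `failedId`: if that stage is indeterminate, it
// must be fully rerun (all its shuffle files cleared), and every partially
// finished succeeding stage must be rolled back too. Fully finished succeeding
// stages whose shuffle files are intact keep their output.
def stagesToRollBack(stages: Map[Int, StageInfo], failedId: Int): Set[Int] = {
  if (!stages(failedId).indeterminate) Set(failedId)
  else Set(failedId) ++ stages.values.collect {
    case s if ancestors(stages, s.id).contains(failedId) && s.partiallyFinished => s.id
  }
}

// Example DAG: stage 0 is indeterminate; stages 1 and 3 are partially finished
// descendants of it; stage 2 is a fully finished descendant.
val stages = Map(
  0 -> StageInfo(0, indeterminate = true,  parents = Set.empty, partiallyFinished = false),
  1 -> StageInfo(1, indeterminate = false, parents = Set(0),    partiallyFinished = true),
  2 -> StageInfo(2, indeterminate = false, parents = Set(0),    partiallyFinished = false),
  3 -> StageInfo(3, indeterminate = false, parents = Set(1),    partiallyFinished = true))

stagesToRollBack(stages, 0)  // → Set(0, 1, 3): stage 2 is finished and kept
stagesToRollBack(stages, 2)  // → Set(2): a deterministic stage retries alone
```

Note the M * N cost mentioned above shows up here as the per-stage ancestor sets; the sketch recomputes them on demand, whereas the proposal would cache them per active/finished stage.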