a) If one or more tasks of a stage (and hence its shuffle id) need to be
recomputed, and the stage is INDETERMINATE, all of its shuffle output is
discarded and the entire stage is recomputed (see here
<https://github.com/apache/spark/blob/3e2470de7ea8b97dcdd8875ef25f044998fb7588/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1477>
).
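To make the distinction concrete, here is a minimal, hypothetical sketch of that rollback decision (the names `MapStage`, `tasksToRecompute`, etc. are illustrative only, not Spark's actual API): a deterministic stage reruns only the tasks whose output was lost, while an indeterminate stage discards everything and reruns all tasks.

```scala
object RollbackSketch {
  // Simplified stand-in for a shuffle map stage (not Spark's real class).
  case class MapStage(id: Int, isIndeterminate: Boolean, numTasks: Int)

  // Given a fetch failure traced back to the map-task indexes whose output
  // was lost, return the set of task indexes that must be re-executed.
  def tasksToRecompute(stage: MapStage, lostMapIndexes: Set[Int]): Set[Int] =
    if (stage.isIndeterminate)
      // An indeterminate stage's output is not reproducible task-by-task,
      // so all shuffle output is discarded and every task reruns.
      (0 until stage.numTasks).toSet
    else
      // A deterministic stage only reruns the tasks whose output was lost.
      lostMapIndexes

  def main(args: Array[String]): Unit = {
    val det = MapStage(id = 1, isIndeterminate = false, numTasks = 3)
    val ind = det.copy(isIndeterminate = true)
    println(tasksToRecompute(det, Set(1))) // only the lost partition
    println(tasksToRecompute(ind, Set(1))) // the entire stage
  }
}
```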

If a reducer (in a downstream stage) fails to read shuffle data, can we determine which upstream tasks need to recompute their output? From the previous discussion, I understood that this is hard in the current implementation, and that we should re-execute all tasks in the upstream stage.

Thanks,

--- Sungwoo
