On Sat, Oct 14, 2023 at 3:49 AM Mridul Muralidharan wrote:
>
> A reducer oriented view of shuffle, especially without replication, could
> indeed be susceptible to this issue you described (a single fetch failure
> would require all mappers to need to be recomputed) - note, not necessarily
> all reducers to be recomputed though.
> Note that I have not looked much

Hi,

Spark will try to minimize the recomputation cost as much as possible.
For example, if the parent stage was DETERMINATE, it simply needs to recompute
the missing (mapper) partitions (the ones whose lost output caused the fetch
failure). Note that this by itself could require further recomputation up the
DAG if the inputs needed to regenerate those partitions are themselves
missing, and so on along the lineage.
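
To make that concrete, below is a minimal, self-contained Scala sketch of
that decision as I understand it. All names here (ParentStage,
partitionsToRecompute, etc.) are invented for illustration - this is not
Spark's DAGScheduler or MapOutputTracker code, just the logic described
above:

object RecomputeDecision {

  sealed trait DeterministicLevel
  case object Determinate extends DeterministicLevel
  case object Indeterminate extends DeterministicLevel

  // Toy model of a parent (map) stage: how many map partitions it has,
  // and which of them still have registered shuffle output.
  final case class ParentStage(
      level: DeterministicLevel,
      numPartitions: Int,
      availableOutputs: Set[Int])

  // On a fetch failure for `lostPartition`, which map partitions re-run?
  def partitionsToRecompute(parent: ParentStage, lostPartition: Int): Set[Int] =
    parent.level match {
      case Determinate =>
        // A re-run reproduces identical output, so regenerate only what
        // is missing: the lost partition plus anything else unregistered.
        val stillAvailable = parent.availableOutputs - lostPartition
        (0 until parent.numPartitions).toSet -- stillAvailable
      case Indeterminate =>
        // A re-run may emit different data, so partially reusing old
        // output would be inconsistent; recompute every partition
        // (see a) below).
        (0 until parent.numPartitions).toSet
    }
}

For example, a 4-partition DETERMINATE parent that lost only partition 2's
output re-runs just that one task:

import RecomputeDecision._
partitionsToRecompute(ParentStage(Determinate, 4, Set(0, 1, 2, 3)), 2)
// => Set(2)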
a) If one or more tasks for a stage (and so its shuffle id) are going to be
recomputed and it is an INDETERMINATE stage, all of that stage's shuffle
output will be discarded and the stage will be entirely recomputed (see here).
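
For intuition on why a stage ends up INDETERMINATE in the first place: the
classic example is round-robin repartition() (the SPARK-23207 correctness
issue), where the partition a row lands in depends on the order rows happen
to arrive in, and a partial re-run can change that order. A tiny illustrative
sketch - roundRobin here is an invented stand-in, not Spark's actual
repartitioning code:

object IndeterminateSketch {
  // Round-robin placement is a function of arrival order, not of the row
  // itself: the i-th row to arrive goes to partition i % numPartitions.
  def roundRobin[T](rows: Seq[T], numPartitions: Int): Map[Int, Seq[T]] =
    rows.zipWithIndex
      .groupBy { case (_, i) => i % numPartitions }
      .map { case (p, rs) => p -> rs.map(_._1) }

  def main(args: Array[String]): Unit = {
    // The same rows arriving in two different orders (as can happen when
    // a lost upstream partition is re-run) split differently:
    println(roundRobin(Seq("a", "b", "c", "d"), 2)) // 0 -> (a, c), 1 -> (b, d)
    println(roundRobin(Seq("b", "a", "c", "d"), 2)) // 0 -> (b, c), 1 -> (a, d)
  }
}

So a reducer that had already fetched some output from the first run could
end up combining it with re-run output that placed rows differently - which
is exactly why all of an INDETERMINATE stage's shuffle output is discarded
rather than partially reused.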