(I left this message in the Celeborn Slack channel, but perhaps this mailing
list is the right place.)
Hello,
Previously we implemented an extension of MR3 (an execution engine) to
support Celeborn 0.3.1. For a short introduction, please see:
https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
Now we are upgrading Celeborn to 0.5.1 and working on supporting stage
rerun, much like Spark-Celeborn.
To my (pleasant) surprise, upgrading Celeborn from 0.3.1 to 0.5.1 went
quite smoothly: after recompiling against Celeborn 0.5.1, MR3-Celeborn just
worked. This surprised me because the current code does not obtain
Celeborn shuffle IDs at all (there was no notion of Celeborn shuffle IDs
back in 0.3.1). It uses only application shuffle IDs, which are generated
by MR3 (similar to Spark shuffle IDs).
I have a few questions.
1. Suppose that a reducer fails to read the output of a certain mapper. In
such a case, should we re-execute all the mappers in the previous stage?
Or, is it okay to re-execute only the mapper whose output is lost?
In our previous implementation, MR3-Celeborn does not fully support task
rerun (which is similar to stage rerun) because Celeborn does not return
the identity of the mapper tasks whose output has been lost.
2. When a reducer reads the output of mappers, in which cases is it okay
to use the application shuffle ID?
3. Along the same lines as question 2, should we always obtain Celeborn
shuffle IDs when reading the output of mappers? Given that the current
code of MR3-Celeborn works fine without them, this does not seem to be
always necessary.
Thank you.
--- Sungwoo Park