(I left this message in the Celeborn Slack channel, but perhaps this mailing list is the right place.)

Hello,

Previously we implemented an extension of MR3 (an execution engine) to support Celeborn 0.3.1. For a short introduction, please see:
https://mr3docs.datamonad.com/docs/mr3/features/celeborn/

Now we are upgrading Celeborn to 0.5.1 and working on supporting stage rerun, much like Spark-Celeborn.

To my (pleasant) surprise, upgrading Celeborn from 0.3.1 to 0.5.1 was quite smooth. After recompiling with Celeborn 0.5.1, MR3-Celeborn just worked fine. I was surprised because the current code does not obtain Celeborn shuffle IDs at all (there was no notion of Celeborn shuffle IDs back in 0.3.1). Instead, we use only application shuffle IDs, which are generated by MR3 (similarly to Spark shuffle IDs).

I have a few questions.

1. Suppose that a reducer fails to read the output of a certain mapper. In such a case, should we re-execute all the mappers in the previous stage? Or, is it okay to re-execute only the mapper whose output is lost? In our previous implementation, MR3-Celeborn does not fully support task rerun (similar to stage rerun) because Celeborn does not return the identity of mapper tasks whose output has been lost.

2. When a reducer tries to read the output of mappers, when is it okay to use the application shuffle ID?

3. Along the same lines as question 2, should we always get Celeborn shuffle IDs when trying to read the output of mappers? Considering the fact that the current code of MR3-Celeborn works fine, it seems like this is not always necessary.

Thank you.

--- Sungwoo Park
