Hi community,

I’d like to initiate a discussion regarding CIP-21: Support Flink job
recovery from JobManager failure for Apache Celeborn [1].

This proposal aims to enable Celeborn to support Flink’s batch job recovery
feature [2]. With this enhancement, Flink batch jobs using Celeborn will be
able to resume from previously completed stages after a JobManager failure
instead of restarting the entire job from scratch.

Your feedback and questions are welcome. Please feel free to share any
thoughts you may have.

Best regards,
Xu Huang

[1] CIP-21: Support Flink job recovery from JobManager failure for Apache
Celeborn. https://cwiki.apache.org/confluence/x/kw9JFg
[2] FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs.
https://cwiki.apache.org/confluence/x/QwqZE
