Hi Jing,

Thanks a lot for writing this FLIP; it is very useful to batch users. I currently have only two small questions:
1. First of all, please complete the fault-tolerance processing flow in the FLIP. (Maybe you have already considered it, but it is better to state the specific solution explicitly in the FLIP.) For example, how are errors in the Source `Reader` handled? As far as I know, once a reader becomes unavailable, new splits can no longer be allocated, which may be unacceptable in the case of speculative execution.

2. Secondly, the FLIP only says that user-defined events are not supported, but it does not explain how the existing ReportedWatermarkEvent/ReaderRegistrationEvent are handled. After all, with speculative execution there may be two "same" tasks executing at the same time. Whether these events, if repeated, really have no effect on the execution of the job still needs a clear evaluation.

Best,
Guowei

On Fri, Jun 24, 2022 at 5:41 PM Jing Zhang <beyond1...@gmail.com> wrote:

> Hi all,
>
> One major problem of Flink batch jobs is slow tasks running on hot/bad
> nodes, resulting in very long execution time.
>
> In order to solve this problem, FLIP-168: Speculative Execution for Batch
> Job[1] was introduced and approved recently.
>
> Here, Zhu Zhu and I propose to support speculative execution of sources as
> one of the follow-ups of FLIP-168. You can find more details in FLIP-245[2].
> Looking forward to your feedback.
>
> Best,
> Jing Zhang
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job#FLIP168:SpeculativeExecutionforBatchJob-NointegrationwithFlink'swebUI
>
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-245%3A+Source+Supports+Speculative+Execution+For+Batch+Job