Hi Jing,

Thanks a lot for writing this FLIP, which will be very useful to batch users.
For now I have only two small questions:

1. First of all, please complete the fault-tolerance flow in the FLIP.
(Maybe you've already considered it, but it's better to state the specific
solution explicitly in the FLIP.)
For example, how is a Source `Reader` handled when it fails? As far as I
know, once the reader is unavailable, new splits can no longer be
allocated, which may be unacceptable in the case of speculative
execution.

2. Secondly, the FLIP only says that user-defined events are not supported,
but it does not explain how the existing
ReportedWatermarkEvent/ReaderRegistrationEvent are handled. After all, in
the case of speculative execution, two "same" tasks may be running at the
same time. Whether duplicates of these events really have no effect on the
execution of the job still needs a clear evaluation.

Best,
Guowei


On Fri, Jun 24, 2022 at 5:41 PM Jing Zhang <beyond1...@gmail.com> wrote:

> Hi all,
> One major problem of Flink batch jobs is slow tasks running on hot/bad
> nodes, resulting in very long execution time.
>
> To solve this problem, FLIP-168: Speculative Execution for Batch
> Job[1] was recently introduced and approved.
>
> Here, Zhu Zhu and I propose to support speculative execution of sources as
> one of the follow-ups to FLIP-168. You can find more details in FLIP-245[2].
> Looking forward to your feedback.
>
> Best,
> Jing Zhang
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job#FLIP168:SpeculativeExecutionforBatchJob-NointegrationwithFlink'swebUI
>
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-245%3A+Source+Supports+Speculative+Execution+For+Batch+Job
>
