Hi, devs & users

Very sorry for the broken formatting; I am resending the discussion below.

As discussed in FLIP-131[1], Flink will make DataStream the unified API for 
processing bounded and unbounded data in both streaming and blocking modes. 
However, one long-standing problem in streaming mode is that Flink currently 
does not support checkpoints after some tasks have finished, which causes 
problems for bounded or mixed jobs:
        1. Flink exactly-once sinks rely on checkpoints to ensure that data won’t 
be replayed before it is committed to external systems in streaming mode. If the 
sources are bounded and checkpoints are disabled once some tasks have finished, 
the data sent after the last checkpoint can never be committed. This issue has 
already been reported several times on the user ML [2][3][4] and was further 
brought up while working on FLIP-143: Unified Sink API [5]. 
        2. Jobs with both bounded and unbounded sources might have to replay a 
large number of records after failover, because no periodic checkpoints are 
taken after the bounded sources have finished.

Therefore, we propose to also support checkpoints after some tasks have finished. 
You can find more details in FLIP-147 [6]. 

Best,
Yun

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
[2] 
https://lists.apache.org/thread.html/rea1ac2d82f646fcea1395b5738be495f144c5b0312a290a1d4a339c1%40%3Cuser.flink.apache.org%3E
[3] 
https://lists.apache.org/thread.html/rad4adeec838093b8b56ae9e2ea6a937a4b1882b53045a12acb7e61ea%40%3Cuser.flink.apache.org%3E
[4] 
https://lists.apache.org/thread.html/4cf28a9fa3732dfdd9e673da6233c5288ca80b20d58cee130bf1c141%40%3Cuser.flink.apache.org%3E
[5] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
[6] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished
 ------------------Original Mail ------------------
Sender:Yun Gao <yungao...@aliyun.com.INVALID>
Send Date:Fri Oct 9 14:16:52 2020
Recipients:Flink Dev <dev@flink.apache.org>, User-Flink <u...@flink.apache.org>
Subject:[DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished
