[GitHub] [flink-web] gaoyunhaii commented on a diff in pull request #545: Add blogs for FLIP-147 support checkpoints after tasks finished

GitBox Sun, 19 Jun 2022 20:57:04 -0700


gaoyunhaii commented on code in PR #545:
URL: https://github.com/apache/flink-web/pull/545#discussion_r901234152



##########
_posts/2022-06-01-final-checkpoint-part1.md:
##########
@@ -81,59 +81,61 @@ running subtasks. The states could be repartitioned on restarting and all the ne
 To support creating such a checkpoint for jobs with finished tasks, we extended the checkpoint procedure.
 Previously the checkpoint coordinator inside the JobManager first notifies all the sources to report snapshots,
 then all the sources further notify their descendants via broadcasting barrier events. Since now the sources might
-already finish, the checkpoint coordinator would instead treat the running tasks who do not have running precedent
-tasks as “new sources”, and notifies these tasks to initiate the checkpoints. The checkpoint could then deduce
-which operator is fully finished based on the task states when triggering checkpoint and the received snapshots. 
+have already finished, the checkpoint coordinator would instead treat the running tasks who also do not have running
+precedent tasks as "new sources", and it notifies these tasks to initiate the checkpoints. Finally, if the subtasks of
+an operator are either finished on triggering checkpoint or have finished processing all the data on snapshotting states,
+the operator would be marked as fully finished.
 
 The changes of the checkpoint procedure are transparent to users except that for checkpoints indeed containing
-finished tasks, we disallowed adding new operators before the fully finished ones, since it would make the fully
+finished tasks, we disallowed adding new operators as precedents of the fully finished ones, since it would make the fully
 finished operators have running precedents after restarting, which conflicts with the design that tasks finished
 in topological order.
 
 # Revise the Process of Finishing
 
 Based on the ability to take checkpoints with finished tasks, we could then solve the issue that two-phase-commit
-operators could not commit all the data when running in streaming mode. As the background, currently Flink jobs
+operators could not commit all the data when running in streaming mode. As the background, Flink jobs
 have two ways to finish:
 
-1.     All sources are bound and they processed all the input records. The job will start to finish after all the sources are finished.
-2.     Users execute `stop-with-savepoint [--drain]`. The job will take a savepoint and then finish. If the `–-drain` parameter is not set,
-the savepoint might be used to start new jobs and the operators will not flush all the event times or call methods marking all
-records processed (like `endInput()`).
+1.     All sources are bound and they processed all the input records. The job will finish after all the sources are finished.
+2.     Users execute `stop-with-savepoint [--drain]`. The job will take a savepoint and then finish. With `–-drain`, the job
+will be stopped permanently and is also required to commit all the data. However, without `--drain` the job might
+be resumed from the savepoint later, thus not all data are required to be committed, as long as the state of the data could be
+recovered from the savepoint.
 
-Ideally we should ensure exactly-once semantics in both cases.
+Let's first have a look at the case of bounded sources. To achieve end-to-end exactly-once,
+two-phase-commit operators only commit data after a checkpoint following this piece of data succeed.
+However, previously there is no such an opportunity for the data between the last periodic checkpoint and job finish,

Review Comment:
   Changed to `job getting finished`.
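
   For readers of the hunk above, the "new sources" rule it describes (trigger the running tasks that have no running precedent tasks) might be sketched roughly as follows. This is only an illustration under simplifying assumptions, not Flink's actual implementation: `NewSourcesSketch`, `Task`, and `tasksToTrigger` are hypothetical names, and the task graph is modelled as a plain list of nodes with direct upstream links.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Flink.
class NewSourcesSketch {

    /** A task with a finished flag and links to its direct upstream (precedent) tasks. */
    static final class Task {
        final String name;
        final boolean finished;
        final List<Task> precedents = new ArrayList<>();

        Task(String name, boolean finished) {
            this.name = name;
            this.finished = finished;
        }

        Task after(Task upstream) {
            precedents.add(upstream);
            return this;
        }
    }

    /**
     * Returns the names of the tasks that should be notified to start a checkpoint:
     * running tasks whose direct precedents have all finished. A running source has
     * no precedents at all, so it is still selected, as before.
     */
    static List<String> tasksToTrigger(List<Task> tasks) {
        List<String> result = new ArrayList<>();
        for (Task task : tasks) {
            if (task.finished) {
                continue; // finished tasks cannot take part in the checkpoint
            }
            boolean hasRunningPrecedent = false;
            for (Task upstream : task.precedents) {
                if (!upstream.finished) {
                    hasRunningPrecedent = true;
                    break;
                }
            }
            if (!hasRunningPrecedent) {
                result.add(task.name);
            }
        }
        return result;
    }
}
```

   With a finished source feeding a running map that feeds a running sink, only the map would be selected; the sink would then receive the barrier from the map as usual.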



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
