Zakelly Lan created FLINK-36118:
-----------------------------------
Summary: FLIP-455: Declare async state processing and checkpoint
the in-flight requests
Key: FLINK-36118
URL: https://issues.apache.org/jira/browse/FLINK-36118
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing, Runtime / State Backends
Reporter: Zakelly Lan
Assignee: Zakelly Lan
The FLIP: https://cwiki.apache.org/confluence/x/C4owEg
FLIP-423 introduced disaggregated state management, and FLIP-425
introduced a new execution model for asynchronous state access in an
event-driven way. This model has the potential to significantly boost
performance by leveraging parallel I/O operations. However, it does lead to
increased draining times during checkpoints, presenting a trade-off between
system throughput and checkpoint synchronization delay. This balance can be
calibrated by adjusting the buffer size. As a follow-up to FLIP-425, this
FLIP proposes a faster way to checkpoint: snapshotting the state requests that
are waiting in the buffer of the "Asynchronous Execution Controller (AEC)" as
part of the checkpoint. With this approach, we expect a substantial reduction
in draining-time overhead compared with the original plan in FLIP-425,
especially under high back-pressure. To snapshot state requests, the
user's callbacks must persist across job attempts. This FLIP introduces a
novel approach for declaring element
processing where all callbacks are re-declared and bound to the corresponding
previous state requests during the operator's initialization phase. This
ensures that the entire pipeline can be accurately restored and operations can
resume smoothly after a job restart.
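
The re-declaration idea can be illustrated with a small, self-contained toy
model. This is only a sketch under stated assumptions, not Flink's API or the
design in the FLIP: DeclaringBuffer, PendingRequest, declare, submit, snapshot,
restore and drain are all hypothetical names. It assumes each callback is
declared under a stable ID during initialization, so that a snapshotted
request only needs to carry that ID and its key, never the callback itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical buffered request: plain data only, so it can be checkpointed. */
final class PendingRequest {
    final String callbackId;
    final String key;

    PendingRequest(String callbackId, String key) {
        this.callbackId = callbackId;
        this.key = key;
    }
}

/** Toy request buffer with declared callbacks. Not Flink's AEC. */
final class DeclaringBuffer {
    private final Map<String, Consumer<String>> declared = new HashMap<>();
    private final Deque<PendingRequest> pending = new ArrayDeque<>();

    /** Run on every attempt during initialization, so IDs are stable across restarts. */
    void declare(String callbackId, Consumer<String> callback) {
        if (declared.putIfAbsent(callbackId, callback) != null) {
            throw new IllegalStateException("Duplicate callback ID: " + callbackId);
        }
    }

    /** Buffer a request that refers to its callback by ID only. */
    void submit(String callbackId, String key) {
        requireDeclared(callbackId);
        pending.add(new PendingRequest(callbackId, key));
    }

    /** Checkpoint: copy the in-flight requests instead of draining them first. */
    List<PendingRequest> snapshot() {
        return new ArrayList<>(pending);
    }

    /** Restore: re-enqueue requests; each binds to the re-declared callback by ID. */
    void restore(List<PendingRequest> snapshot) {
        for (PendingRequest request : snapshot) {
            requireDeclared(request.callbackId);
            pending.add(request);
        }
    }

    /** Simulate state access completing: invoke each callback in FIFO order. */
    int drain(Map<String, String> state) {
        int processed = 0;
        while (!pending.isEmpty()) {
            PendingRequest request = pending.poll();
            declared.get(request.callbackId).accept(request.key + "=" + state.get(request.key));
            processed++;
        }
        return processed;
    }

    private void requireDeclared(String callbackId) {
        if (!declared.containsKey(callbackId)) {
            throw new IllegalArgumentException("Undeclared callback ID: " + callbackId);
        }
    }
}

final class RedeclareDemo {
    /** Simulates two job attempts and returns what the second attempt's callback saw. */
    static List<String> run() {
        // Attempt 1: declare, buffer two requests, checkpoint without draining.
        DeclaringBuffer first = new DeclaringBuffer();
        first.declare("collect", value -> { });
        first.submit("collect", "a");
        first.submit("collect", "b");
        List<PendingRequest> checkpoint = first.snapshot();

        // Attempt 2: re-declare under the same ID, then restore and resume.
        List<String> results = new ArrayList<>();
        DeclaringBuffer second = new DeclaringBuffer();
        second.declare("collect", results::add);
        second.restore(checkpoint);

        Map<String, String> state = new HashMap<>();
        state.put("a", "1");
        state.put("b", "2");
        second.drain(state);
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this toy model the checkpoint holds only (callback ID, key) pairs, so the
first attempt never has to wait for the buffer to drain, and the second
attempt processes the restored requests in their original order.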
--
This message was sent by Atlassian Jira
(v8.20.10#820010)