Zakelly Lan created FLINK-36118:
-----------------------------------

             Summary: FLIP-455: Declare async state processing and checkpoint the in-flight requests
                 Key: FLINK-36118
                 URL: https://issues.apache.org/jira/browse/FLINK-36118
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Checkpointing, Runtime / State Backends
            Reporter: Zakelly Lan
            Assignee: Zakelly Lan


The FLIP: https://cwiki.apache.org/confluence/x/C4owEg

FLIP-423 introduced disaggregated state management, and FLIP-425 introduced 
a new execution model for asynchronous, event-driven state access. This model 
can significantly boost performance by leveraging parallel I/O operations. 
However, it increases draining time during checkpoints, creating a trade-off 
between system throughput and checkpoint synchronization delay. This balance 
can be calibrated by adjusting the buffer size. As a follow-up to FLIP-425, 
this FLIP proposes a faster way of checkpointing: snapshotting the state 
requests that are waiting in the buffer of the "Asynchronous Execution 
Controller (AEC)" as part of the checkpoint. With this approach, we expect a 
substantial reduction in draining-time overhead compared with the original 
plan in FLIP-425, especially under high back-pressure. To snapshot state 
requests, the user callbacks must be persisted across job attempts. This FLIP 
introduces a novel approach for declaring element processing, in which all 
callbacks are re-declared and bound to their corresponding previous state 
requests during the operator's initialization phase. This ensures that the 
entire pipeline can be accurately restored and that operations resume 
smoothly after a job restart.
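
To make the re-declaration idea concrete, here is a minimal, self-contained 
sketch in plain Java. It is not the FLIP-455 API: the names 
DeclaredCallbackSketch, Controller, PendingRequest, declare, submit, snapshot, 
restore and drain are all hypothetical, and the real AEC is far more involved. 
The sketch only illustrates one way the binding could work, assuming each 
callback is declared under a stable id so that a checkpoint needs to persist 
only (callback id, key) pairs rather than the callback objects themselves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical illustration only; none of these names are Flink APIs. */
public class DeclaredCallbackSketch {

    /** What a checkpoint persists for one in-flight request: plain data, no code. */
    static final class PendingRequest {
        final String callbackId;
        final String key;

        PendingRequest(String callbackId, String key) {
            this.callbackId = callbackId;
            this.key = key;
        }
    }

    /** Simplified stand-in for a buffer of in-flight state requests. */
    static final class Controller {
        private final Map<String, Function<Long, Long>> declared = new HashMap<>();
        private final List<PendingRequest> buffer = new ArrayList<>();

        /** Called during initialization on every job attempt, always with the same ids. */
        void declare(String callbackId, Function<Long, Long> callback) {
            declared.put(callbackId, callback);
        }

        /** Buffers a request that refers to its callback by id only. */
        void submit(String callbackId, String key) {
            if (!declared.containsKey(callbackId)) {
                throw new IllegalStateException("Undeclared callback: " + callbackId);
            }
            buffer.add(new PendingRequest(callbackId, key));
        }

        /** Copies the buffered requests instead of draining them first. */
        List<PendingRequest> snapshot() {
            return new ArrayList<>(buffer);
        }

        /** Re-binds persisted requests to the callbacks declared in this attempt. */
        void restore(List<PendingRequest> snapshot) {
            for (PendingRequest request : snapshot) {
                if (!declared.containsKey(request.callbackId)) {
                    throw new IllegalStateException("Cannot bind: " + request.callbackId);
                }
                buffer.add(request);
            }
        }

        /** Runs every buffered request against the given state and empties the buffer. */
        Map<String, Long> drain(Map<String, Long> state) {
            Map<String, Long> results = new TreeMap<>();
            for (PendingRequest request : buffer) {
                long value = state.getOrDefault(request.key, 0L);
                results.put(request.key, declared.get(request.callbackId).apply(value));
            }
            buffer.clear();
            return results;
        }
    }

    /** The "declaration" step, repeated identically on each attempt. */
    static Controller newDeclaredController() {
        Controller controller = new Controller();
        controller.declare("increment", value -> value + 1);
        return controller;
    }

    public static void main(String[] args) {
        // First attempt: two requests are still buffered when the checkpoint is taken.
        Controller firstAttempt = newDeclaredController();
        firstAttempt.submit("increment", "a");
        firstAttempt.submit("increment", "b");
        List<PendingRequest> checkpoint = firstAttempt.snapshot();

        // Second attempt: callbacks are re-declared, then the requests are re-bound.
        Controller secondAttempt = newDeclaredController();
        secondAttempt.restore(checkpoint);
        System.out.println(secondAttempt.drain(Map.of("a", 41L, "b", 9L)));
    }
}
```

The design point is that snapshot() never serializes a lambda. Only the id and 
key are persisted, which is why the declaration step must run again, with the 
same ids, before restore() on the next attempt.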



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
