1996fanrui opened a new pull request, #19723:
URL: https://github.com/apache/flink/pull/19723

   ## What is the purpose of the change
   
   Support upstream Task switching from Aligned Checkpoint to Unaligned 
Checkpoint to improve the UC when the back pressure is severe. 
   When the barrier cannot be sent from the output buffer to the downstream 
task within the 
[execution.checkpointing.aligned-checkpoint-timeout](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#execution-checkpointing-aligned-checkpoint-timeout),
 the upstream task switches to UC and takes a snapshot of the data before the 
barrier in the output buffer.
   
   ## Brief change log
   
   - *When broadcasting Barrier, if checkpointOptions.isTimeoutable(), register 
timer.*
     - *The time when the timer is triggered is: CP triggerTime + 
alignmentTimeout*
     - *Behavior when the timer is triggered: snapshot all buffers that have 
not been sent to the downstream before the Barrier*
   - *If the UC is enabled and the Aligned Barrier is received, the CP cannot 
succeed immediately and needs to wait for the outputBufferFuture:*
     - *When all barriers are sent downstream quickly, and an empty output 
buffer is returned, the CP ends.(when the back pressure isn't severe)*
     - *Or wait for the timer to trigger, snapshot these output buffers before 
the barrier, and the CP ends.(when the back pressure is severe)*
     - *If the CP is aborted, flink will execute 
the`channelStateFuture.completeExceptionally(cause)`*
   
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
     - Finish unit test later.
     - This PR will affect the Unaligned checkpoint benchmark. We can view the 
benefit after merge 
[here](http://codespeed.dak8s.net:8000/timeline/#/?exe=1,6&ben=checkpointSingleInput.UNALIGNED_1&env=2&revs=200&equid=off&quarts=on&extr=on).
   
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? docs, it need to update some doc 
of UC.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to