[ https://issues.apache.org/jira/browse/FLINK-24881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr Nowojski closed FLINK-24881.
----------------------------------
    Resolution: Not A Bug

Hi, sorry for the late response; I have only just stumbled across this ticket. The
issue you are describing is just the plain old problem of checkpointing under
backpressure with aligned checkpoints. This is more or less expected behaviour
for aligned checkpoints, and there are only three things that you can do:
* get rid of the backpressure
* use unaligned checkpoints (possibly with a small alignment timeout)
* use buffer debloating

Ideally, combine all of the above. This has been described in the docs for
quite some time:
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpointing_under_backpressure/
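
To see why the interval stops being honoured, here is a back-of-the-envelope
sketch under a simplifying assumption (not Flink's actual implementation): with
aligned checkpoints the barrier travels in-band, so at every network exchange it
waits behind the data already buffered there, and under backpressure that data
drains only at the bottleneck's rate. The `Hop` class, the
`aligned_barrier_delay` and `debloated` helpers, and all numbers are
hypothetical; `debloated` only mimics the goal of buffer debloating, which is to
keep in-flight data at roughly a target time's worth of processing (1 s by
default).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hop:
    """One network exchange between two operators (hypothetical model type)."""
    name: str
    in_flight_mb: float      # data buffered in this exchange, ahead of the barrier
    consume_mb_per_s: float  # rate at which the downstream task drains it

    def drain_seconds(self) -> float:
        # Time until the barrier reaches the head of this exchange's queue.
        return self.in_flight_mb / self.consume_mb_per_s


def aligned_barrier_delay(hops: list[Hop]) -> float:
    """Rough time for a barrier injected at the source to pass every exchange."""
    return sum(hop.drain_seconds() for hop in hops)


def debloated(hops: list[Hop], target_seconds: float = 1.0) -> list[Hop]:
    """Cap each exchange at about `target_seconds` worth of data, imitating
    the intent of buffer debloating (simplified assumption)."""
    return [
        Hop(h.name, min(h.in_flight_mb, h.consume_mb_per_s * target_seconds),
            h.consume_mb_per_s)
        for h in hops
    ]


if __name__ == "__main__":
    # Made-up backpressured pipeline: the sink is the bottleneck at 4 MB/s,
    # so the upstream task is throttled to the same rate.
    pipeline = [Hop("source -> map", 64, 4), Hop("map -> sink", 64, 4)]
    print(aligned_barrier_delay(pipeline))             # 32.0
    print(aligned_barrier_delay(debloated(pipeline)))  # 2.0
```

With these made-up numbers a barrier needs about 32 s to get through, well over
the 10 s interval from the ticket, while capping in-flight data cuts that to
about 2 s. Unaligned checkpoints avoid the wait altogether by letting barriers
overtake buffered data, at the cost of persisting that data in the checkpoint.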

> When the Source is back pressured, the checkpoint interval may not take effect
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-24881
>                 URL: https://issues.apache.org/jira/browse/FLINK-24881
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Core
>    Affects Versions: 1.14.0, 1.13.3
>            Reporter: Zongwen Li
>            Priority: Major
>         Attachments: image-2021-11-12-11-21-15-910.png
>
>
> Checkpoint config:
>  * EXACTLY_ONCE
>  * aligned
>  * interval: 10s
>  * min-pause: 10s
>  * max-attempts: 2
> When the source was backpressured for a long time, I found that multiple
> checkpoints were triggered at the same time, so the settings for concurrent
> checkpoints and the checkpoint interval did not have the intended effect.
> I also found that one of these checkpoints usually fails at this point, but
> this failure does not cause the job to restart.
> [Attached screenshot: image-2021-11-12-11-21-15-910.png]
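
For reference, the reporter's settings plus the remedies from the reply could be
written as flink-conf.yaml entries. This is a minimal sketch assuming Flink 1.14
option names (some were renamed later, e.g. the alignment timeout became
`execution.checkpointing.aligned-checkpoint-timeout` in 1.15). The
`render_flink_conf` helper and the 1 s timeout are hypothetical, and
"max-attempts" is left out because it does not map unambiguously to a single
Flink option.

```python
from __future__ import annotations

# Settings taken from the ticket (aligned checkpoints are the default).
REPORTER_SETTINGS = {
    "execution.checkpointing.mode": "EXACTLY_ONCE",
    "execution.checkpointing.interval": "10 s",
    "execution.checkpointing.min-pause": "10 s",
}

# Remedies from the reply, using Flink 1.14 option keys (assumption).
REMEDIES = {
    "execution.checkpointing.unaligned": "true",
    # Checkpoints start aligned and switch to unaligned once alignment takes
    # longer than this; "1 s" is a hypothetical "small timeout value".
    "execution.checkpointing.alignment-timeout": "1 s",
    "taskmanager.network.memory.buffer-debloat.enabled": "true",
}


def render_flink_conf(*layers: dict[str, str]) -> str:
    """Merge layers (later layers win) and render 'key: value' lines."""
    merged: dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return "\n".join(f"{key}: {value}" for key, value in merged.items())


if __name__ == "__main__":
    print(render_flink_conf(REPORTER_SETTINGS, REMEDIES))
```

Keeping the ticket's settings and the remedies as separate layers makes it easy
to compare the job's behaviour with and without the remedies applied.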



--
This message was sent by Atlassian Jira
(v8.20.10#820010)