[ https://issues.apache.org/jira/browse/FLINK-22132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316161#comment-17316161 ]
Arvid Heise commented on FLINK-22132:
-------------------------------------

Since the task is pretty huge, I'm splitting it into subtasks for defining the application and executing the application.

> Test unaligned checkpoints rescaling manually on a real cluster
> ---------------------------------------------------------------
>
>                 Key: FLINK-22132
>                 URL: https://issues.apache.org/jira/browse/FLINK-22132
>             Project: Flink
>          Issue Type: Test
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.0
>            Reporter: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.13.0
>
>
> To test unaligned checkpoints, we should use a few different applications
> that use different features:
> - Mixing forward/rescale channels with keyby or other shuffle operations
> - Unions
> - 2 or n-ary operators
> - Associated state ((keyed) process function)
> - Correctness verifications
> The sinks should not be mocked but rather should be able to induce a fair
> amount of backpressure into the system. Then, after induced failure, the user
> needs to restart from a retained checkpoint with
> - lower
> - same
> - higher degree of parallelism.
> To enable unaligned checkpoints, set
> - execution.checkpointing.unaligned: true
> - execution.checkpointing.alignment-timeout to 0s, 10s, 1min (for high
> backpressure)
> The primary objective is to check if all data is recovered properly and if
> the semantics is correct (does state match input?).
> The secondary objective is to check if Flink UI shows the information
> correctly:
> - unaligned checkpoint enabled on job level
> - timeout on job level
> - for each checkpoint, if it's unaligned or not; how much data was written



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
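
The combinations described in the issue (three alignment timeouts, each restarted with lower, same, and higher parallelism) could be enumerated roughly as in the sketch below. Only the two configuration keys and the timeout values come from the issue text; the helper names `rescale_targets` and `build_test_matrix`, the initial parallelism of 4, and the step of 2 are assumptions chosen for illustration, not part of the ticket.

```python
from itertools import product

# Configuration keys and timeout values quoted from the issue description.
UNALIGNED_KEY = "execution.checkpointing.unaligned"
TIMEOUT_KEY = "execution.checkpointing.alignment-timeout"
ALIGNMENT_TIMEOUTS = ["0s", "10s", "1min"]  # per the issue, for high backpressure


def rescale_targets(initial_parallelism, step):
    """Return (label, parallelism) pairs for the lower/same/higher restarts.

    Hypothetical helper: the issue does not prescribe concrete parallelism values.
    """
    if step <= 0 or initial_parallelism - step < 1:
        raise ValueError("need 0 < step < initial_parallelism")
    return [
        ("lower", initial_parallelism - step),
        ("same", initial_parallelism),
        ("higher", initial_parallelism + step),
    ]


def build_test_matrix(initial_parallelism=4, step=2):
    """Enumerate every (alignment timeout, rescale direction) combination."""
    runs = []
    for timeout, (label, parallelism) in product(
        ALIGNMENT_TIMEOUTS, rescale_targets(initial_parallelism, step)
    ):
        runs.append(
            {
                "config": {UNALIGNED_KEY: "true", TIMEOUT_KEY: timeout},
                "rescale": label,
                "restart_parallelism": parallelism,
            }
        )
    return runs


if __name__ == "__main__":
    for run in build_test_matrix():
        print(run)
```

Each entry lists the settings for one manual run: start the job with the given configuration, induce a failure, then restart from the retained checkpoint with `restart_parallelism`.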