[ 
https://issues.apache.org/jira/browse/FLINK-22132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvid Heise updated FLINK-22132:
--------------------------------
    Description: 
To test unaligned checkpoints, we should run a few different applications that 
exercise different features:
- Mixing forward/rescale channels with keyBy or other shuffle operations
- Unions
- Two-input or n-ary operators
- Associated state ((keyed) process function)
- Correctness verifications

The sinks should not be mocked; they should be able to induce a fair 
amount of backpressure in the system. Then, after an induced failure, the user 
needs to restart from a retained checkpoint with a degree of parallelism that 
is, relative to the original run:
- lower
- same
- higher

To enable unaligned checkpoints, set
- execution.checkpointing.unaligned: true
- execution.checkpointing.alignment-timeout: 0s, 10s, or 1min (for high 
backpressure)
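
The manual runs implied by these settings can be enumerated with a small 
helper. The following is only a sketch: the class name, the base parallelism 
of 4, and halving/doubling as the "lower" and "higher" targets are assumptions 
made for illustration. Only the two configuration keys and the three timeout 
values come from this description.

```java
import java.util.ArrayList;
import java.util.List;

public class UnalignedCheckpointTestMatrix {

    // Timeout values taken from the description above.
    static final String[] ALIGNMENT_TIMEOUTS = {"0s", "10s", "1min"};

    // Hypothetical parallelism of the original run; adjust to the cluster.
    static final int BASE_PARALLELISM = 4;

    /** Builds one test case per alignment timeout and target parallelism. */
    static List<String> buildCases() {
        // lower, same, higher (halving and doubling are arbitrary choices)
        int[] targets = {BASE_PARALLELISM / 2, BASE_PARALLELISM, BASE_PARALLELISM * 2};
        List<String> cases = new ArrayList<>();
        for (String timeout : ALIGNMENT_TIMEOUTS) {
            for (int target : targets) {
                cases.add(
                        "execution.checkpointing.unaligned: true\n"
                                + "execution.checkpointing.alignment-timeout: " + timeout + "\n"
                                + "# restart from retained checkpoint: parallelism "
                                + BASE_PARALLELISM + " -> " + target);
            }
        }
        return cases;
    }

    public static void main(String[] args) {
        // Print each case as a configuration fragment plus a note on the rescale step.
        for (String testCase : buildCases()) {
            System.out.println(testCase);
            System.out.println();
        }
    }
}
```

With three timeouts and three target parallelisms this yields nine runs per 
application; whether all nine are needed is a judgment call for the tester.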

The primary objective is to check whether all data is recovered properly and 
whether the semantics are correct (does state match input?). 
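
One possible way to answer "does state match input?" is to compare per-key 
record counts derived from the known input with the counts found in the 
recovered state. The sketch below assumes such a per-key count model, which is 
not prescribed by this ticket; the class and method names are hypothetical, 
and how the recovered counts are extracted from the job is left open.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StateMatchesInputCheck {

    /** Counts how often each key occurs in the input records. */
    static Map<String, Long> expectedCounts(List<String> inputKeys) {
        Map<String, Long> counts = new HashMap<>();
        for (String key : inputKeys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    /**
     * Returns recovered minus expected for every key where the two differ.
     * Negative values indicate lost records, positive values duplicates.
     * An empty result means the state matches the input.
     */
    static Map<String, Long> diff(Map<String, Long> expected, Map<String, Long> recovered) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            long delta = recovered.getOrDefault(e.getKey(), 0L) - e.getValue();
            if (delta != 0) {
                result.put(e.getKey(), delta);
            }
        }
        // Keys that appear in the state but never occurred in the input.
        for (Map.Entry<String, Long> e : recovered.entrySet()) {
            if (!expected.containsKey(e.getKey()) && e.getValue() != 0) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> expected = expectedCounts(List.of("a", "b", "a", "c"));
        // Made-up recovered state: "b" counted twice, "c" missing.
        Map<String, Long> recovered = Map.of("a", 2L, "b", 2L);
        System.out.println(diff(expected, recovered)); // prints {b=1, c=-1}
    }
}
```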

The secondary objective is to check whether the Flink UI shows the following 
information correctly:
- whether unaligned checkpoints are enabled at the job level
- the alignment timeout at the job level
- for each checkpoint, whether it is unaligned and how much data was written

> Test unaligned checkpoints rescaling manually on a real cluster
> ---------------------------------------------------------------
>
>                 Key: FLINK-22132
>                 URL: https://issues.apache.org/jira/browse/FLINK-22132
>             Project: Flink
>          Issue Type: Test
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.0
>            Reporter: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.13.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)