[ 
https://issues.apache.org/jira/browse/FLINK-22132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvid Heise updated FLINK-22132:
--------------------------------
    Description: 
To test unaligned checkpoints, we should use a few different applications that 
use different features.

The sinks should not be mocked but rather should be able to induce a fair 
amount of backpressure into the system. It would probably be a good idea to 
have a way to increase the backpressure at the sink, by running the respective 
system on the cluster and being able to add/remove parallel instances.

The primary objective is to check whether all data is recovered properly and 
whether the semantics are correct (does the state match the input?).
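
One way the "does state match input?" check could be sketched is shown below. 
This is only an illustration under stated assumptions: it assumes the test job 
keeps a per-key record count as its state, and the names expected_counts and 
verify_state are hypothetical helpers, not part of Flink or of this ticket.

```python
from collections import Counter


def expected_counts(records):
    """Count how many input records were produced for each key."""
    return Counter(key for key, _ in records)


def verify_state(records, recovered_state):
    """Compare recovered per-key counts against the known input.

    Returns a dict mapping each mismatching key to (expected, recovered).
    An empty dict means the recovered state matches the input.
    """
    expected = expected_counts(records)
    mismatches = {}
    # Look at keys from both sides so lost and spurious keys are both caught.
    for key in expected.keys() | recovered_state.keys():
        want = expected.get(key, 0)
        got = recovered_state.get(key, 0)
        if want != got:
            mismatches[key] = (want, got)
    return mismatches
```

A count that is too high would point at duplicated records after recovery, and 
one that is too low at lost records.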

The secondary objective is to check whether the Flink UI shows the information 
correctly.

More details in the subtasks.

  was:
To test unaligned checkpoints, we should use a few different applications that 
use different features:
- Mixing forward/rescale channels with keyby or other shuffle operations
- Unions
- 2 or n-ary operators
- Associated state ((keyed) process function)
- Correctness verifications

The sinks should not be mocked but rather should be able to induce a fair 
amount of backpressure into the system. Then, after an induced failure, the 
user needs to restart from a retained checkpoint with
- lower
- same
- higher degree of parallelism.

To enable unaligned checkpoints, set 
- execution.checkpointing.unaligned: true
- execution.checkpointing.alignment-timeout to 0s, 10s, 1min (for high 
backpressure)
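
The combinations above could be enumerated as a small test matrix. The sketch 
below is an assumption about how one might organize the manual runs: the two 
configuration keys and the three timeout values come from this description, 
while the concrete parallelism numbers and the helper names are hypothetical.

```python
from itertools import product

# Keys and timeout values taken from the description above.
UNALIGNED_KEY = "execution.checkpointing.unaligned"
TIMEOUT_KEY = "execution.checkpointing.alignment-timeout"
ALIGNMENT_TIMEOUTS = ["0s", "10s", "1min"]

# Hypothetical parallelism values; the description only says lower/same/higher.
RESCALE_TARGETS = {"lower": 2, "same": 4, "higher": 8}


def build_test_matrix():
    """Return one scenario per (alignment timeout, restart parallelism) pair."""
    scenarios = []
    for timeout, (label, parallelism) in product(
        ALIGNMENT_TIMEOUTS, RESCALE_TARGETS.items()
    ):
        scenarios.append(
            {
                "config": {UNALIGNED_KEY: "true", TIMEOUT_KEY: timeout},
                "rescale": label,
                "restart_parallelism": parallelism,
            }
        )
    return scenarios


def render_config(config):
    """Render settings as 'key: value' lines for a Flink configuration file."""
    return "\n".join(f"{key}: {value}" for key, value in config.items())


if __name__ == "__main__":
    for scenario in build_test_matrix():
        print(scenario["rescale"], scenario["restart_parallelism"])
        print(render_config(scenario["config"]))
```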

The primary objective is to check whether all data is recovered properly and 
whether the semantics are correct (does the state match the input?).

The secondary objective is to check whether the Flink UI shows the information 
correctly:
- unaligned checkpoint enabled on job level
- timeout on job level
- for each checkpoint, whether it is unaligned and how much data was written


> Test unaligned checkpoints rescaling manually on a real cluster
> ---------------------------------------------------------------
>
>                 Key: FLINK-22132
>                 URL: https://issues.apache.org/jira/browse/FLINK-22132
>             Project: Flink
>          Issue Type: Test
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.0
>            Reporter: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.13.0
>
>
> To test unaligned checkpoints, we should use a few different applications 
> that use different features.
> The sinks should not be mocked but rather should be able to induce a fair 
> amount of backpressure into the system. It would probably be a good idea to 
> have a way to increase the backpressure at the sink, by running the 
> respective system on the cluster and being able to add/remove parallel 
> instances.
> The primary objective is to check whether all data is recovered properly and 
> whether the semantics are correct (does the state match the input?).
> The secondary objective is to check whether the Flink UI shows the 
> information correctly.
> More details in the subtasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
