[jira] [Resolved] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream

2021-03-26 Thread Arvid Heise (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvid Heise resolved FLINK-20654.
-
Resolution: Fixed

> Unaligned checkpoint recovery may lead to corrupted data stream
> ---
>
> Key: FLINK-20654
> URL: https://issues.apache.org/jira/browse/FLINK-20654
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.12.0, 1.12.1
> Reporter: Arvid Heise
> Assignee: Piotr Nowojski
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.13.0, 1.12.2
>
>
> The fix for FLINK-20433 shows potential corruption after recovery for all
> variations of UnalignedCheckpointITCase.
> To reproduce, run UnalignedCheckpointITCase (UCITCase) a couple hundred
> times. The issue showed up for me in the following cases, in order of
> decreasing frequency:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> The issue manifests in one of the following ways:
> - stream corrupted exception
> - EOF exception
> - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) ArithmeticException overflow (because a number that should be
> in [0;10] has been mis-deserialized)
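
The last two symptoms can be illustrated with a small, self-contained sketch. It is not taken from Flink or from UnalignedCheckpointITCase: the class name, the helper methods, the 8-byte record layout, and the use of Math.toIntExact as the overflow check are all assumptions made for illustration. It only shows how reading a record from the wrong byte offset can turn a small value into a huge one (which then overflows) or run past the end of the buffer (an EOF exception).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical illustration only; none of these names exist in Flink.
final class MisalignedReadSketch {

    /** Serializes the values as consecutive 8-byte big-endian longs. */
    static byte[] write(long... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (long v : values) {
                out.writeLong(v);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads one long after skipping {@code offset} bytes of the buffer. */
    static long readAt(byte[] data, int offset) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.skipBytes(offset);
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(7L, 3L); // two records, both within [0;10]

        // Correct record boundary: the original value comes back.
        System.out.println("aligned:    " + readAt(data, 0));

        // Boundary off by 7 bytes: the low byte of the first record becomes
        // the high byte of the value that is read, i.e. 7L << 56.
        long garbage = readAt(data, 7);
        System.out.println("misaligned: " + garbage);

        try {
            Math.toIntExact(garbage); // assumed stand-in for the test's arithmetic
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }

        try {
            readAt(data, 9); // only 7 bytes remain, fewer than one record
        } catch (EOFException e) {
            System.out.println("EOFException on truncated record");
        }
    }
}
```

In this sketch a misaligned read never fails at the point of corruption; it surfaces later as an arithmetic overflow or an EOF, which is consistent with the varied symptoms listed in the report.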



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream

2021-01-22 Thread Arvid Heise (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvid Heise resolved FLINK-20654.
-
Resolution: Fixed

> Unaligned checkpoint recovery may lead to corrupted data stream
> ---
>
> Key: FLINK-20654
> URL: https://issues.apache.org/jira/browse/FLINK-20654
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.12.0, 1.12.1
> Reporter: Arvid Heise
> Assignee: Roman Khachatryan
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.13.0, 1.12.2
>
>
> The fix for FLINK-20433 shows potential corruption after recovery for all
> variations of UnalignedCheckpointITCase.
> To reproduce, run UnalignedCheckpointITCase (UCITCase) a couple hundred
> times. The issue showed up for me in the following cases, in order of
> decreasing frequency:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> The issue manifests in one of the following ways:
> - stream corrupted exception
> - EOF exception
> - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) ArithmeticException overflow (because a number that should be
> in [0;10] has been mis-deserialized)


