[jira] [Resolved] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream
     [ https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arvid Heise resolved FLINK-20654.
---------------------------------
    Resolution: Fixed

> Unaligned checkpoint recovery may lead to corrupted data stream
> ---------------------------------------------------------------
>
>                 Key: FLINK-20654
>                 URL: https://issues.apache.org/jira/browse/FLINK-20654
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.0, 1.12.1
>            Reporter: Arvid Heise
>            Assignee: Piotr Nowojski
>            Priority: Blocker
>              Labels: pull-request-available, test-stability
>             Fix For: 1.13.0, 1.12.2
>
>
> Fix of FLINK-20433 shows potential corruption after recovery for all
> variations of UnalignedCheckpointITCase.
> To reproduce, run UCITCase a couple hundred times. The issue showed for me
> in:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> with decreasing frequency.
> The issue manifests as one of the following:
> - stream corrupted exception
> - EOF exception
> - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) ArithmeticException overflow (because the number that should be
> in [0;10] has been mis-deserialized)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Resolved] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream
     [ https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arvid Heise resolved FLINK-20654.
---------------------------------
    Resolution: Fixed

> Unaligned checkpoint recovery may lead to corrupted data stream
> ---------------------------------------------------------------
>
>                 Key: FLINK-20654
>                 URL: https://issues.apache.org/jira/browse/FLINK-20654
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.0, 1.12.1
>            Reporter: Arvid Heise
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>              Labels: pull-request-available, test-stability
>             Fix For: 1.13.0, 1.12.2
>
>
> Fix of FLINK-20433 shows potential corruption after recovery for all
> variations of UnalignedCheckpointITCase.
> To reproduce, run UCITCase a couple hundred times. The issue showed for me
> in:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> with decreasing frequency.
> The issue manifests as one of the following:
> - stream corrupted exception
> - EOF exception
> - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) ArithmeticException overflow (because the number that should be
> in [0;10] has been mis-deserialized)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)