[ https://issues.apache.org/jira/browse/FLINK-31963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721788#comment-17721788 ]
Stefan Richter commented on FLINK-31963: ---------------------------------------- I have a local reproducer as well as a fix, will open a PR once I have written the tests. > java.lang.ArrayIndexOutOfBoundsException when scaling down with unaligned > checkpoints > ------------------------------------------------------------------------------------- > > Key: FLINK-31963 > URL: https://issues.apache.org/jira/browse/FLINK-31963 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing > Affects Versions: 1.17.0 > Environment: Flink: 1.17.0 > FKO: 1.4.0 > StateBackend: RocksDB(Genetic Incremental Checkpoint & Unaligned Checkpoint > enabled) > Reporter: Tan Kim > Assignee: Stefan Richter > Priority: Critical > Labels: stability > Attachments: image-2023-04-29-02-49-05-607.png, jobmanager_error.txt, > taskmanager_error.txt > > > I'm testing Autoscaler through Kubernetes Operator and I'm facing the > following issue. > As you know, when a job is scaled down through the autoscaler, the job > manager and task manager go down and then back up again. > When this happens, an index out of bounds exception is thrown and the state > is not restored from a checkpoint. > [~gyfora] told me via the Flink Slack troubleshooting channel that this is > likely an issue with Unaligned Checkpoint and not an issue with the > autoscaler, but I'm opening a ticket with Gyula for more clarification. > Please see the attached JM and TM error logs. > Thank you. -- This message was sent by Atlassian Jira (v8.20.10#820010)