[ https://issues.apache.org/jira/browse/FLINK-25872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524796#comment-17524796 ]
Dawid Wysakowicz commented on FLINK-25872:
------------------------------------------

Sorry for joining so late. I'll try to rephrase the issue to make sure I understand it.

First of all, the problem appears when changing the state backend from a non-changelog state backend to the changelog state backend. This does not work because the changelog state backend uses different state handles than other state backends (obviously).

My first question is whether this is something that should be supported at all. From the guarantees we're giving, not necessarily: changing the state backend is supported only via a savepoint. Having said that, I understand the answer is that you do want to support it nevertheless.

Question: Is the problem related to state handles registered in {{SharedStateRegistry}}? Or does it affect non-shared, private parts of the initial checkpoint? If I understand correctly, it also affects the originally private parts of the initial checkpoint, right?

There are two proposed solutions to the problem. On a high level:
# (Roman's) Treat all handles of the initial checkpoint as shared ones, irrespective of whether the changelog state backend is used or not.
# (Yanfei's) Add logic that converts the non-changelog checkpoint to a changelog checkpoint when restoring. This forces the JobManager to be aware of the changelog state backend.

I feel both solutions add additional logic and complexity to the JM. Sorry if I am boring/annoying, but are we sure it is the right decision to support this kind of state backend switching?

BTW, does switching in the other direction work as well? What happens if we want to disable a previously enabled changelog state backend? Is this supported?

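To make the failure mode and the first proposal concrete, here is a toy model in Python. It is not Flink code: {{ToyRegistry}}, {{Checkpoint}}, {{restore_and_subsume}} and the handle names are hypothetical, and the reference counting is only a rough stand-in for what {{SharedStateRegistry}} does. It assumes that private handles are deleted unconditionally when a checkpoint is subsumed, while shared handles are deleted only once no checkpoint references them.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Toy checkpoint: private handles plus shared handles (hypothetical model)."""
    cp_id: int
    private: set = field(default_factory=set)
    shared: set = field(default_factory=set)


class ToyRegistry:
    """Toy reference-counting registry; not the real SharedStateRegistry."""

    def __init__(self):
        self.ref_counts = {}
        self.deleted = set()

    def register(self, cp):
        # Only shared handles are reference counted.
        for handle in cp.shared:
            self.ref_counts[handle] = self.ref_counts.get(handle, 0) + 1

    def subsume(self, cp):
        # Assumption: private handles are deleted unconditionally.
        self.deleted |= cp.private
        # Shared handles are deleted only when the last reference goes away.
        for handle in cp.shared:
            self.ref_counts[handle] -= 1
            if self.ref_counts[handle] == 0:
                del self.ref_counts[handle]
                self.deleted.add(handle)


def restore_and_subsume(treat_initial_as_shared):
    """Claim a non-changelog checkpoint, take a newer one, subsume the old one."""
    registry = ToyRegistry()
    initial = Checkpoint(1, private={"sst-1", "sst-2"})
    if treat_initial_as_shared:
        # Proposal 1: register every handle of the initial checkpoint as shared.
        initial = Checkpoint(1, shared=initial.private | initial.shared)
    registry.register(initial)

    # The newer checkpoint still uses the restored files as its materialized part.
    newer = Checkpoint(2, private={"changelog-2"}, shared={"sst-1", "sst-2"})
    registry.register(newer)

    registry.subsume(initial)
    return registry.deleted
```

In this model, {{restore_and_subsume(False)}} deletes both files that the newer checkpoint still depends on, whereas {{restore_and_subsume(True)}} deletes nothing, because the newer checkpoint keeps a reference to them. The second proposal (converting the checkpoint on restore) is not modelled here.
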
> Restoring from non-changelog checkpoint with changelog state-backend enabled
> in CLAIM mode discards state in use
> ----------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-25872
> URL: https://issues.apache.org/jira/browse/FLINK-25872
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing, Runtime / State Backends
> Reporter: Yun Tang
> Assignee: Yanfei Lei
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.16.0
>
>
> If we restore from checkpoint with changelog state-backend enabled in
> snapshot CLAIM mode, the restored checkpoint would be discarded on subsume.
> This invalidates newer/active checkpoints because their materialized part is
> discarded (for incremental wrapped checkpoints, their private state is
> discarded). This bug is like FLINK-25478.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)