[ 
https://issues.apache.org/jira/browse/FLINK-25872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524796#comment-17524796
 ] 

Dawid Wysakowicz commented on FLINK-25872:
------------------------------------------

Sorry for joining so late.

I'll try to rephrase the issue to make sure I understand it. The problem 
appears when changing the state backend from a non-changelog state backend to 
the changelog state backend. This does not work because the changelog state 
backend uses different state handles than other state backends (obviously). 
My first question is whether this is something that should be supported. Going 
by the guarantees we give, not necessarily: changing the state backend is 
supported only via a savepoint. Having said that, I understand the answer is 
that you do want to support it nevertheless.

Question: Is the problem related to state handles registered in 
{{SharedStateRegistry}}? Or does it affect non-shared, private parts of the 
initial checkpoint? If I understand correctly, it also affects the originally 
private parts of the initial checkpoint, right?

There are two proposed solutions to the problem. On a high level:
# (Roman's) Treat all handles of the initial checkpoint as shared ones, 
irrespective of whether the changelog state backend is used.
# (Yanfei's) Add logic that converts the non-changelog checkpoint to a 
changelog checkpoint when restoring. This requires making the JobManager aware 
of the changelog state backend.
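
To make sure we are talking about the same failure, here is a toy model of the 
first option as I understand it. It is only a sketch: {{ToySharedRegistry}}, 
{{ToyCheckpoint}}, {{subsume}}, {{run}} and the file names are hypothetical 
and are not Flink's API. The only assumptions are that private handles are 
deleted as soon as their checkpoint is subsumed, while shared handles are 
reference counted.

```python
class ToySharedRegistry:
    """Toy reference-counting registry (hypothetical, not Flink's SharedStateRegistry)."""

    def __init__(self):
        self._refs = {}       # handle -> number of checkpoints referencing it
        self.deleted = set()  # handles whose files have been deleted

    def register(self, handle):
        self._refs[handle] = self._refs.get(handle, 0) + 1

    def unregister(self, handle):
        self._refs[handle] -= 1
        if self._refs[handle] == 0:
            del self._refs[handle]
            self.deleted.add(handle)


class ToyCheckpoint:
    def __init__(self, shared=(), private=()):
        self.shared = set(shared)
        self.private = set(private)


def subsume(checkpoint, registry):
    """Discard a checkpoint: private handles are deleted immediately,
    shared handles only once no other checkpoint references them."""
    for handle in checkpoint.private:
        registry.deleted.add(handle)
    for handle in checkpoint.shared:
        registry.unregister(handle)


def run(treat_initial_as_shared):
    """Return the initial checkpoint's files that are lost after it is subsumed."""
    registry = ToySharedRegistry()
    initial_files = {"sst-1", "meta-1"}  # made-up file names

    if treat_initial_as_shared:
        initial = ToyCheckpoint(shared=initial_files)
    else:
        initial = ToyCheckpoint(private=initial_files)
    for handle in initial.shared:
        registry.register(handle)

    # A newer checkpoint taken after restore: its materialized part still
    # points at the files of the initial checkpoint.
    newer = ToyCheckpoint(shared=initial_files | {"changelog-1"})
    for handle in newer.shared:
        registry.register(handle)

    subsume(initial, registry)
    return initial_files & registry.deleted
```

In this model, {{run(False)}} loses both files of the initial checkpoint even 
though the newer checkpoint still needs them, and {{run(True)}} loses nothing 
because the newer checkpoint keeps the reference count above zero.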

I feel both solutions add logic and complexity to the JM. Sorry if I am being 
boring/annoying, but are we sure it is the right decision to support this kind 
of state backend switching? BTW, does switching in the other direction work as 
well? What happens if we want to disable a previously enabled changelog state 
backend? Is this supported?

> Restoring from non-changelog checkpoint with changelog state-backend enabled 
> in CLAIM mode discards state in use
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25872
>                 URL: https://issues.apache.org/jira/browse/FLINK-25872
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / State Backends
>            Reporter: Yun Tang
>            Assignee: Yanfei Lei
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>
> If we restore from checkpoint with changelog state-backend enabled in 
> snapshot CLAIM mode, the restored checkpoint would be discarded on subsume. 
> This invalidates newer/active checkpoints because their materialized part is 
> discarded (for incremental wrapped checkpoints, their private state is 
> discarded). This bug is like FLINK-25478.



