[ https://issues.apache.org/jira/browse/FLINK-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056332#comment-16056332 ]
Cliff Resnick commented on FLINK-6633:
--------------------------------------

The issue that [~gyfora] mentioned still exists in the current 1.4-SNAPSHOT, at least when using externalized checkpoints. It does not necessarily happen on the first checkpoint after restore, but it does seem to stem from a job restart from an externalized checkpoint.

To help identify the cause I added a bit of logging to both RocksDBKeyedStateBackend and SavepointV2Serializer, the results of which I'm attaching to the issue. The log spans several checkpoints. You can see where sst files are mapped, then serialized. The last checkpoint (7) fails when it seems to try to serialize a Placeholder instead of 000027.sst.

I hope this helps. If I can add logging to capture more relevant state, please let me know (the test is reproducible).

By the way, I also noticed that some sst files are re-serialized in subsequent checkpoints even though their byte size does not change. Is that because they are still "hot" in RocksDB? I'm a bit sketchy on the concept so please forgive me!

> Register with shared state registry before adding to CompletedCheckpointStore
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-6633
>                 URL: https://issues.apache.org/jira/browse/FLINK-6633
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.0
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> Introducing placeholders for previously existing shared state requires a
> change so that shared state is first registered with {{SharedStateregistry}}
> (thereby being consolidated) and only after that added to a
> {{CompletedCheckpointStore}}, so that the consolidated checkpoint is written
> to stable storage.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)