[ https://issues.apache.org/jira/browse/FLINK-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056332#comment-16056332 ]

Cliff Resnick commented on FLINK-6633:
--------------------------------------
The issue that [~gyfora] mentioned still exists in the current 1.4-SNAPSHOT, at
least when using externalized checkpoints. It does not necessarily happen on
the first checkpoint after restore, but it does seem to stem from a job restart
from an externalized checkpoint. To help identify the cause, I added some
logging to both RocksDBKeyedStateBackend and SavepointV2Serializer; I'm
attaching the results to the issue. The log spans several checkpoints. You can
see where sst files are mapped and then serialized. The last checkpoint (7)
fails when it seems to try to serialize a Placeholder instead of 000027.sst.

I hope this helps. If I can add logging to capture more relevant state, please
let me know (the test is reproducible).

By the way, I also noticed that some sst files are re-serialized in subsequent
checkpoints even though their byte size does not change. Is that because they
are still "hot" in RocksDB? I'm a bit sketchy on the concept so please forgive me!

> Register with shared state registry before adding to CompletedCheckpointStore
> -----------------------------------------------------------------------------
>
> Key: FLINK-6633
> URL: https://issues.apache.org/jira/browse/FLINK-6633
> Project: Flink
> Issue Type: Sub-task
> Components: State Backends, Checkpointing
> Affects Versions: 1.3.0
> Reporter: Stefan Richter
> Assignee: Stefan Richter
> Priority: Blocker
> Fix For: 1.3.0
>
>
> Introducing placeholders for previously existing shared state requires a
> change so that shared state is first registered with {{SharedStateRegistry}}
> (thereby being consolidated) and only after that added to a
> {{CompletedCheckpointStore}}, so that the consolidated checkpoint is written
> to stable storage.
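
The ordering described above can be illustrated with a minimal sketch. It is a toy model only: ToySharedStateRegistry, ToyCheckpointStore, registerAll, and the PLACEHOLDER string are hypothetical names invented for this illustration and are not Flink's actual classes or API. The sketch assumes a checkpoint is a map from sst file name to a state handle, and that a placeholder stands for a file already registered by an earlier checkpoint.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a shared state registry; NOT Flink's actual API.
final class ToySharedStateRegistry {
    // Assumed marker for "this file was already uploaded by an earlier checkpoint".
    static final String PLACEHOLDER = "<placeholder>";

    private final Map<String, String> handlesByFile = new HashMap<>();

    // Registers all files of a checkpoint and returns a consolidated copy in
    // which every placeholder is replaced by the previously registered handle.
    Map<String, String> registerAll(Map<String, String> checkpointFiles) {
        Map<String, String> consolidated = new HashMap<>();
        for (Map.Entry<String, String> entry : checkpointFiles.entrySet()) {
            String handle = entry.getValue();
            if (PLACEHOLDER.equals(handle)) {
                handle = handlesByFile.get(entry.getKey());
                if (handle == null) {
                    throw new IllegalStateException(
                            "No registered state for " + entry.getKey());
                }
            } else {
                handlesByFile.put(entry.getKey(), handle);
            }
            consolidated.put(entry.getKey(), handle);
        }
        return consolidated;
    }
}

// Hypothetical stand-in for a completed checkpoint store; NOT Flink's actual API.
final class ToyCheckpointStore {
    private final List<Map<String, String>> stored = new ArrayList<>();

    // Models writing to stable storage: a raw placeholder cannot be persisted.
    void add(Map<String, String> checkpoint) {
        if (checkpoint.containsValue(ToySharedStateRegistry.PLACEHOLDER)) {
            throw new IllegalStateException("Cannot serialize a placeholder");
        }
        stored.add(checkpoint);
    }

    int size() {
        return stored.size();
    }
}
```

In this model, calling store.add(registry.registerAll(checkpoint)) succeeds for a checkpoint that contains placeholders, while passing the unconsolidated checkpoint straight to store.add fails. That is only an analogy for the ordering requirement, not a diagnosis of the failure reported in the comment.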