[ 
https://issues.apache.org/jira/browse/FLINK-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056332#comment-16056332
 ] 

Cliff Resnick commented on FLINK-6633:
--------------------------------------

The issue that [~gyfora] mentioned still exists in the current 1.4-SNAPSHOT, at 
least when using externalized checkpoints. It does not necessarily happen on 
the first checkpoint after restore, but it does seem to stem from a job restart 
from an externalized checkpoint. To help identify the cause, I added a bit of 
logging to both RocksDBKeyedStateBackend and SavepointV2Serializer; I'm 
attaching the results to the issue. The log spans several checkpoints. You can 
see where sst files are mapped, then serialized. The last checkpoint (7) fails 
when it seems to try to serialize a Placeholder instead of 000027.sst. 

I hope this helps. If I can add logging to capture more relevant state please 
let me know (the test is reproducible). 

By the way, I also noticed that some sst files are re-serialized in subsequent 
checkpoints even though their byte size does not change. Is that because they 
are still "hot" in RocksDB? I'm a bit hazy on the concept so please forgive me! 

> Register with shared state registry before adding to CompletedCheckpointStore
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-6633
>                 URL: https://issues.apache.org/jira/browse/FLINK-6633
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.0
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> Introducing placeholders for previously existing shared state requires a 
> change so that shared state is first registered with 
> {{SharedStateRegistry}} (thereby being consolidated) and only after that 
> added to a {{CompletedCheckpointStore}}, so that the consolidated checkpoint 
> is written to stable storage. 
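
To make the ordering in the description above concrete, here is a minimal 
sketch of "register first, then store". It is only an illustration under 
stated assumptions: Handle, ToyRegistry, ToyStore and consolidate are 
hypothetical stand-ins, not Flink's actual SharedStateRegistry or 
CompletedCheckpointStore API, and the paths are made up. 

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RegisterBeforeStore {

    /** Hypothetical shared state handle; a null path marks a placeholder. */
    static final class Handle {
        final String key;
        final String path;

        Handle(String key, String path) {
            this.key = key;
            this.path = path;
        }

        boolean isPlaceholder() {
            return path == null;
        }
    }

    /** Toy registry: keeps the first real handle per key and resolves placeholders to it. */
    static final class ToyRegistry {
        private final Map<String, Handle> registered = new HashMap<>();

        Handle register(Handle handle) {
            Handle existing = registered.get(handle.key);
            if (existing != null) {
                return existing; // consolidate onto the already registered state
            }
            if (handle.isPlaceholder()) {
                throw new IllegalStateException("No registered state for placeholder " + handle.key);
            }
            registered.put(handle.key, handle);
            return handle;
        }
    }

    /** Toy store: refuses to persist a checkpoint that still contains placeholders. */
    static final class ToyStore {
        final List<List<Handle>> completed = new ArrayList<>();

        void add(List<Handle> checkpoint) {
            for (Handle handle : checkpoint) {
                if (handle.isPlaceholder()) {
                    throw new IllegalStateException("Cannot persist placeholder " + handle.key);
                }
            }
            completed.add(checkpoint);
        }
    }

    /** Step 1: register every handle, replacing placeholders with the registered state. */
    static List<Handle> consolidate(ToyRegistry registry, List<Handle> checkpoint) {
        List<Handle> resolved = new ArrayList<>();
        for (Handle handle : checkpoint) {
            resolved.add(registry.register(handle));
        }
        return resolved;
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        ToyStore store = new ToyStore();

        // First checkpoint uploads the file for real.
        List<Handle> first = new ArrayList<>();
        first.add(new Handle("000027.sst", "/checkpoints/chk-1/000027.sst"));
        store.add(consolidate(registry, first));

        // Second checkpoint only sends a placeholder for the unchanged file.
        List<Handle> second = new ArrayList<>();
        second.add(new Handle("000027.sst", null));
        second.add(new Handle("000031.sst", "/checkpoints/chk-2/000031.sst"));

        // Step 2: only the consolidated checkpoint goes to the store.
        store.add(consolidate(registry, second));

        for (Handle handle : store.completed.get(1)) {
            System.out.println(handle.key + " -> " + handle.path);
        }
    }
}
```

In this toy model, handing the second checkpoint to the store before 
consolidating it would be rejected because it still contains a placeholder. 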



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
