Yun Tang created FLINK-25478:
--------------------------------
Summary: Changelog materialization with incremental checkpoint
could cause checkpointed data loss
Key: FLINK-25478
URL: https://issues.apache.org/jira/browse/FLINK-25478
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing, Runtime / State Backends
Reporter: Yun Tang
Fix For: 1.15.0
Currently, changelog materialization calls the RocksDB state backend's
snapshot method to generate an {{IncrementalRemoteKeyedStateHandle}} as one of
ChangelogStateBackendHandleImpl's materialized artifacts. Until the next
materialization, every checkpoint reports that same
{{IncrementalRemoteKeyedStateHandle}} again.
Registering this handle for the first time is fine. However, when the same
{{IncrementalRemoteKeyedStateHandle}} is registered a second time (via
{{ChangelogStateBackendHandleImpl#registerSharedStates}}), its private state
artifacts are discarded without checking the registered reference:
IncrementalRemoteKeyedStateHandle:
{code:java}
public void discardState() throws Exception {
    try {
        StateUtil.bestEffortDiscardAllStateObjects(privateState.values());
    } catch (Exception e) {
        LOG.warn("Could not properly discard misc file states.", e);
    }
}
{code}
Thus, this deletes the private state (such as RocksDB's MANIFEST file), and
on restore the job would fail with a FileNotFoundException.
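
To make the failure mode easier to follow, here is a minimal toy model of it.
This is only a sketch, not Flink code: {{ToyHandle}}, {{ToyRegistry}},
{{manifestSurvives}} and the {{checkReference}} flag are hypothetical names, and
the "storage" is just an in-memory set of file names. It assumes the second
registration treats the re-reported handle as a duplicate and discards it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these classes are Flink classes.
class DiscardOnReRegisterSketch {

    /** Stand-in for a materialized handle that owns some private files. */
    static final class ToyHandle {
        final String id;
        final List<String> privateFiles;

        ToyHandle(String id, List<String> privateFiles) {
            this.id = id;
            this.privateFiles = privateFiles;
        }

        /** Unconditional discard, in the spirit of the snippet quoted above. */
        void discardState(Set<String> storage) {
            storage.removeAll(privateFiles);
        }
    }

    /** Stand-in for a registry that remembers the first handle seen per id. */
    static final class ToyRegistry {
        private final Map<String, ToyHandle> registered = new HashMap<>();
        private final Set<String> storage;
        private final boolean checkReference; // hypothetical guard switch

        ToyRegistry(Set<String> storage, boolean checkReference) {
            this.storage = storage;
            this.checkReference = checkReference;
        }

        void register(ToyHandle handle) {
            ToyHandle existing = registered.putIfAbsent(handle.id, handle);
            if (existing == null) {
                return; // first registration: nothing to discard
            }
            // Without the guard, the incoming handle is always discarded, even
            // when it is the very handle that is already registered.
            if (!checkReference || existing != handle) {
                handle.discardState(storage);
            }
        }
    }

    /** Registers the same handle twice and reports whether MANIFEST is still there. */
    static boolean manifestSurvives(boolean checkReference) {
        Set<String> storage =
                new HashSet<>(Arrays.asList("MANIFEST", "CURRENT", "000012.sst"));
        ToyHandle handle =
                new ToyHandle("materialization-1", Arrays.asList("MANIFEST", "CURRENT"));
        ToyRegistry registry = new ToyRegistry(storage, checkReference);
        registry.register(handle); // first checkpoint
        registry.register(handle); // second checkpoint reports the same handle
        return storage.contains("MANIFEST");
    }

    public static void main(String[] args) {
        System.out.println("without check: " + manifestSurvives(false)); // false
        System.out.println("with check: " + manifestSurvives(true));     // true
    }
}
```

In this model the guarded variant skips the discard when the incoming handle is
the one already registered. Whether the real fix belongs in the handle or in the
registry is left open here.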