[
https://issues.apache.org/jira/browse/RATIS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xu Shao Hong updated RATIS-1582:
--------------------------------
Summary: Add notify install snapshot finished event to inform the finish
stage (was: Add notify install snapshot finished event to trigger cleanup of
snapshot)
> Add notify install snapshot finished event to inform the finish stage
> ---------------------------------------------------------------------
>
> Key: RATIS-1582
> URL: https://issues.apache.org/jira/browse/RATIS-1582
> Project: Ratis
> Issue Type: Improvement
> Components: snapshot
> Reporter: Xu Shao Hong
> Assignee: Xu Shao Hong
> Priority: Major
>
> Currently, the install snapshot notification does not report when the whole
> process is done.
> From the Ozone side, the state machine's notifyInstallSnapshotFromLeader is a
> single request and process. This was fine until we found that the snapshot
> installation could get stuck because the whole RocksDB is replaced each time
> (the leader may purge the raft log while the snapshot is being transferred,
> which triggers another snapshot installation once the previous install
> request is done). To solve this, we came up with the incremental snapshot
> idea: the next install request transfers only the incremental part of
> RocksDB, which requires preserving the checkpoints.
> The incremental snapshot needs to compare the checkpoints, so the
> checkpoints cannot be deleted after the first request to install a snapshot.
> When to clean up these checkpoints is hard to determine. It is difficult
> for the follower to tell whether the latest installed snapshot is the last
> one, after which it can apply the logs immediately. The cleanup time depends
> on the leader's state: only the leader knows whether it is time to notify
> the snapshot again or just send append entries. The cleanup can be triggered
> only when the leader considers that the follower has caught up (error cases
> are not covered here).
> Thus, we should have an event that triggers the cleanup of the checkpoints
> for Ozone or, more generally, signals the completion of the install
> snapshot, which means no more install snapshot requests will be sent and
> the follower has caught up.
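
The follower-side behaviour described above could be sketched roughly as
follows. This is only an illustration under assumptions: CheckpointTracker and
its method names are hypothetical and are not part of the Ratis or Ozone API.
It shows checkpoints being retained across repeated install snapshot requests
and released only once a single "finished" event arrives.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical follower-side tracker; not part of the Ratis or Ozone API. */
class CheckpointTracker {
  // Snapshot indices of checkpoints kept for incremental comparison.
  private final List<Long> retained = new ArrayList<>();

  /** Assumed hook: called for each install snapshot notification from the leader. */
  void onInstallSnapshotRequest(long snapshotIndex) {
    // Keep the checkpoint: a later request may need to compare against it.
    retained.add(snapshotIndex);
  }

  /**
   * Assumed hook for the proposed "finished" event: the leader will send no
   * more install snapshot requests, so every retained checkpoint can go.
   * Returns the indices that were released.
   */
  List<Long> onInstallSnapshotFinished() {
    List<Long> removed = new ArrayList<>(retained);
    retained.clear();
    return removed;
  }

  /** Returns a copy of the currently retained checkpoint indices. */
  List<Long> retainedCheckpoints() {
    return new ArrayList<>(retained);
  }
}
```

In this sketch the follower never decides on its own when to clean up; it only
reacts to the finished event, which matches the point that only the leader
knows whether another install snapshot request will follow.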
--
This message was sent by Atlassian Jira
(v8.20.7#820007)