Xushaohong opened a new pull request, #647:
URL: https://github.com/apache/ratis/pull/647

   ## What changes were proposed in this pull request?
   
   Currently, the install snapshot notification does not indicate when the 
whole process is done.
   
   On the Ozone side, the state machine's notifyInstallSnapshotFromLeader 
handles a single request as a single process. This was fine until we found that 
snapshot installation could get stuck, because the whole RocksDB is replaced 
each time (the leader may purge the Raft log while the snapshot is being 
transferred, which triggers another snapshot installation once the previous 
install request is done). To solve this, we came up with the incremental 
snapshot idea: the next install request transfers only the incremental part of 
RocksDB, which requires preserving the checkpoints. The incremental snapshot 
needs to compare the checkpoints, so they cannot be deleted after the first 
request to install a snapshot.
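   
   To illustrate why the checkpoints must be kept, here is a minimal sketch of 
the comparison step. It assumes a checkpoint can be represented by the set of 
its file names; `CheckpointDiff` and `filesToTransfer` are hypothetical names 
used only for illustration and are not part of Ratis or Ozone.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Ratis or Ozone.
final class CheckpointDiff {
  private CheckpointDiff() {}

  /**
   * Returns the files that exist in the leader's checkpoint but are missing
   * from the follower's preserved checkpoint, i.e. the incremental part.
   */
  static Set<String> filesToTransfer(Set<String> leaderFiles, Set<String> followerFiles) {
    Set<String> missing = new TreeSet<>(leaderFiles);
    missing.removeAll(followerFiles);
    return missing;
  }
}
```

   If the follower had already deleted its checkpoint, `followerFiles` would be 
empty and every file would have to be transferred again.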
   
   The right time to clean up these checkpoints is hard to determine. The 
follower cannot easily tell whether the latest installed snapshot is the last 
one, after which it can apply the logs immediately. The cleanup time depends on 
the leader's state: only the leader knows whether it should send another 
snapshot notification or just send append entries. The cleanup can be triggered 
only when the leader considers that the follower has caught up (error cases are 
not covered here).
   
   Thus, we need an event that triggers the cleanup of the checkpoints for 
Ozone or, more generally, signals that snapshot installation is complete, 
meaning that no more install snapshot requests will be sent and the follower 
has caught up.
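   
   As a rough sketch of how a state machine might consume such an event, the 
following uses a hypothetical callback. `SnapshotInstallListener`, 
`onSnapshotInstallCompleted`, and `CheckpointStore` are assumptions made for 
illustration; they are not the API added by this pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback; the name and signature are illustrative, not the Ratis API.
interface SnapshotInstallListener {
  void onSnapshotInstallCompleted(long snapshotIndex);
}

// Hypothetical holder of the checkpoints preserved for incremental snapshots.
final class CheckpointStore implements SnapshotInstallListener {
  private final List<String> checkpoints = new ArrayList<>();

  /** Remembers a checkpoint so a later install request can be compared against it. */
  void preserve(String checkpointPath) {
    checkpoints.add(checkpointPath);
  }

  int size() {
    return checkpoints.size();
  }

  @Override
  public void onSnapshotInstallCompleted(long snapshotIndex) {
    // No further install snapshot requests are expected, so the preserved
    // checkpoints are no longer needed. A real implementation would also
    // delete the checkpoint directories on disk; this sketch only forgets them.
    checkpoints.clear();
  }
}
```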
   
   We trigger this event for both the leader and the follower.
   For the leader:
   1. when the leader receives the snapshot result `SNAPSHOT_INSTALLED`
   2. when the leader receives the snapshot result `SNAPSHOT_UNAVAILABLE`
   
   For the follower:
   1. when the follower first tries to append new entries after successfully 
installing a snapshot
   2. when the follower learns that the state machine's snapshot is unavailable
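   
   The four trigger conditions above can be summarized in a small decision 
function. This is only a sketch: the `Role` and `Trigger` enums and 
`InstallCompletion.shouldNotify` are hypothetical names, and only 
`SNAPSHOT_INSTALLED` and `SNAPSHOT_UNAVAILABLE` come from the description.

```java
// Hypothetical types for illustration; they do not mirror the Ratis classes.
enum Role { LEADER, FOLLOWER }

enum Trigger {
  RECEIVED_SNAPSHOT_INSTALLED,    // leader received the result SNAPSHOT_INSTALLED
  RECEIVED_SNAPSHOT_UNAVAILABLE,  // leader received the result SNAPSHOT_UNAVAILABLE
  FIRST_APPEND_AFTER_INSTALL,     // follower appends entries for the first time after an install
  LOCAL_SNAPSHOT_UNAVAILABLE      // follower learns its state machine has no snapshot available
}

final class InstallCompletion {
  private InstallCompletion() {}

  /** Returns true if the given role should fire the completion event for this trigger. */
  static boolean shouldNotify(Role role, Trigger trigger) {
    switch (role) {
      case LEADER:
        return trigger == Trigger.RECEIVED_SNAPSHOT_INSTALLED
            || trigger == Trigger.RECEIVED_SNAPSHOT_UNAVAILABLE;
      case FOLLOWER:
        return trigger == Trigger.FIRST_APPEND_AFTER_INSTALL
            || trigger == Trigger.LOCAL_SNAPSHOT_UNAVAILABLE;
      default:
        return false;
    }
  }
}
```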
   
   ## What is the link to the Apache JIRA?
   https://issues.apache.org/jira/browse/RATIS-1582
   
   ## How was this patch tested?
   Tested manually with Ozone.
   

