Xushaohong commented on PR #647: URL: https://github.com/apache/ratis/pull/647#issuecomment-1138117795
> Hi @Xushaohong, thanks for the proposal. I have a question about it.
>
> From ozone's view, if some file in the old checkpoint needs to be purged, doesn't it mean that file is useless?
>
> For example, suppose the old checkpoint is `[1, 2, 3]`, and the new checkpoint is `[1, 3, 4]`. Even if the lagging DN successfully pulls file `2` from master and finished the `NotifyInstallSnapshot`. Next iteration of incremental snapshot will pull `4` and delete `2`. So why do we need to keep `2` on master until `NotifyInstallSnapshot` is done?
>
> Please correct me if I'm wrong.

The master should keep the checkpoint until the follower has truly caught up. Each checkpoint is a complete snapshot of RDB. We don't care about individual stale SST files such as the file `2` you mentioned. We keep the checkpoint as a unit for comparison, which simplifies the logic: the master compares the latest checkpoint with the newly created checkpoint and sends the incremental part. As for the stale SSTs, the follower's RDB already has a mechanism to deal with them: it uses the MANIFEST to detect stale SSTs and GC them.

Basically, the master only needs to hold two checkpoints at a time for the notify request. We could choose to delete older checkpoints or not, but we only delete all checkpoints after the ratis server tells us the snapshot is finished.
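To make the comparison step concrete, here is a minimal sketch of the idea described above, under stated assumptions. It models a checkpoint as a plain set of SST file names, which is a simplification, and every name in it (`IncrementalCheckpointSketch`, `onNewCheckpoint`, `onSnapshotInstalled`, `retainedCount`) is hypothetical; none of this is the Ratis or Ozone API. Treating the first checkpoint as a full transfer is also an assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a checkpoint is modeled as a set of SST file names.
public class IncrementalCheckpointSketch {

    // Checkpoints retained on the master until the snapshot install finishes.
    private final Deque<Set<String>> retained = new ArrayDeque<>();

    // Files present in the newer checkpoint but missing from the older one.
    // Stale files (in the older checkpoint only) are deliberately ignored here:
    // the follower's RDB is expected to GC them itself.
    public static Set<String> filesToSend(Set<String> previous, Set<String> latest) {
        Set<String> delta = new TreeSet<>(latest);
        delta.removeAll(previous);
        return delta;
    }

    // Records a newly created checkpoint and returns the incremental part
    // relative to the most recent retained checkpoint. With nothing retained,
    // the whole checkpoint is returned (assumption: first transfer is full).
    public Set<String> onNewCheckpoint(Set<String> latest) {
        Set<String> previous = retained.isEmpty() ? Set.<String>of() : retained.peekLast();
        retained.addLast(new TreeSet<>(latest));
        return filesToSend(previous, latest);
    }

    // Called once the ratis server reports that the snapshot is finished:
    // only now are all retained checkpoints dropped.
    public void onSnapshotInstalled() {
        retained.clear();
    }

    public int retainedCount() {
        return retained.size();
    }

    public static void main(String[] args) {
        IncrementalCheckpointSketch master = new IncrementalCheckpointSketch();
        master.onNewCheckpoint(Set.of("1", "2", "3"));
        // Old checkpoint [1, 2, 3], new checkpoint [1, 3, 4]: only 4 is sent.
        System.out.println(master.onNewCheckpoint(Set.of("1", "3", "4"))); // prints [4]
        master.onSnapshotInstalled();
    }
}
```

In this sketch the stale file `2` never appears in the delta, so the master does not need to reason about it at all; it only has to keep the previous checkpoint around as the baseline for the next comparison.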
