[
https://issues.apache.org/jira/browse/HDDS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699579#comment-17699579
]
Xu Shao Hong commented on HDDS-8131:
------------------------------------
The purge that matters is the leader's purge, and the commitIndex that matters is the leader's commitIndex.
The commitIndex is an index committed by the majority. One slow follower will
not affect the leader's commitIndex.
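
To make the point above concrete, here is a minimal sketch, assuming the usual Raft rule that an index is committed once a majority of peers store it. The class and method names are hypothetical and are not Ratis APIs, and the sketch ignores the current-term check that a real Raft leader also applies.

```java
import java.util.Arrays;

public class MajorityCommitIndex {
    /**
     * Returns the highest index stored on a majority of peers.
     * matchIndexes holds the highest log index known to be stored on each
     * peer, with the leader's own last index included (hypothetical input).
     */
    public static long majorityCommitIndex(long[] matchIndexes) {
        long[] sorted = matchIndexes.clone();
        Arrays.sort(sorted);
        // In ascending order, every peer from position (n - 1) / 2 upward
        // stores at least this index, and those peers form a majority.
        return sorted[(sorted.length - 1) / 2];
    }

    public static void main(String[] args) {
        // Leader at 100, one healthy follower at 98, one slow follower at 10.
        long[] matchIndexes = {100, 98, 10};
        System.out.println(majorityCommitIndex(matchIndexes));
    }
}
```

With the slow follower at 10, the result is still decided by the leader and the healthy follower, so lowering the slow follower's index further does not change it.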
> there is a chance where the leader's snapshotIndex is larger than the
> commitIndex and nextIndex of the late follower
I do not understand what this means or implies. As I understand it, turning off
purgeUptoSnapshotIndex seems to make the purge trigger more easily.
> Add Configuration for OM Ratis Log Purge Tuning Parameters
> ----------------------------------------------------------
>
> Key: HDDS-8131
> URL: https://issues.apache.org/jira/browse/HDDS-8131
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Manager
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.4.0
>
>
> Currently Ozone Manager enables {{raft.server.log.purge.upto.snapshot.index}}
> by default.
> However, for an OM cluster with a large metadata store, the OM leader might
> purge its Ratis logs before a slow follower has replicated them to its own
> log. The follower then needs to download the whole metadata store from the
> OM leader, which can be problematic if the leader's metadata store is too
> large.
> We should add two configurations in OM to control the Ratis purge
> parameters:
> * {{raft.server.log.purge.upto.snapshot.index}}
> ** Disabling this would guarantee that the OM leader does not purge its
> Ratis log until all the logs have been replicated to all the followers
> (through {{commitIndex}}).
> ** This would effectively mean that a slow follower never needs to download
> the full metadata from the leader, so no snapshot download is needed. Note
> that for small OM metadata, it can be faster for a follower to download the
> leader's metadata snapshot than to replicate and apply the outstanding logs
> normally.
> ** For a very slow or downed follower, the OM leader cannot purge the log
> until the follower catches up. This might increase disk space usage on the
> OM leader.
> ** Default would be {{true}} to preserve the current OM snapshot behavior
> * {{raft.server.log.purge.preservation.log.num}}
> ** RATIS-1626 introduces logic to keep the latest n logs from being purged
> ** Setting n > 0 while still enabling
> {{raft.server.log.purge.upto.snapshot.index}} should strike a balance between
> the cost of preserving and transferring logs and the cost of transferring a
> snapshot.
> ** Default would be 0 to preserve the current OM snapshot behavior
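
The interaction of the two settings described in the issue can be sketched as a simplified model. It assumes that, with {{raft.server.log.purge.upto.snapshot.index}} disabled, the purge is capped by the lowest commitIndex the leader has recorded for any peer, as the description states, and that the preservation count keeps the newest n entries. All names are hypothetical; this is not the Ratis implementation and has not been checked against it.

```java
public class PurgeIndexSketch {
    /**
     * Simplified model of the highest log index a leader may purge.
     *
     * purgeUptoSnapshotIndex models raft.server.log.purge.upto.snapshot.index.
     * preservationLogNum models raft.server.log.purge.preservation.log.num.
     * peerCommitIndexes is the commitIndex recorded for each peer (assumed input).
     */
    public static long suggestedPurgeIndex(boolean purgeUptoSnapshotIndex,
                                           long preservationLogNum,
                                           long snapshotIndex,
                                           long lastLogIndex,
                                           long[] peerCommitIndexes) {
        // Never purge past the latest snapshot.
        long purgeIndex = snapshotIndex;
        if (!purgeUptoSnapshotIndex) {
            // Assumption: also stay at or below every peer's commitIndex.
            for (long commitIndex : peerCommitIndexes) {
                purgeIndex = Math.min(purgeIndex, commitIndex);
            }
        }
        if (preservationLogNum > 0) {
            // Keep the newest n entries; a negative result means nothing is purged.
            purgeIndex = Math.min(purgeIndex, lastLogIndex - preservationLogNum);
        }
        return purgeIndex;
    }

    public static void main(String[] args) {
        long[] peerCommitIndexes = {1200, 1190, 400}; // third peer is slow
        System.out.println(suggestedPurgeIndex(true, 0, 1000, 1200, peerCommitIndexes));
        System.out.println(suggestedPurgeIndex(false, 0, 1000, 1200, peerCommitIndexes));
        System.out.println(suggestedPurgeIndex(true, 500, 1000, 1200, peerCommitIndexes));
    }
}
```

In this model the defaults ({{true}} and 0) purge everything up to the snapshot, disabling the first setting holds the purge back to the slow peer, and a positive n gives a middle ground that depends only on the leader's own log length.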