[ 
https://issues.apache.org/jira/browse/HDDS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699427#comment-17699427
 ] 

Xu Shao Hong commented on HDDS-8131:
------------------------------------

 
{code:java}
if (purgeUptoSnapshotIndex) {
  // We can purge up to snapshot index even if all the peers do not have
  // commitIndex up to this snapshot index.
  purgeIndex = i;
} else {
  final LongStream commitIndexStream = server.getCommitInfos().stream()
      .mapToLong(CommitInfoProto::getCommitIndex);
  purgeIndex = LongStream.concat(LongStream.of(i), commitIndexStream)
      .min().orElse(i);
}
{code}
If purgeUptoSnapshotIndex is false, purgeIndex will be the minimum of Ratis' 
snapshotIndex and the commitIndex of every peer.
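
To make the two branches concrete, here is a minimal standalone sketch of the same computation. It does not use Ratis classes: the class name, method name, and sample index values are hypothetical, and the peers' commit indices are passed in as plain longs instead of being read from server.getCommitInfos().

```java
import java.util.stream.LongStream;

// Hypothetical stand-in for the purge index selection quoted above.
class PurgeIndexSketch {
  // snapshotIndex plays the role of `i` in the quoted snippet.
  static long purgeIndex(boolean purgeUptoSnapshotIndex, long snapshotIndex,
      long... peerCommitIndices) {
    if (purgeUptoSnapshotIndex) {
      // Purge up to the snapshot index regardless of the peers' commit indices.
      return snapshotIndex;
    }
    // Otherwise take the minimum of the snapshot index and all commit indices.
    return LongStream.concat(LongStream.of(snapshotIndex),
        LongStream.of(peerCommitIndices)).min().orElse(snapshotIndex);
  }

  public static void main(String[] args) {
    // One peer is behind the snapshot index (commitIndex 80 < snapshotIndex 100).
    System.out.println(purgeIndex(true, 100L, 120L, 80L));   // prints 100
    System.out.println(purgeIndex(false, 100L, 120L, 80L));  // prints 80
    // All peers are ahead of the snapshot index, so both settings agree.
    System.out.println(purgeIndex(false, 100L, 120L, 150L)); // prints 100
  }
}
```

With the flag disabled, the purge index only differs from the snapshot index when at least one peer's commitIndex is behind it.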

 

Most likely, the commitIndex will be larger than Ratis' snapshotIndex. So 
turning on this config should give a smaller suggestIndex and delay the purge 
a bit.

-----------------

Currently, there is no way to avoid the purge entirely when your cluster is 
constantly receiving more requests. These options only alleviate the problem.

> Add Configuration for OM Ratis Log Purge Tuning Parameters
> ----------------------------------------------------------
>
>                 Key: HDDS-8131
>                 URL: https://issues.apache.org/jira/browse/HDDS-8131
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Manager
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.3.0
>
>
> Currently Ozone Manager enables {{raft.server.log.purge.upto.snapshot.index}} 
> by default.
> However, for an OM cluster with a large metadata store, there might be a case 
> where the OM leader purges its Ratis logs before a slow follower has 
> replicated them to its log. This means that the follower needs to download 
> the whole metadata store from the OM leader. This can be problematic if the 
> metadata store in the leader is too large.
> We should add two configurations in OM to enable/disable Ratis purge 
> parameters:
>  * {{raft.server.log.purge.upto.snapshot.index}}
>  ** Disabling this would guarantee that the OM leader will not purge its 
> Ratis log unless all the logs have been replicated to all the followers 
> (through {{commitIndex}}).
>  ** This would effectively mean that there shouldn't be a case where the 
> slow follower needs to download the full metadata from the leader, so the 
> follower does no snapshot download. For small OM metadata, however, it can be 
> faster for a follower to download the leader's metadata snapshot than to 
> replicate and apply the outstanding logs normally.
>  ** For a very slow follower / downed follower, the OM leader cannot purge 
> the log until the follower catches up to it. This might increase the disk 
> space usage for the OM leader.
>  ** Default would be {{true}} to preserve the current OM snapshot behavior
>  * {{raft.server.log.purge.preservation.log.num}}
>  ** RATIS-1626 introduces logic to keep the latest n logs from being purged
>  ** Setting n > 0 while still enabling 
> {{raft.server.log.purge.upto.snapshot.index}} should strike a balance between 
> the cost of preserving & transferring logs and the cost of transferring a 
> snapshot.
>  ** Default would be 0 to preserve the current OM snapshot behavior
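
As a rough illustration of the two settings and the defaults proposed in the description above, the sketch below reads them from a plain java.util.Properties object. The property keys are taken from the issue text; the class and helper names are hypothetical, and this is not the Ratis RaftProperties or Ozone configuration API.

```java
import java.util.Properties;

// Hypothetical helper; only the two property keys come from the issue text.
class OmRatisPurgeConfigSketch {
  static final String PURGE_UPTO_SNAPSHOT_INDEX_KEY =
      "raft.server.log.purge.upto.snapshot.index";
  static final String PURGE_PRESERVATION_LOG_NUM_KEY =
      "raft.server.log.purge.preservation.log.num";

  // Proposed default is true, which keeps the current OM behavior.
  static boolean purgeUptoSnapshotIndex(Properties p) {
    return Boolean.parseBoolean(
        p.getProperty(PURGE_UPTO_SNAPSHOT_INDEX_KEY, "true"));
  }

  // Proposed default is 0, which keeps the current OM behavior.
  static long purgePreservationLogNum(Properties p) {
    return Long.parseLong(
        p.getProperty(PURGE_PRESERVATION_LOG_NUM_KEY, "0"));
  }

  public static void main(String[] args) {
    Properties defaults = new Properties();
    System.out.println(purgeUptoSnapshotIndex(defaults));   // prints true
    System.out.println(purgePreservationLogNum(defaults));  // prints 0

    // Keep purging up to the snapshot index, but preserve some recent logs.
    // The value 100000 is an arbitrary example, not a recommendation.
    Properties tuned = new Properties();
    tuned.setProperty(PURGE_PRESERVATION_LOG_NUM_KEY, "100000");
    System.out.println(purgeUptoSnapshotIndex(tuned));      // prints true
    System.out.println(purgePreservationLogNum(tuned));     // prints 100000
  }
}
```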



