devabhishekpal opened a new pull request, #1370:
URL: https://github.com/apache/ratis/pull/1370

   ## What changes were proposed in this pull request?
   
   ### Problem
   In the current implementation, the leader provides the snapshot. However, 
this causes tasks to be paused until the snapshot installation completes, and 
it also puts unnecessary pressure on the leader.
   
   ### Goal
   
   Allow a lagging follower to install a snapshot from another follower without 
making that source follower act as the leader. This lets the lagging follower 
either stay in sync or catch up to the point where it can append new entries 
without a complete snapshot.
   
   ### High Level Flow Diagram
   <img width="369" height="438" alt="image" 
src="https://github.com/user-attachments/assets/59583533-edb1-4ba9-bbc2-6b9f8bae277c" 
/>
   
   ### How do we select the follower source?
   To select the source follower, we consider the following inputs and 
conditions.
   
   #### Inputs
   - Target follower (`T`) which needs snapshot
   - Leader log state - specifically `lastEntry`, `logStartIndex` and 
`firstAvailableTermIndex`
   - Current progress of each follower - specifically `matchIndex`, 
`commitIndex`, `snapshotIndex`, `lastRespondedAppendEntriesSendTime` and 
`lastRpcResponseTime`.
   
   #### Eligibility
   A follower (`F`) is considered a source only if:
   - `F` has responded recently on the append path. We can check this with 
`lastRespondedAppendEntriesSendTime` and fall back to `lastRpcResponseTime`.
   - `F.matchIndex >= requiredSnapshotIndex` where `requiredSnapshotIndex = 
firstAvailableTermIndex.index - 1`
   
   This is because:
   - `requiredSnapshotIndex` is the minimum snapshot index that still lets the 
target resume normal AppendEntries from the leader after install.
   - If `F.matchIndex < requiredSnapshotIndex`, that follower is too far behind 
to bridge the leader's log gap for this target, so the leader should not choose 
it.
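
   The eligibility check above can be sketched roughly as follows. This is an 
illustration only, not the code in this PR: the `EligibilitySketch` and 
`FollowerProgress` types, the millisecond timestamps, the `-1` "unknown" 
sentinel and the `maxSilenceMs` threshold are all assumptions. It also assumes 
that "fall back" means `lastRpcResponseTime` is used only when no AppendEntries 
reply has been recorded yet.

```java
/** Sketch only: these types are hypothetical and are not the Ratis API. */
final class EligibilitySketch {

  /** Hypothetical snapshot of the progress the leader tracks for one follower. */
  static final class FollowerProgress {
    final long matchIndex;
    /** Send time (ms) of the last AppendEntries that got a reply, or -1 if none yet. */
    final long lastRespondedAppendEntriesSendTimeMs;
    /** Time (ms) of the last RPC response of any kind, or -1 if none yet. */
    final long lastRpcResponseTimeMs;

    FollowerProgress(long matchIndex,
                     long lastRespondedAppendEntriesSendTimeMs,
                     long lastRpcResponseTimeMs) {
      this.matchIndex = matchIndex;
      this.lastRespondedAppendEntriesSendTimeMs = lastRespondedAppendEntriesSendTimeMs;
      this.lastRpcResponseTimeMs = lastRpcResponseTimeMs;
    }
  }

  /** requiredSnapshotIndex = firstAvailableTermIndex.index - 1, as in the description. */
  static long requiredSnapshotIndex(long firstAvailableIndex) {
    return firstAvailableIndex - 1;
  }

  /** Prefer the append-path timestamp; use the generic RPC timestamp only if it is unknown. */
  static boolean isResponsive(FollowerProgress f, long nowMs, long maxSilenceMs) {
    long lastSeen = f.lastRespondedAppendEntriesSendTimeMs >= 0
        ? f.lastRespondedAppendEntriesSendTimeMs
        : f.lastRpcResponseTimeMs;
    return lastSeen >= 0 && nowMs - lastSeen <= maxSilenceMs;
  }

  /** A follower is a candidate source only if it is responsive and far enough ahead. */
  static boolean isEligible(FollowerProgress f, long firstAvailableIndex,
                            long nowMs, long maxSilenceMs) {
    return isResponsive(f, nowMs, maxSilenceMs)
        && f.matchIndex >= requiredSnapshotIndex(firstAvailableIndex);
  }
}
```

   For example, if the leader's first available index is 101, then 
`requiredSnapshotIndex` is 100: a responsive follower with `matchIndex = 100` 
qualifies, while one with `matchIndex = 99` does not.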
   
   #### Ranking
   Rank eligible followers lexicographically by the following criteria, in 
order:
   - Exact sync with the leader, i.e. the follower is fully caught up
   - Highest `matchIndex`
   - Highest `commitIndex` if `matchIndex` is tied
   - Freshest `lastRespondedAppendEntriesSendTime`
   
   ##### IMPORTANT: If no follower satisfies `matchIndex >= 
requiredSnapshotIndex`, do not attempt a follower-sourced install. Fall back 
immediately to the existing leader path, because otherwise the target follower 
will need to perform another snapshot install to catch up anyway.
   
   ## What is the link to the Apache JIRA?
   https://issues.apache.org/jira/browse/RATIS-2428
   
   ## How was this patch tested?
   Patch was tested using the unit tests.

