ChenSammi commented on pull request #508:
URL: https://github.com/apache/ratis/pull/508#issuecomment-929960558


   > Thanks @ChenSammi for the patch. This should be disabled by default in Ratis, as @szetszwo suggested.
   > 
   > The only problem I think we will eventually run into is this:
   > 
   > If a slow follower is just not able to catch up with the leader and exceeds the threshold, the leader itself will not be able to apply the transactions that have already been appended to its own log. In such cases, it should probably be able to remove the node and apply the pending transactions that have been appended to the raft log.
   > 
   
   If the slow follower is alive and functioning well, it will eventually catch up while the leader and the other follower wait for it. If the slow follower doesn't respond, it will eventually trigger the pipeline close after the "raft.server.rpc.slowness.timeout" timeout (300s in Ozone).  
   If we remove a slow follower node, should we let the remaining two members keep going, or should we add a new member to the raft group?
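
The timeout path described above boils down to a time comparison. The sketch below is hypothetical (the class and method names are invented for illustration, and it is not how Ratis implements the check): it reports a follower as slow once more than the configured slowness timeout has passed since its last response. The 300s value mirrors the Ozone setting mentioned above; the code does not read `raft.server.rpc.slowness.timeout` itself.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch, not the Ratis implementation: a follower that has not
 * responded within the slowness timeout is reported as slow. In Ozone that
 * report is what eventually leads to the pipeline being closed.
 */
final class FollowerSlownessCheck {
    private final Duration slownessTimeout;

    FollowerSlownessCheck(Duration slownessTimeout) {
        this.slownessTimeout = slownessTimeout;
    }

    /** True if strictly more than slownessTimeout has elapsed since the last response. */
    boolean isSlow(Instant lastResponse, Instant now) {
        return Duration.between(lastResponse, now).compareTo(slownessTimeout) > 0;
    }
}
```

For example, `new FollowerSlownessCheck(Duration.ofSeconds(300))` treats a follower as slow only after it has been silent for more than 300 seconds. A follower that is alive but lagging keeps responding, so this check alone never removes it.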
   
   > Also, a degenerate case is one where we update the majority index not to the max but to the min of the nodes' indexes, while the leader itself keeps accepting transactions until the pending request limit is reached.
   > 
   > The other approach could be to not remove the entry from the cache aggressively once applyTransaction is called in Ozone. In the stateMachine case, the data would stay in the cache until all the followers have it (majority index == min index), as long as the gap between the majority index and the min index is within the threshold. Once the gap exceeds the threshold, remove the entry from the cache, mark the node as a slow follower, and handle the slow follower case in Ozone by closing the pipeline.
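
To make the quoted degenerate case concrete, here is a minimal sketch contrasting the usual Raft majority index with the min index across all peers. The helper names are hypothetical and not part of the Ratis API. The majority index is the highest log index held by more than half of the peers (leader included). The min index is simply the slowest peer's index.

```java
import java.util.Arrays;

/** Hypothetical helpers contrasting a Raft majority index with a min-of-all-peers index. */
final class RaftIndexes {
    private RaftIndexes() {}

    /**
     * Highest log index replicated on a majority of peers (leader included).
     * After an ascending sort, the element at position (n - 1) / 2 is held by
     * at least n - (n - 1) / 2 peers, which is more than half of n.
     */
    static long majorityIndex(long[] matchIndexes) {
        long[] sorted = matchIndexes.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        return sorted[(sorted.length - 1) / 2];
    }

    /** Highest log index replicated on every peer, i.e. the slowest peer's index. */
    static long minIndex(long[] matchIndexes) {
        return Arrays.stream(matchIndexes).min().getAsLong();
    }
}
```

With a leader at index 10 and followers at 7 and 3, the majority index is 7 but the min index is 3. If progress is gated on the min index, entries 4 through 10 wait on the slowest follower while the leader keeps accepting new requests.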
   
   I opened an Ozone ticket, HDDS-5791, for the cache improvement.
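
The retention rule proposed in the quoted comment can be sketched roughly as follows. This is a hypothetical illustration, not the HDDS-5791 change or Ozone's actual state machine cache. It assumes the caller already has the majority and min indexes and that entries up to the majority index have been applied. Entries that every peer has are dropped. Entries still missing on the slowest follower are kept while the gap stays within `maxGap`, and a larger gap drops them and flags the follower as slow.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the proposed retention rule (not Ozone's real cache).
 * Assumes entries up to majorityIndex have already been applied.
 */
final class StateMachineDataCache {
    private final NavigableMap<Long, byte[]> entries = new TreeMap<>();
    private final long maxGap; // allowed gap between majority index and min index

    StateMachineDataCache(long maxGap) {
        this.maxGap = maxGap;
    }

    void put(long logIndex, byte[] data) {
        entries.put(logIndex, data);
    }

    byte[] get(long logIndex) {
        return entries.get(logIndex);
    }

    int size() {
        return entries.size();
    }

    /**
     * Applies the retention rule after the indexes advance.
     * Returns true if the slowest follower should be handled as slow,
     * for example by closing the pipeline.
     */
    boolean evict(long majorityIndex, long minIndex) {
        // Every peer already has entries at or below minIndex, so they can go.
        entries.headMap(minIndex, true).clear();
        if (majorityIndex - minIndex <= maxGap) {
            // Gap within threshold: keep what the slowest follower still needs.
            return false;
        }
        // Gap too large: stop retaining data for the slow follower and flag it.
        entries.headMap(majorityIndex, true).clear();
        return true;
    }
}
```

The design choice here is that cache memory is bounded by the gap threshold rather than by how far behind the slowest follower is; once that bound is exceeded, the decision is handed to the pipeline-close path instead of stalling the leader.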
   
   

