[ 
https://issues.apache.org/jira/browse/RATIS-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978545#comment-16978545
 ] 

Tsz-wo Sze commented on RATIS-458:
----------------------------------

Let's step back a little to see what problem we are trying to solve here.

bq. In GrpcLogAppender when an append entry times out we remove the entry from 
the pendingRequests. This decreases the size of pendingRequests which affects 
the logic in GrpcLogAppender#shouldWait. ...

When a request times out, the leader should try to re-send it.  This seems 
correct.  If we use (nextIndex - matchIndex), do we expect that the leader 
won't resend the timed-out requests?

If for some reason the matchIndex is not updated, shouldWait() may return true 
forever.  It may happen when the requests get lost.  Do we assume that the 
requests are never lost?
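
To make the concern concrete, here is a minimal sketch comparing the two 
checks.  It is not the actual GrpcLogAppender code: the class, its fields and 
its methods are all hypothetical, maxOutstanding only stands in for 
leaderOutstandingAppendsMax, and the index-gap check assumes that the number 
of outstanding entries is (nextIndex - 1 - matchIndex).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a leader's appender; NOT the real GrpcLogAppender. */
class ShouldWaitSketch {
  private final int maxOutstanding;  // stands in for leaderOutstandingAppendsMax
  private final Deque<Long> pendingRequests = new ArrayDeque<>();  // in-flight log indices
  private long nextIndex = 1;   // next log index the leader will send
  private long matchIndex = 0;  // highest log index acknowledged by the follower

  ShouldWaitSketch(int maxOutstanding) {
    this.maxOutstanding = maxOutstanding;
  }

  /** Send one append entry and track it as pending. */
  void send() {
    pendingRequests.add(nextIndex++);
  }

  /** The oldest pending request timed out: drop it from the pending queue. */
  void onTimeout() {
    pendingRequests.poll();
  }

  /** The follower acknowledged the entry at the given index. */
  void onReply(long index) {
    pendingRequests.removeFirstOccurrence(Long.valueOf(index));
    matchIndex = Math.max(matchIndex, index);
  }

  /** Check based on the pending queue: a timeout frees a slot. */
  boolean shouldWaitByPending() {
    return pendingRequests.size() >= maxOutstanding;
  }

  /** Check based on the index gap: only a reply frees a slot (assumed formula). */
  boolean shouldWaitByIndexGap() {
    return nextIndex - 1 - matchIndex >= maxOutstanding;
  }
}
```

With maxOutstanding = 2, after two calls to send() both checks return true.  
If both requests are lost and time out, shouldWaitByPending() becomes false, 
so the leader can send again, while shouldWaitByIndexGap() keeps returning 
true until some reply advances matchIndex.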

BTW, what is the problem we are observing with the current approach?  Are 
followers getting too many requests?

> GrpcLogAppender#shouldWait should wait on pending log entries to follower
> -------------------------------------------------------------------------
>
>                 Key: RATIS-458
>                 URL: https://issues.apache.org/jira/browse/RATIS-458
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Blocker
>              Labels: ozone
>         Attachments: RATIS-458.001.patch, RATIS-458.002.patch, 
> RATIS-458.003.patch, RATIS-458.004.patch, RATIS-458.005.patch
>
>
> In GrpcLogAppender when an append entry times out we remove the entry from 
> the pendingRequests. This decreases the size of pendingRequests which affects 
> the logic in GrpcLogAppender#shouldWait. Further we also consider heartbeats 
> in shouldWait because heartbeats are tracked in pendingRequests. It should 
> actually wait on the number of log entries which are appended to follower but 
> have not yet been processed by it.
> GrpcConfigKeys.Server.leaderOutstandingAppendsMax should also be a fraction 
> of RaftServerConfigKeys.Log.queueSize. This brings flow control to the 
> leader's append entries to the follower, because the number of outstanding 
> append entries in the leader would then be limited by the maximum number of 
> operations in the raft log worker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
