showuon commented on PR #14428:
URL: https://github.com/apache/kafka/pull/14428#issuecomment-1759306603

   > I'm wondering if we may start causing leaders to resign when followers are 
slow/backlogged and make the situation worse? E.g. if we have multiple 
followers that need to catch up via a large fetch snapshot, they are unable to 
fetch again prior to the timeout expiring, and cause the current leader to 
resign. I don't believe this would be very disruptive but wanted to check folks 
had considered this/similar situation.
   
   Yes, with the leader's timeout set to 1.5x the fetch timeout, this issue 
should be resolved. Also, if a follower is slow for whatever reason and doesn't 
fetch again within the fetch timeout, that follower will also start a new 
election. That's already how the current implementation behaves.
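
   To illustrate the leader-side check discussed above, here is a minimal 
sketch, not the code in this PR. The class name `CheckQuorumSketch` and its 
methods are hypothetical, and it assumes the leader resigns when it has not 
received a fetch from a majority of voters (counting itself) within 1.5x the 
fetch timeout.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a leader-side "check quorum" timer.
// Names and structure are illustrative assumptions, not Kafka's actual code.
final class CheckQuorumSketch {
    private final int localId;
    private final Set<Integer> voters;        // all voter ids, including the leader
    private final long checkQuorumTimeoutMs;  // assumed: 1.5x the fetch timeout
    private final Set<Integer> fetchedVoters = new HashSet<>();
    private long deadlineMs;

    CheckQuorumSketch(int localId, Set<Integer> voters, long fetchTimeoutMs, long nowMs) {
        this.localId = localId;
        this.voters = Set.copyOf(voters);
        this.checkQuorumTimeoutMs = fetchTimeoutMs * 3 / 2;
        this.deadlineMs = nowMs + checkQuorumTimeoutMs;
    }

    // Record a valid Fetch or FetchSnapshot request from a replica.
    void onFetch(int replicaId, long nowMs) {
        if (replicaId == localId || !voters.contains(replicaId)) {
            return; // only other voters count toward the majority
        }
        if (nowMs >= deadlineMs) {
            return; // timer already expired; the caller is expected to resign
        }
        fetchedVoters.add(replicaId);
        // The leader counts itself, so it needs voters.size() / 2 followers
        // to form a majority of the voter set.
        if (fetchedVoters.size() >= voters.size() / 2) {
            fetchedVoters.clear();
            deadlineMs = nowMs + checkQuorumTimeoutMs;
        }
    }

    // True once the timer has expired without a majority having fetched.
    boolean shouldResign(long nowMs) {
        return voters.size() > 1 && nowMs >= deadlineMs;
    }

    public static void main(String[] args) {
        // Three voters, fetch timeout 2000 ms, so the check timeout is 3000 ms.
        CheckQuorumSketch leader = new CheckQuorumSketch(1, Set.of(1, 2, 3), 2000L, 0L);
        leader.onFetch(2, 2500L); // one follower plus the leader is a majority of 3
        System.out.println(leader.shouldResign(3000L));
        System.out.println(leader.shouldResign(5500L));
    }
}
```

   In this sketch a single slow follower does not force a resignation as long 
as the remaining voters keep a majority fetching, and the extra 0.5x gives a 
backlogged follower some slack before the leader's timer expires.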
   
   > I think we can also modify QUORUM_FETCH_TIMEOUT_MS_DOC to be slightly more 
explicit too (i.e. Maximum time a leader can go without receiving valid fetch 
or fetchsnapshot request from a majority of the quorum before resigning or 
something slightly different if we choose to use 1.5x)
   
   Doc updated. I don't think we need to mention anything about 1.5x because 
that's an implementation detail.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
