[ https://issues.apache.org/jira/browse/KUDU-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484712#comment-16484712 ]
Will Berkeley commented on KUDU-2452:
-------------------------------------

> I think we already do stop the failure detector during UpdateReplica, at least while we're waiting on the log, don't we?

You're right. We wait on the log but periodically wake up to snooze the failure detector.

> Prevent follower from causing pre-elections when UpdateConsensus is slow
> -------------------------------------------------------------------------
>
>                 Key: KUDU-2452
>                 URL: https://issues.apache.org/jira/browse/KUDU-2452
>             Project: Kudu
>          Issue Type: Improvement
>    Affects Versions: 1.7.0
>            Reporter: Will Berkeley
>            Priority: Major
>
> Thanks to pre-elections (KUDU-1365), slow UpdateConsensus calls on a single follower don't disturb the whole tablet by calling elections. However, sometimes I see situations where one or more followers are constantly calling pre-elections, and only rarely, if ever, overflowing their service queues. Occasionally, in 3x-replicated tablets, the followers will get "lucky" and detect a leader failure at around the same time, and an election will happen.
>
> This background instability has caused bugs that should be rare, like KUDU-2343, to occur pretty frequently, plus the extra RequestConsensusVote RPCs add a little more stress on the consensus service and on replicas' consensus locks. It also spams the logs, since there's generally no exponential backoff for these pre-elections because there's a successful heartbeat in between them.
>
> It seems like we can get into a situation where the average number of in-flight consensus requests is constant over time, so on average we are processing each heartbeat in less than the heartbeat interval; however, some individual heartbeats take longer. Since UpdateConsensus calls to a replica are serialized, a few slow ones in a row trigger the failure detector, despite the follower receiving every heartbeat in a timely manner and eventually responding successfully (and, on average, in a timely manner).
>
> It'd be nice to prevent these worthless pre-elections. A few ideas:
> 1. Separately calculate a backoff for failed pre-elections, and reset it when a pre-election succeeds or, more generally, when there's an election (see the backoff sketch below).
> 2. Don't count the time the follower spends executing UpdateConsensus against the failure detector. [~mpercy] suggested stopping the failure detector during UpdateReplica() and resuming it when the function returns (see the wait-and-snooze sketch below).
> 3. Move leader failure detection out-of-band of UpdateConsensus entirely.
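For illustration, here's a minimal sketch of that wait-and-snooze pattern. This is not Kudu's actual code: FailureDetector, Snooze(), WaitForLogWhileSnoozing(), and the wake-up interval are all assumed names and placeholder values.

{code:cpp}
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the leader failure detector; only Snooze()
// matters here. Snooze() pushes the "leader is dead" deadline out by one
// full detection period.
struct FailureDetector {
  void Snooze() { /* reset the detection deadline */ }
};

// Block until the local log has appended up to 'target_index', but wake
// up periodically so the failure detector is snoozed while the replica is
// doing legitimate work inside UpdateReplica().
void WaitForLogWhileSnoozing(std::mutex& mu,
                             std::condition_variable& log_appended,
                             const int64_t& appended_index,  // guarded by 'mu'
                             int64_t target_index,
                             FailureDetector* detector) {
  // Wake-up period; assumed to be comfortably shorter than the
  // failure-detection timeout.
  constexpr auto kSnoozeInterval = std::chrono::milliseconds(100);
  std::unique_lock<std::mutex> l(mu);
  while (appended_index < target_index) {
    // Returns either when the log catches up or when the interval elapses;
    // in both cases we snooze so that a slow log append is not mistaken
    // for a dead leader.
    log_appended.wait_for(l, kSnoozeInterval,
                          [&] { return appended_index >= target_index; });
    detector->Snooze();
  }
}
{code}

The important property is that snoozing happens while the replica is doing legitimate work, so a slow-but-healthy follower never looks like it has lost its leader.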
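Similarly, here's a minimal sketch of idea 1: a backoff tracked separately for pre-elections, doubling on each failure and resetting on any election. PreElectionBackoff and all of its constants are hypothetical, not a proposed API.

{code:cpp}
#include <algorithm>
#include <chrono>
#include <random>

// Tracks a backoff that applies only to pre-elections, independent of the
// heartbeat/failure-detection timers, so a successful heartbeat between
// failed pre-elections does not reset it.
class PreElectionBackoff {
 public:
  // Delay to wait before the next pre-election attempt, with jitter so
  // replicas don't retry in lockstep.
  std::chrono::milliseconds NextDelay() {
    std::uniform_int_distribution<int> jitter(0, current_ms_ / 2);
    return std::chrono::milliseconds(current_ms_ + jitter(rng_));
  }

  // Called when a pre-election fails to gather a majority of votes.
  void RecordFailure() {
    current_ms_ = std::min(current_ms_ * 2, kMaxMs);
  }

  // Called when a pre-election succeeds or, more generally, whenever a
  // real election takes place, per the reset rule in idea 1.
  void Reset() { current_ms_ = kInitialMs; }

 private:
  static constexpr int kInitialMs = 50;  // assumed initial delay
  static constexpr int kMaxMs = 10'000;  // assumed cap
  int current_ms_ = kInitialMs;
  std::mt19937 rng_{std::random_device{}()};
};
{code}

Resetting on any election, not just a successful pre-election, keeps the backoff from penalizing a replica after the cluster has genuinely changed leaders.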