[
https://issues.apache.org/jira/browse/KUDU-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386916#comment-15386916
]
Todd Lipcon commented on KUDU-1407:
-----------------------------------
Actually, it seems like this isn't an issue, but a sort of "inverse" is: if
a tablet _fails_ to start up, it will be _stuck_ in the TABLET_NOT_RUNNING
state (because its state is FAILED). In that case, we most likely _should_
evict the old replica in order to repair it.
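
To make the distinction concrete, here is a minimal sketch of the decision
the leader would need to make. It is not Kudu's actual code: the names
ReplicaState, should_evict and EVICTION_TIMEOUT_SECS are hypothetical, and
evicting a FAILED replica immediately (rather than after the timeout) is an
assumption. Only the five-minute window and the FAILED state come from this
issue.

```python
from enum import Enum, auto
from typing import Optional


class ReplicaState(Enum):
    """Illustrative follower states; not Kudu's real enum."""
    BOOTSTRAPPING = auto()  # still starting up, expected to become healthy
    FAILED = auto()         # startup failed, will never leave this state


# Five-minute window mentioned in the issue description.
EVICTION_TIMEOUT_SECS = 5 * 60


def should_evict(reported_state: Optional[ReplicaState],
                 secs_unresponsive: float,
                 timeout_secs: float = EVICTION_TIMEOUT_SECS) -> bool:
    """Decide whether the leader should evict a follower.

    reported_state is None when the follower did not answer at all.
    """
    if reported_state is ReplicaState.FAILED:
        # Assumption: a failed replica cannot recover on its own, so
        # waiting out the timeout gains nothing.
        return True
    if reported_state is ReplicaState.BOOTSTRAPPING:
        # A follower that is still coming up is healthy, however long
        # the bootstrap takes.
        return False
    # No state reported: fall back to the plain timeout.
    return secs_unresponsive >= timeout_secs
```

With this rule, a follower that has been bootstrapping for ten minutes is
kept, while one that reports FAILED is replaced right away.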
> Leader should not evict a follower when the follower is in the process of
> starting up a tablet
> ----------------------------------------------------------------------------------------------
>
> Key: KUDU-1407
> URL: https://issues.apache.org/jira/browse/KUDU-1407
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: 0.8.0
> Reporter: Todd Lipcon
> Priority: Critical
>
> It seems like, if the leader gets an error from one of its followers because
> the tablet is not running, it considers this replica to be 'unresponsive'. If
> this persists for 5 minutes, it will evict that follower to try to create a
> new replica.
> This can cause problems at cluster startup time when there is a lot of data
> and a cold disk cache: the startup bootstrap process might take more than
> five minutes, and leaders might end up evicting followers that are perfectly
> healthy (just in the process of coming up).