[ https://issues.apache.org/jira/browse/KUDU-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408841#comment-16408841 ]
Todd Lipcon commented on KUDU-1097:
-----------------------------------

[~mpercy] [~aserbin] it seems we should probably mark this as resolved in 1.7?

> Higher availability re-replication support
> ------------------------------------------
>
>                 Key: KUDU-1097
>                 URL: https://issues.apache.org/jira/browse/KUDU-1097
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: consensus
>    Affects Versions: Public beta
>            Reporter: Mike Percy
>            Assignee: Mike Percy
>            Priority: Critical
>
> Relative to the re-replication support outlined in KUDU-1096, we can do
> better in terms of availability properties. Here is a rough outline of such a
> design.
> Design:
> # When a voter falls behind the leader's log GC threshold, the leader
> notifies the Master that the voter is no longer up to date.
> # The Master selects a node to act as a replacement. It adds that node as a
> PRE_VOTER to the config (see KUDU-869) and when that node is caught up, it is
> automatically promoted to a VOTER.
> # When the Master detects that the node has been promoted, it removes the bad
> node from the config.
> Additional cases to detect and handle:
> * If the config is in such a state that it would be impossible to add a node,
> due to a voter that has fallen behind the log GC threshold being in the
> required majority, then remotely bootstrap that voter without changing the
> config. The tablet will continue to be unable to serve writes during this
> time, but will self-heal without administrator intervention.
> This can be further improved by adding support for aborting a config-change
> operation that cannot commit.
> This requires some additional plumbing from the leader to the Master to
> notify it of slow followers.
> Pros:
> * Closer to optimal fault-tolerance properties; "majority lost" less likely
> to occur so administrator intervention less likely
> Cons:
> * Requires support for pre-voter and a smarter master.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
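
For readers skimming the quoted design, the three steps and the "required majority" special case can be read as a small decision procedure run by the Master. The Python sketch below is only an illustration under stated assumptions and is not Kudu code: Peer, TabletConfig, can_commit_config_change and handle_lagging_voter are hypothetical names, and catch-up and remote bootstrap are simulated by copying the leader's log index.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    VOTER = "VOTER"
    PRE_VOTER = "PRE_VOTER"


@dataclass
class Peer:
    name: str
    role: Role
    last_index: int  # highest log index this peer has replicated


@dataclass
class TabletConfig:
    # Hypothetical in-memory stand-in for a tablet's Raft config.
    gc_threshold: int   # oldest log index the leader still retains
    leader_index: int   # latest log index on the leader
    peers: dict = field(default_factory=dict)

    def voters(self):
        return [p for p in self.peers.values() if p.role is Role.VOTER]

    def lagging_voters(self):
        # Step 1: voters that fell behind the leader's log GC threshold.
        return [p.name for p in self.voters()
                if p.last_index < self.gc_threshold]

    def can_commit_config_change(self):
        # A config change commits only if up-to-date voters form a majority.
        voters = self.voters()
        healthy = [p for p in voters if p.last_index >= self.gc_threshold]
        return len(healthy) >= len(voters) // 2 + 1


def handle_lagging_voter(cfg, bad, spare):
    """Return the actions a Master might take for one lagging voter."""
    if not cfg.can_commit_config_change():
        # Special case: the lagging voter is needed for the majority, so
        # bootstrap it in place without changing the config (simulated).
        cfg.peers[bad].last_index = cfg.leader_index
        return [f"remote bootstrap {bad}"]
    actions = []
    # Step 2: add the replacement as a PRE_VOTER, promote once caught up.
    cfg.peers[spare] = Peer(spare, Role.PRE_VOTER, 0)
    actions.append(f"add {spare} as PRE_VOTER")
    cfg.peers[spare].last_index = cfg.leader_index  # simulated catch-up
    cfg.peers[spare].role = Role.VOTER
    actions.append(f"promote {spare} to VOTER")
    # Step 3: only after the promotion is the bad node removed.
    del cfg.peers[bad]
    actions.append(f"remove {bad}")
    return actions


if __name__ == "__main__":
    cfg = TabletConfig(gc_threshold=50, leader_index=100)
    for name, idx in [("A", 100), ("B", 100), ("C", 10)]:
        cfg.peers[name] = Peer(name, Role.VOTER, idx)
    for bad in cfg.lagging_voters():
        print(handle_lagging_voter(cfg, bad, "D"))
```

The ordering is the point of the sketch: the replacement becomes a VOTER before the lagging node is removed, so the config never has fewer up-to-date voters than it started with.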