[ https://issues.apache.org/jira/browse/RATIS-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155319#comment-17155319 ]
runzhiwang edited comment on RATIS-800 at 7/10/20, 10:27 AM:
-------------------------------------------------------------

[~ljain] Thanks for the review.

bq. Balancing the leader in an active ratis ring might be difficult to achieve. For a candidate to be elected as leader its term and index should be >= follower's term index. Even if we trigger an election it is not guaranteed that the datanode will become leader.

We can first focus on balancing the leader. The Raft paper covers this case: if leadership transfer does not complete after about an election timeout, the prior leader aborts the transfer, remains the leader, and resumes accepting client requests.

!image-2020-07-10-18-27-01-890.png!

was (Author: yjxxtd):
[~ljain] Thanks for the review.

bq. Balancing the leader in an active ratis ring might be difficult to achieve. For a candidate to be elected as leader its term and index should be >= follower's term index. Even if we trigger an election it is not guaranteed that the datanode will become leader.

We can first focus on balancing the leader. The Raft paper covers this case: if leadership transfer does not complete after about an election timeout, the prior leader aborts the transfer, remains the leader, and resumes accepting client requests.

bq. To transfer leadership in Raft, the prior leader sends its log entries to the target server, then the target server runs an election without waiting for an election timeout to elapse. The prior leader thus ensures that the target server has all committed entries at the start of its term, and, as in normal elections, the majority voting guarantees the safety properties (such as the Leader Completeness Property) are maintained. The following steps describe the process in more detail:
bq. 1. The prior leader stops accepting new client requests.
bq. 2. The prior leader fully updates the target server's log to match its own, using the normal log replication mechanism described in Section 3.5.
bq. 3. The prior leader sends a TimeoutNow request to the target server. This request has the same effect as the target server's election timer firing: the target server starts a new election (incrementing its term and becoming a candidate).
bq. Once the target server receives the TimeoutNow request, it is highly likely to start an election before any other server and become leader in the next term. Its next message to the prior leader will include its new term number, causing the prior leader to step down. At this point, leadership transfer is complete.
bq. It is also possible for the target server to fail; in this case, the cluster must resume client operations. If leadership transfer does not complete after about an election timeout, the prior leader aborts the transfer and resumes accepting client requests. If the prior leader was mistaken and the target server is actually operational, then at worst this mistake will result in an extra election, after which client operations will be restored.

> Make Ratis consume recommended leader host from the pipeline creator
> --------------------------------------------------------------------
>
>                 Key: RATIS-800
>                 URL: https://issues.apache.org/jira/browse/RATIS-800
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: Li Cheng
>            Assignee: runzhiwang
>            Priority: Critical
>         Attachments: image-2020-07-10-18-27-01-890.png
>
>
> Start a Jira for suggested leader semantics. It would help Ratis performance
> if it could consume the leader host that an upstream user such as Ozone
> recommends. The user can choose the leader host based on load balancing and
> rack awareness.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
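
For readers following the thread, the three transfer steps and the abort path quoted from the Raft paper in the comment can be sketched as a toy, single-process model. This is only an illustration under stated assumptions, not Ratis code: the `Server` class and `transfer_leadership` function are hypothetical names, log length stands in for Raft's real "at least as up-to-date" comparison (last term, then last index), and an unreachable target stands in for "the election timeout elapsed without the transfer completing".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    """Hypothetical, minimal stand-in for a Raft peer (not a Ratis class)."""
    name: str
    term: int = 0
    log: List[str] = field(default_factory=list)
    is_leader: bool = False
    alive: bool = True
    accepting_requests: bool = False


def transfer_leadership(leader: Server, target: Server, others: List[Server]) -> bool:
    """Return True if the target took over, False if the transfer was aborted."""
    # Step 1: the prior leader stops accepting new client requests.
    leader.accepting_requests = False

    if target.alive:
        # Step 2: bring the target's log up to date
        # (stands in for the normal log replication mechanism).
        target.log = list(leader.log)

        # Step 3: TimeoutNow. The target starts an election right away in the next term.
        new_term = leader.term + 1
        voters = [leader] + others
        # Assumption: a voter grants its vote if its log is not longer than the
        # candidate's. The candidate always votes for itself.
        votes = 1 + sum(1 for s in voters if s.alive and len(s.log) <= len(target.log))
        cluster_size = len(voters) + 1
        if votes > cluster_size // 2:
            target.term = new_term
            target.is_leader = True
            target.accepting_requests = True
            # Seeing the new term makes the prior leader step down.
            for s in voters:
                if s.alive:
                    s.term = new_term
            leader.is_leader = False
            return True

    # Abort path: the transfer did not complete, so the prior leader
    # stays leader and resumes accepting client requests.
    leader.accepting_requests = True
    return False
```

In this model a successful call leaves the target as leader in the next term with a copy of the prior leader's log, while a call against a failed target leaves the prior leader unchanged and serving clients again, which matches the fallback described in the comment.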