runzhiwang opened a new pull request #1371: URL: https://github.com/apache/hadoop-ozone/pull/1371
## What changes were proposed in this pull request?

**What's the problem?**

With multi-raft enabled, leaders are not evenly distributed across datanodes. In my test there are 72 datanodes and each datanode participates in 6 pipelines, so there are 144 pipelines. As the image shows, the leader counts of the 4 datanodes shown are 0, 0, 4, and 2, which is not balanced. A Ratis leader not only accepts client requests but also replicates the log to its 2 followers, whereas a follower only receives the replicated log from the leader, so the leader's load is at least 3 times that of a follower. So we need to balance the leaders.

![image](https://user-images.githubusercontent.com/51938049/91788208-3cc6a400-ec3e-11ea-9e22-4dd4d30016df.png)

**How to improve?**

With the guidance of @szetszwo, [RATIS-967](https://issues.apache.org/jira/browse/RATIS-967) not only supports priority in leader election, but also lets a lower-priority leader yield leadership to a higher-priority peer once that peer's log has caught up. So in Ozone:

1. Assign the suggested leader a higher priority and the 2 followers a lower priority, so that leader distribution becomes balanced.
2. Record the suggested leader count in DatanodeDetails. When creating a pipeline, choose the datanode with the smallest suggested leader count as the suggested leader.
3. To avoid losing the suggested leader count when SCM restarts, also record it in the datanode. After SCM restarts, the datanode reports its suggested leader count to SCM.

As the following image shows, with 72 datanodes each participating in 6 pipelines (144 pipelines in total), the leader count of every datanode is 2, without exception. Leader distribution is balanced.

![image](https://user-images.githubusercontent.com/51938049/91788822-c7f46980-ec3f-11ea-87e1-3d7a5fccf181.png)

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-2922

## How was this patch tested?

Added new unit tests.
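For reviewers, the selection rule described under "How to improve?" can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this patch: the class `SuggestedLeaderPicker`, its method names, the priority values `HIGH_PRIORITY`/`LOW_PRIORITY`, and the use of plain strings as datanode ids are all hypothetical, and the tie-breaking rule (first member in the list wins) is an assumption as well.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "pick the datanode with the smallest suggested leader count". */
final class SuggestedLeaderPicker {
  // Illustrative values only; a higher number means a higher election priority.
  static final int HIGH_PRIORITY = 1;
  static final int LOW_PRIORITY = 0;

  // datanode id -> number of pipelines for which it is the suggested leader
  private final Map<String, Integer> suggestedLeaderCount = new HashMap<>();

  int count(String datanode) {
    return suggestedLeaderCount.getOrDefault(datanode, 0);
  }

  /** Restores a count reported by a datanode, e.g. after an SCM restart. */
  void restoreCount(String datanode, int reportedCount) {
    suggestedLeaderCount.put(datanode, reportedCount);
  }

  /** Chooses the member with the smallest count (ties: first in list) and increments it. */
  String chooseSuggestedLeader(List<String> members) {
    if (members.isEmpty()) {
      throw new IllegalArgumentException("pipeline has no members");
    }
    String best = members.get(0);
    for (String datanode : members) {
      if (count(datanode) < count(best)) {
        best = datanode;
      }
    }
    suggestedLeaderCount.merge(best, 1, Integer::sum);
    return best;
  }

  /** Gives the suggested leader the high priority and every other member the low one. */
  Map<String, Integer> assignPriorities(List<String> members) {
    String leader = chooseSuggestedLeader(members);
    Map<String, Integer> priorities = new LinkedHashMap<>();
    for (String datanode : members) {
      priorities.put(datanode, datanode.equals(leader) ? HIGH_PRIORITY : LOW_PRIORITY);
    }
    return priorities;
  }

  public static void main(String[] args) {
    SuggestedLeaderPicker picker = new SuggestedLeaderPicker();
    List<String> members = List.of("dn1", "dn2", "dn3");
    // Three pipelines over the same three datanodes.
    for (int i = 0; i < 3; i++) {
      System.out.println(picker.assignPriorities(members));
    }
  }
}
```

Because each choice increments the chosen datanode's count, repeated pipeline creation over the same members rotates the suggested leader instead of reusing one datanode. `restoreCount` stands in for the datanode report that repopulates SCM's view after a restart.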