runzhiwang commented on a change in pull request #1371: URL: https://github.com/apache/hadoop-ozone/pull/1371#discussion_r490675374
##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/RatisPipelineProvider.java
##########
@@ -98,8 +105,65 @@ private boolean exceedPipelineNumberLimit(ReplicationFactor factor) {
     return false;
   }
 
+  private Map<DatanodeDetails, Integer> getSuggestedLeaderCount(
+      List<DatanodeDetails> dns) {
+    Map<DatanodeDetails, Integer> suggestedLeaderCount = new HashMap<>();
+    for (DatanodeDetails dn : dns) {
+      suggestedLeaderCount.put(dn, 0);
+
+      Set<PipelineID> pipelineIDSet = getNodeManager().getPipelines(dn);
+      for (PipelineID pipelineID : pipelineIDSet) {
+        try {
+          Pipeline pipeline = getPipelineStateManager().getPipeline(pipelineID);
+          if (!pipeline.isClosed()
+              && dn.getUuid().equals(pipeline.getSuggestedLeaderId())) {

Review comment:
@xiaoyuyao Good point, I have thought about this as well.

> Any performance impact on the pipeline of forcing leader to be the original one.

If there is a performance problem, I can improve the forced leader change so that it completes within 1 second. I already know how to do this, but have not implemented it yet.

> Another situation I'm thinking of is writers on pipeline with slow leader(e.g., hardware slowness) may not be able to recover by leader change.

We can detect a slow leader by some metric and decrease its priority, then select a faster datanode and increase its priority, so that the faster datanode takes the leadership from the slow leader.

> In the case of S1 temporarily down, why don't we keep P1 leader on S3 and create P3 with leader on S1, this gives more flexibility for higher level to choose leader?

I want the cluster's leader distribution to follow the plan. If the plan is not appropriate, we can adjust it by changing priorities. If the leader distribution depends entirely on hardware rather than on the plan, we may lose control of it, because the leaderId in SCM is reported by the datanodes and may therefore be stale. For example, the datanodes report:

S1 .. S2 .. S3
P1 .. P2

Then P1's leader transfers to S3, but SCM has not yet received this report, so SCM allocates P3's leader to S3:

S1 .. S2 .. S3
.......P2 .. P1
............P3

The leaders are no longer balanced.
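
To make the stale-report scenario in the comment concrete, here is a minimal Java sketch. It is not the code from this PR: `LeaderBalanceSketch`, `countLeaders`, and `pickLeastLoaded` are hypothetical names, and datanodes and pipelines are modelled as plain strings instead of `DatanodeDetails` and `Pipeline`. It only illustrates how choosing a leader from a delayed view of the reported leaders can leave S3 with two leaders.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model; not the RatisPipelineProvider code. */
public class LeaderBalanceSketch {

  /** Counts, per datanode, how many pipelines name it as leader. */
  public static Map<String, Integer> countLeaders(
      List<String> datanodes, Map<String, String> leaderByPipeline) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String dn : datanodes) {
      counts.put(dn, 0);
    }
    for (String leader : leaderByPipeline.values()) {
      // Leaders that are not in the datanode list are ignored.
      counts.computeIfPresent(leader, (dn, c) -> c + 1);
    }
    return counts;
  }

  /** Picks the datanode with the fewest leaders; ties go to the first one. */
  public static String pickLeastLoaded(Map<String, Integer> counts) {
    String best = null;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      if (best == null || e.getValue() < counts.get(best)) {
        best = e.getKey();
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<String> dns = List.of("S1", "S2", "S3");

    // What SCM last heard from the datanodes (assumed to be delayed).
    Map<String, String> staleReported = new LinkedHashMap<>();
    staleReported.put("P1", "S1");
    staleReported.put("P2", "S2");

    // What is actually true: P1's leader has already moved to S3.
    Map<String, String> actual = new LinkedHashMap<>();
    actual.put("P1", "S3");
    actual.put("P2", "S2");

    // SCM chooses the leader for P3 from the stale view.
    String chosen = pickLeastLoaded(countLeaders(dns, staleReported));
    actual.put("P3", chosen);

    System.out.println("leader chosen for P3: " + chosen);
    System.out.println("actual leader counts: " + countLeaders(dns, actual));
  }
}
```

With these inputs the sketch picks S3 for P3 and prints the actual counts `{S1=0, S2=1, S3=2}`, which matches the unbalanced layout described in the comment.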