[ https://issues.apache.org/jira/browse/HDFS-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502685#comment-13502685 ]
Junping Du commented on HDFS-3495: ---------------------------------- Hi Nicholas, Thanks for your review and your great comments! Please see my reply below: 1.It seems that we are not going to subclass Balancer. If it is the case, we should not change the fields/methods to protected. [JP] Sure. I remove all protected method in Balancer. The exception could be on unit test as it needs a subclass of MiniDFSCluster (MiniDFSClusterWithNodeGroup) which have different configuration in initialization. 2. There are a few places initializing a NetworkTopology instance from conf. How about adding a new getInstance(conf) method to NetworkTopology so that Balancer, DatanodeManager and some tests could call it? [JP] Sure. Updated. 3. The one line method areDataNodesOnSameNodeGroup(..) seems unnecessary. How about using cluster.isOnSameNodeGroup(..) directly? [JP] Agree. it could be better. Updated. 4.If we have an addTo(..) method in PendingBlockMove, chooseProxySource() can be simplified as below: [JP] It is awesome! It makes code much simplified and clear. Thanks! 5. The addTo method could also be used in Source.chooseNextBlockToMove(). [JP] It looks slight different case there. If we use addTo() to replace original code, then we will lose "pendingBlock.source = this;" and add " proxySource = target" which is different with original logic? 6.Let's combine chooseNodes() and chooseNodesForCustomFaultDomains()? [JP] Sure. Two methods are combined. 7. chooseTargetOnSameNodeGroup and chooseSourceOnSameNodeGroup look the same. Similarly, chooseTargetsOnSameNodeGroup and chooseSourcesOnSameNodeGroup looks the same. I spent sometime to combine them. See if you think it is good. [JP] It is super! I put it in new changes and the method of selectCandidateOnSameNodeGroup can be used by other code in Balancer (I rename to matchSourceWithTargetToMove as it is more general). I will update patch soon. > Update Balancer to support new NetworkTopology with NodeGroup > ------------------------------------------------------------- > > Key: HDFS-3495 > URL: https://issues.apache.org/jira/browse/HDFS-3495 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer > Affects Versions: 1.1.0, 2.0.2-alpha > Reporter: Junping Du > Assignee: Junping Du > Attachments: HADOOP-8473-Balancer-NodeGroup-aware.patch, > HDFS-3495-v2.patch, HDFS-3495-v3.patch > > > Since the Balancer is a Hadoop Tool, it was updated to be directly aware of > four-layer hierarchy instead of creating an alternative Balancer > implementation. To accommodate extensibility, a new protected method, > doChooseNodesForCustomFaultDomain is now called from the existing chooseNodes > method so that a subclass of the Balancer could customize the balancer > algotirhm for other failure and locality topologies. An alternative option is > to encapsulate the algorithm used for the four-layer hierarchy into a > collaborating strategy class. > The key changes introduced to support a four-layer hierarchy were to override > the algorithm of choosing <source, target> pairs for balancing. Unit tests > were created to test the new algorithm. > The algorithm now makes sure to choose the target and source node on the same > node group for balancing as the first priority. Then the overall balancing > policy is: first doing balancing between nodes within the same nodegroup then > the same rack and off rack at last. Also, we need to check no duplicated > replicas live in the same node group after balancing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira