[ https://issues.apache.org/jira/browse/HBASE-20186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-20186: --------------------------- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks for the patch, Xiang. > Improve RSGroupBasedLoadBalancer#balanceCluster() to be more efficient when > calculating cluster state for each rsgroup > ---------------------------------------------------------------------------------------------------------------------- > > Key: HBASE-20186 > URL: https://issues.apache.org/jira/browse/HBASE-20186 > Project: HBase > Issue Type: Improvement > Components: rsgroup > Reporter: Xiang Li > Assignee: Xiang Li > Priority: Minor > Fix For: 3.0.0 > > Attachments: HBASE-20186.master.000.patch, > HBASE-20186.master.001.patch > > > In RSGroupBasedLoadBalancer > {code} > public List<RegionPlan> balanceCluster(Map<ServerName, List<RegionInfo>> > clusterState) > {code} > The second half of the function is to calculate region move plan for regions > which have been already placed according to the rsgroup assignment, and it is > calculated one rsgroup after another. > The following logic to check if a server belongs to the rsgroup is not quite > efficient, as it does not make good use of the fact that servers in > RSGroupInfo is a TreeSet. > {code} > for (Address sName : info.getServers()) { > for(ServerName curr: clusterState.keySet()) { > if(curr.getAddress().equals(sName)) { > groupClusterState.put(curr, correctedState.get(curr)); > } > } > } > {code} > Given there are m region servers in the cluster and n region servers for each > rsgroup in average, the code above has time complexity as O(m * n), while > using TreeSet's contains(), the time complexity could be reduced to O (m * > logn). > Another improvement is we do not need to scan every server for each rsgroup. > If the processed server could be recorded, we could skip those. -- This message was sent by Atlassian JIRA (v7.6.3#76005)