[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041624#comment-16041624 ]
Ted Yu edited comment on HBASE-18164 at 6/7/17 9:13 PM:
--------------------------------------------------------

Please wrap long lines (limit is 100 characters):
{code}
+    private float[][][] cachedLocalities; // Maps localityType [server|rack] -> region -> [server|rack]Index
+    private int[][] regionsToMostLocalEntities; // Maps localityType [server|rack] -> [server|rack]Index with highest locality
{code}

{code}
+    public int getRegionSizeMb(int region) {
...
+      return regionLoads[region].getLast().getStorefileSizeMB();
{code}
Note the 'MB' in the existing method. getRegionSizeMb -> getRegionSizeMB

{code}
+  static class LocalityCostFunction extends LocalityBasedCostFunction {
{code}
LocalityCostFunction -> ServerLocalityCostFunction

{code}
Subject: [PATCH 4/7] Added new locality generator
{code}
Can you generate the patch as a single commit?

> Much faster locality cost function and candidate generator
> ----------------------------------------------------------
>
>                 Key: HBASE-18164
>                 URL: https://issues.apache.org/jira/browse/HBASE-18164
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Kahlil Oppenheimer
>            Assignee: Kahlil Oppenheimer
>            Priority: Critical
>         Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with
> cluster size. On our smaller clusters (~17 tables, ~12 region servers,
> ~5k regions), the balancer considers ~100,000 cluster configurations in 60s
> per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables,
> ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as
> quickly for things like table skew, region load, etc., because the balancer
> does not have enough time to "think".
> We have rewritten the locality cost function to be incremental, meaning it
> only recomputes cost based on the most recent region move proposed by the
> balancer, rather than recomputing the cost across all regions/servers every
> iteration.
> Further, we also cache the locality of every region on every server at the
> beginning of the balancer's execution for both the LocalityBasedCostFunction
> and the LocalityCandidateGenerator to reference. This way, they need not
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4
> QA clusters without issue. The speed improvements we noticed are massive. Our
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference
> between the best locality that is possible given the current cluster state,
> and the currently measured locality.
> The old locality computation would
> measure the locality cost as the difference between the current locality and
> 100% locality, but this new computation instead takes the difference between
> the current locality for a given region and the best locality for that region
> in the cluster.
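
For readers following the thread, the combination of cached localities, an incremental update per region move, and a cost measured against the best achievable locality might look roughly like the sketch below. It is only an illustration under assumptions and is not taken from the attached patches: the class name `LocalityCostSketch`, its methods, the restriction to server-level locality, and the normalization by region count are all hypothetical.

```java
/**
 * Hypothetical sketch of an incremental locality cost; NOT the HBase patch.
 * Assumes a locality matrix computed once per balancer run.
 */
final class LocalityCostSketch {
    // cachedLocality[region][server] in [0, 1], computed once up front
    // so that no HDFS block lookups are needed per iteration.
    private final float[][] cachedLocality;
    // bestLocality[region] = highest locality of that region on any server.
    private final float[] bestLocality;
    // Current region -> server assignment.
    private final int[] regionToServer;
    // Running sum over regions of (best locality - current locality).
    private double totalGap;

    LocalityCostSketch(float[][] cachedLocality, int[] initialAssignment) {
        this.cachedLocality = cachedLocality;
        this.regionToServer = initialAssignment.clone();
        int numRegions = cachedLocality.length;
        this.bestLocality = new float[numRegions];
        // Full pass happens only once, at construction time.
        for (int r = 0; r < numRegions; r++) {
            float best = 0f;
            for (float locality : cachedLocality[r]) {
                best = Math.max(best, locality);
            }
            bestLocality[r] = best;
            totalGap += best - cachedLocality[r][regionToServer[r]];
        }
    }

    /** Constant-time update: only the moved region's contribution changes. */
    void regionMoved(int region, int newServer) {
        int oldServer = regionToServer[region];
        totalGap += cachedLocality[region][oldServer]
                  - cachedLocality[region][newServer];
        regionToServer[region] = newServer;
    }

    /** Average gap to the best achievable locality (assumed normalization). */
    double cost() {
        int numRegions = regionToServer.length;
        return numRegions == 0 ? 0.0 : totalGap / numRegions;
    }
}
```

In this sketch a rejected move can be undone by calling `regionMoved` again with the previous server, so each iteration costs constant time no matter how many regions the cluster has. A cost of 0 then means every region sits on its most local server, even if that server holds less than 100% of the region's blocks.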