[GitHub] [hbase] huaxiangsun commented on a change in pull request #2003: HBASE-24633 Remove data locality and StoreFileCostFunction for replic…
huaxiangsun commented on a change in pull request #2003:
URL: https://github.com/apache/hbase/pull/2003#discussion_r454036559

## File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancerRegionReplica.java ##
@@ -49,6 +51,41 @@
   public static final HBaseClassTestRule CLASS_RULE =
       HBaseClassTestRule.forClass(TestStochasticLoadBalancerRegionReplica.class);

+  // Mapping of locality test -> expected locality
+  private float[] expectedLocalities = {1.0f, 0.5f};
+
+  /**
+   * Data set for testLocalityCost:

Review comment:
The format of the input array is required by MockCluster. I will see if I can add more comments to make it easier to read.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] huaxiangsun commented on a change in pull request #2003: HBASE-24633 Remove data locality and StoreFileCostFunction for replic…
huaxiangsun commented on a change in pull request #2003:
URL: https://github.com/apache/hbase/pull/2003#discussion_r454031932

## File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java ##
@@ -174,6 +174,46 @@ public void reloadCachedMappings(List arg0) {
     }
   }

+  // This mock allows us to test the LocalityCostFunction
+  public class MockCluster extends BaseLoadBalancer.Cluster {
+
+    private int[][] localities = null; // [region][server] = percent of blocks
+    private boolean firstRegionAsReplica;
+
+    public MockCluster(int[][] regions) {
+      this(regions, false);
+    }
+
+    public MockCluster(int[][] regions, final boolean firstRegionAsReplica) {

Review comment:
Yeah, this is simply moved from TestStochasticLoadBalancer.java (if I am not wrong), because it is needed for the new test case TestStochasticLoadBalancerRegionReplica.
[GitHub] [hbase] huaxiangsun commented on a change in pull request #2003: HBASE-24633 Remove data locality and StoreFileCostFunction for replic…
huaxiangsun commented on a change in pull request #2003:
URL: https://github.com/apache/hbase/pull/2003#discussion_r454031142

## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java ##
@@ -1462,8 +1473,14 @@ protected double getCostFromRl(BalancerRegionLoad rl) {
     }

     @Override
-    protected double getCostFromRl(BalancerRegionLoad rl) {
-      return rl.getStorefileSizeMB();
+    protected double getCostFromRl(BalancerRegionLoad rl, boolean isPrimaryRegion) {
+      // Do not count replica region's file size, as replica regions serve very little
+      // read requests, this may be changed if there are enough data from production showing

Review comment:
As I wrote in the comments, all these factors really impact system performance. From one of the production clusters' stats, < 0.01% of requests go to replica regions, which means most of those regions are cold on the region servers. That is the reason I want to remove these factors from the balancer. I agree with you that things could be different for others, so making it configurable makes more sense. If it is OK with you, I want to drop this change from this patch and create a separate issue to track it, probably with a test case as well.
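
For reference, the configurable variant mentioned in the comment above could look roughly like the following. This is only a minimal sketch, not the code in the patch: `ReplicaAwareStoreFileCost`, `RegionLoad`, and the `countReplicaRegions` switch are hypothetical names used for illustration, and the stand-in `RegionLoad` class is not HBase's `BalancerRegionLoad`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a store-file cost that can optionally ignore replica regions.
class ReplicaAwareStoreFileCost {

    // Hypothetical stand-in for a region's load; NOT the HBase BalancerRegionLoad class.
    static final class RegionLoad {
        final int storefileSizeMB;
        final boolean primary;

        RegionLoad(int storefileSizeMB, boolean primary) {
            this.storefileSizeMB = storefileSizeMB;
            this.primary = primary;
        }
    }

    // Assumed configuration switch: when false, replica regions contribute no cost.
    private final boolean countReplicaRegions;

    ReplicaAwareStoreFileCost(boolean countReplicaRegions) {
        this.countReplicaRegions = countReplicaRegions;
    }

    // Cost of one region: its store file size, or 0 for an ignored replica.
    double costOf(RegionLoad rl) {
        if (!rl.primary && !countReplicaRegions) {
            return 0.0;
        }
        return rl.storefileSizeMB;
    }

    // Sum of the per-region costs for all regions hosted on one server.
    double totalCost(List<RegionLoad> loads) {
        double sum = 0.0;
        for (RegionLoad rl : loads) {
            sum += costOf(rl);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<RegionLoad> loads = Arrays.asList(
            new RegionLoad(100, true),    // primary region
            new RegionLoad(100, false));  // replica region
        System.out.println(new ReplicaAwareStoreFileCost(false).totalCost(loads));
        System.out.println(new ReplicaAwareStoreFileCost(true).totalCost(loads));
    }
}
```

With the switch off, a server hosting one primary and one replica of equal size is charged only for the primary. With it on, both regions count, which would suit a cluster where replicas serve a meaningful share of reads.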