[GitHub] [hbase] huaxiangsun commented on a change in pull request #2003: HBASE-24633 Remove data locality and StoreFileCostFunction for replic…

2020-07-13 Thread GitBox


huaxiangsun commented on a change in pull request #2003:
URL: https://github.com/apache/hbase/pull/2003#discussion_r454036559



##
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancerRegionReplica.java
##
@@ -49,6 +51,41 @@
   public static final HBaseClassTestRule CLASS_RULE =
     HBaseClassTestRule.forClass(TestStochasticLoadBalancerRegionReplica.class);
 
+  // Mapping of locality test -> expected locality
+  private float[] expectedLocalities = {1.0f, 0.5f};
+
+  /**
+   * Data set for testLocalityCost:

Review comment:
   The format of the input array is required by MockCluster. I will see if I
can add more comments to make it easier to read.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] huaxiangsun commented on a change in pull request #2003: HBASE-24633 Remove data locality and StoreFileCostFunction for replic…

2020-07-13 Thread GitBox


huaxiangsun commented on a change in pull request #2003:
URL: https://github.com/apache/hbase/pull/2003#discussion_r454031932



##
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
##
@@ -174,6 +174,46 @@ public void reloadCachedMappings(List arg0) {
 }
   }
 
+  // This mock allows us to test the LocalityCostFunction
+  public class MockCluster extends BaseLoadBalancer.Cluster {
+
+private int[][] localities = null;   // [region][server] = percent of blocks
+private boolean firstRegionAsReplica;
+
+public MockCluster(int[][] regions) {
+  this(regions, false);
+}
+
+public MockCluster(int[][] regions, final boolean firstRegionAsReplica) {

Review comment:
   Yes, this was simply moved from TestStochasticLoadBalancer.java (if I am
not mistaken), because it is needed for the new test case
TestStochasticLoadBalancerRegionReplica.









[GitHub] [hbase] huaxiangsun commented on a change in pull request #2003: HBASE-24633 Remove data locality and StoreFileCostFunction for replic…

2020-07-13 Thread GitBox


huaxiangsun commented on a change in pull request #2003:
URL: https://github.com/apache/hbase/pull/2003#discussion_r454031142



##
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
##
@@ -1462,8 +1473,14 @@ protected double getCostFromRl(BalancerRegionLoad rl) {
 }
 
 @Override
-protected double getCostFromRl(BalancerRegionLoad rl) {
-  return rl.getStorefileSizeMB();
+protected double getCostFromRl(BalancerRegionLoad rl, boolean isPrimaryRegion) {
+  // Do not count replica region's file size, as replica regions serve very little
+  // read requests, this may be changed if there are enough data from production showing

Review comment:
   As I wrote in the comments, all of these factors really do impact system
performance. Stats from one of our production clusters show that < 0.01% of
requests go to replica regions, which means most of those regions are cold on
the region servers. That is the reason I want to remove these factors from the
balancer. I agree that things could be different for other users, so making it
configurable makes more sense. If it is OK with you, I would like to drop this
change from this patch and create a separate issue to track it, probably with
a test case as well.
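
For illustration only, a minimal sketch of how a configurable version might look. It assumes a boolean switch that decides whether replica regions contribute their store file size to the cost. The class name, the `RegionLoad` stand-in, and the `countReplicaStoreFiles` flag are all hypothetical and are not taken from this patch or from the HBase code base.

```java
// Hypothetical sketch: none of these names come from the HBase code base.
final class ReplicaStoreFileCostSketch {

  /** Minimal stand-in for the per-region load information the balancer sees. */
  static final class RegionLoad {
    final int storefileSizeMB;

    RegionLoad(int storefileSizeMB) {
      this.storefileSizeMB = storefileSizeMB;
    }
  }

  // Assumed switch; a real change would presumably read it from configuration.
  private final boolean countReplicaStoreFiles;

  ReplicaStoreFileCostSketch(boolean countReplicaStoreFiles) {
    this.countReplicaStoreFiles = countReplicaStoreFiles;
  }

  /**
   * Returns the store file size as the cost for primary regions. Replica
   * regions cost 0 unless the switch asks for them to be counted.
   */
  double getCostFromRl(RegionLoad rl, boolean isPrimaryRegion) {
    if (!isPrimaryRegion && !countReplicaStoreFiles) {
      return 0.0;
    }
    return rl.storefileSizeMB;
  }
}
```

With the switch off, a replica region with 128 MB of store files would contribute 0 to the cost while a primary region of the same size would contribute 128. With the switch on, both would contribute 128.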




