busbey commented on a change in pull request #3356:
URL: https://github.com/apache/hbase/pull/3356#discussion_r647684738



##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
##########
@@ -555,10 +564,10 @@ private String functionCost() {
     for (CostFunction c : costFunctions) {
       builder.append(c.getClass().getSimpleName());
       builder.append(" : (");
-      if (c.isNeeded()) {
-        builder.append(c.getMultiplier());
+      if (c.isNeeded() || c.getMultiplier() > 0) {
+          builder.append("multiplier=" + c.getMultiplier());

Review comment:
       This smells like an error in how `isNeeded` gets implemented. It should 
be taking into account things like "the multiplier isn't >0". I don't think 
this is an issue with your specific PR since it happens all over the 
`StochasticLoadBalancer` class. Something to clean up later maybe.

##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
##########
@@ -331,27 +334,28 @@ boolean needsBalance(TableName tableName, 
BalancerClusterState cluster) {
         LOG.trace("{} not needed", c.getClass().getSimpleName());
         continue;
       }
-      sumMultiplier += multiplier;
-      total += c.cost() * multiplier;
+      if (cost < minCostNeedBalance)
+      {
+        LOG.debug("Imbalance of {} on the scale of [0, 1] is {} < threshold 
({}).",
+          c.getClass().getSimpleName(), cost, minCostNeedBalance);
+      }

Review comment:
       What's the goal here? It looks like the trace level output of the 
`functionCost()` call has been removed in favor of debug output here? Will need 
to include the weights and output for cost functions that were over the 
threshold otherwise there'll be no way to reproduce why the balancer decided to 
skip from just the logs.
   

##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
##########
@@ -470,17 +475,19 @@ private long calculateMaxSteps(BalancerClusterState 
cluster) {
     updateStochasticCosts(tableName, curOverallCost, curFunctionCosts);
     if (initCost > currentCost) {
       List<RegionPlan> plans = createRegionPlans(cluster);
-      LOG.info("Finished computing new load balance plan. Computation took {}" 
+
-        " to try {} different iterations.  Found a solution that moves " +
-        "{} regions; Going from a computed cost of {}" +
-        " to a new cost of {}", java.time.Duration.ofMillis(endTime - 
startTime),
-        step, plans.size(), initCost, currentCost);
+      LOG.info("Finished computing new moving plan. Computation took {} ms" +
+          " to try {} different iterations.  Found a solution that moves " +
+          "{} regions; Going from a computed imbalance of {}" +
+          " to a new imbalance of {}. ",
+        endTime - startTime, step, plans.size(), initCost / sumMultiplier, 
currentCost / sumMultiplier);
       sendRegionPlansToRingBuffer(plans, currentCost, initCost, 
initFunctionTotalCosts, step);
+      LOG.info("Moving plan submitted. Balancer is going into sleep until next 
period");
       return plans;
     }
-    LOG.info("Could not find a better load balance plan.  Tried {} different 
configurations in " +
-      "{}, and did not find anything with a computed cost less than {}", step,
-      java.time.Duration.ofMillis(endTime - startTime), initCost);
+    LOG.info("Could not find a better moving plan.  Tried {} different 
configurations in " +
+        "{} ms, and did not find anything with a computed cost less than {}", 
step,
+      endTime - startTime, initCost);

Review comment:
       Shouldn't this talk about "imbalance" score or some such instead of 
"computed cost"?

##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
##########
@@ -470,17 +475,19 @@ private long calculateMaxSteps(BalancerClusterState 
cluster) {
     updateStochasticCosts(tableName, curOverallCost, curFunctionCosts);
     if (initCost > currentCost) {
       List<RegionPlan> plans = createRegionPlans(cluster);
-      LOG.info("Finished computing new load balance plan. Computation took {}" 
+
-        " to try {} different iterations.  Found a solution that moves " +
-        "{} regions; Going from a computed cost of {}" +
-        " to a new cost of {}", java.time.Duration.ofMillis(endTime - 
startTime),
-        step, plans.size(), initCost, currentCost);
+      LOG.info("Finished computing new moving plan. Computation took {} ms" +
+          " to try {} different iterations.  Found a solution that moves " +
+          "{} regions; Going from a computed imbalance of {}" +
+          " to a new imbalance of {}. ",
+        endTime - startTime, step, plans.size(), initCost / sumMultiplier, 
currentCost / sumMultiplier);
       sendRegionPlansToRingBuffer(plans, currentCost, initCost, 
initFunctionTotalCosts, step);
+      LOG.info("Moving plan submitted. Balancer is going into sleep until next 
period");

Review comment:
       I like this phrasing for distinguishing that the balancer came up with 
the set of moves to do but did not actually do the moving itself.

##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
##########
@@ -331,27 +334,28 @@ boolean needsBalance(TableName tableName, 
BalancerClusterState cluster) {
         LOG.trace("{} not needed", c.getClass().getSimpleName());
         continue;
       }
-      sumMultiplier += multiplier;
-      total += c.cost() * multiplier;
+      if (cost < minCostNeedBalance)
+      {
+        LOG.debug("Imbalance of {} on the scale of [0, 1] is {} < threshold 
({}).",
+          c.getClass().getSimpleName(), cost, minCostNeedBalance);
+      }
+      total += cost * multiplier;
     }
 
     boolean balanced = total <= 0 || sumMultiplier <= 0 ||
         (sumMultiplier > 0 && (total / sumMultiplier) < minCostNeedBalance);
     if (balanced) {
       final double calculatedTotal = total;
-      final double calculatedMultiplier = sumMultiplier;
-      sendRejectionReasonToRingBuffer(() -> getBalanceReason(calculatedTotal, 
calculatedMultiplier),
+      sendRejectionReasonToRingBuffer(() -> getBalanceReason(calculatedTotal),
         costFunctions);
     }
-    if (LOG.isDebugEnabled()) {
-      LOG.debug("{} {}; total cost={}, sum multiplier={}; cost/multiplier to 
need a balance is {}",
-          balanced ? "Skipping load balancing because balanced" : "We need to 
load balance",
-          isByTable ? String.format("table (%s)", tableName) : "cluster",
-          total, sumMultiplier, minCostNeedBalance);
-      if (LOG.isTraceEnabled()) {
-        LOG.trace("Balance decision detailed function costs={}", 
functionCost());
-      }
-    }
+    LOG.info("{} {} ", isByTable ? String.format("table (%s)", tableName) : 
"cluster",
+      balanced ? "Skipping load balancing because weighted average imbalance= "
+        + total / sumMultiplier + "< threshold (" + minCostNeedBalance + ") on 
the scale of [0, 1]."
+        + "If you want more aggressive balancing, either lower threshold 
minCostNeedbalance ("
+        + minCostNeedBalance + ") or increase the relative weight(s) of the 
specific cost function(s)."
+        : "Calculating plan. May take up to " + maxRunningTime + " ms to 
complete.");
+

Review comment:
       oof. That's a lot of logic packed into string concatenation. Can we 
unwrap this a bit so that future folks don't have to decipher so much?
   
   Something like 
   
   ```
   if (balanced) {
     LOG.info("{} - skipping load balancing because weighted average imbalance 
= {} > threshold ({})....",
         isByTable ? "Table specific ("+tableName+")" : "Cluster wide", 
total/sumMultiplier, minCostNeedBalance,...);
   } else {
     LOG.info("{} - Calculating plan. may take up to {}ms to complete.",
         isByTable ? "Table specific ("+tableName+")" : "Cluster wide", 
maxRunningTime);
   }
   ```
   
   could be rolled into the if block that calls 
`sendRejectionReasonToRingBuffer`?

##########
File path: 
hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
##########
@@ -331,27 +334,28 @@ boolean needsBalance(TableName tableName, 
BalancerClusterState cluster) {
         LOG.trace("{} not needed", c.getClass().getSimpleName());
         continue;
       }
-      sumMultiplier += multiplier;
-      total += c.cost() * multiplier;
+      if (cost < minCostNeedBalance)
+      {
+        LOG.debug("Imbalance of {} on the scale of [0, 1] is {} < threshold 
({}).",
+          c.getClass().getSimpleName(), cost, minCostNeedBalance);
+      }
+      total += cost * multiplier;
     }
 
     boolean balanced = total <= 0 || sumMultiplier <= 0 ||
         (sumMultiplier > 0 && (total / sumMultiplier) < minCostNeedBalance);
     if (balanced) {
       final double calculatedTotal = total;
-      final double calculatedMultiplier = sumMultiplier;
-      sendRejectionReasonToRingBuffer(() -> getBalanceReason(calculatedTotal, 
calculatedMultiplier),
+      sendRejectionReasonToRingBuffer(() -> getBalanceReason(calculatedTotal),
         costFunctions);
     }
-    if (LOG.isDebugEnabled()) {
-      LOG.debug("{} {}; total cost={}, sum multiplier={}; cost/multiplier to 
need a balance is {}",
-          balanced ? "Skipping load balancing because balanced" : "We need to 
load balance",
-          isByTable ? String.format("table (%s)", tableName) : "cluster",
-          total, sumMultiplier, minCostNeedBalance);
-      if (LOG.isTraceEnabled()) {
-        LOG.trace("Balance decision detailed function costs={}", 
functionCost());
-      }
-    }
+    LOG.info("{} {} ", isByTable ? String.format("table (%s)", tableName) : 
"cluster",
+      balanced ? "Skipping load balancing because weighted average imbalance= "
+        + total / sumMultiplier + "< threshold (" + minCostNeedBalance + ") on 
the scale of [0, 1]."
+        + "If you want more aggressive balancing, either lower threshold 
minCostNeedbalance ("
+        + minCostNeedBalance + ") or increase the relative weight(s) of the 
specific cost function(s)."

Review comment:
       you refer to `minCostBalance` here but "threshold" above. If we're going 
to tell someone to change the threshold we should refer to a config that they 
can alter. Also which cost functions? How would I know what they are and how do 
I change them?
   
   This feels like a place where we'd be better off providing a pointer to 
something in the ref guide that can go into all this detail instead of putting 
it into the log file repeatedly when the cluster is in steady state.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to