VladRodionov commented on code in PR #8218:
URL: https://github.com/apache/hbase/pull/8218#discussion_r3220934070


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java:
##########


Review Comment:
   potentialBestWeightedFromFreeCache(...) seems to consider any server with 
enough free cache as evidence that the region has a better possible cache 
score. Should this be limited to servers that are valid/reasonable candidate 
destinations for this specific region, considering locality/rack/table 
skew/other constraints?



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java:
##########
@@ -167,6 +192,23 @@ public void updateClusterMetrics(ClusterMetrics 
clusterMetrics) {
     updateRegionLoad();
   }
 
+  protected Map<ServerName, Long> getServerBlockCacheFreeBytes() {

Review Comment:
   Free cache capacity does not necessarily mean the moved region will become 
warm. What mechanism is expected to actually populate the cache after the move 
— prefetch, time-based priority, HBASE-30135, or normal client reads?



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java:
##########
@@ -61,6 +62,30 @@ public class CacheAwareLoadBalancer extends 
StochasticLoadBalancer {
     "hbase.master.balancer.stochastic.throttling.cacheRatio";
   public static final float CACHE_RATIO_THRESHOLD_DEFAULT = 0.8f;
 
+  /**
+   * Below this cache ratio on the current host, a move may be considered for 
the free-space
+   * heuristic.
+   */
+  public static final String LOW_CACHE_RATIO_FOR_RELOCATION_KEY =
+    "hbase.master.balancer.cacheaware.lowCacheRatioThreshold";
+  public static final float LOW_CACHE_RATIO_FOR_RELOCATION_DEFAULT = 0.35f;
+
+  /**
+   * Optimistic region cache ratio assumed for cost purposes when a better 
host has free cache space
+   * (actual warmup is not modeled).
+   */
+  public static final String POTENTIAL_CACHE_RATIO_AFTER_MOVE_KEY =
+    "hbase.master.balancer.cacheaware.potentialCacheRatioAfterMove";
+  public static final float POTENTIAL_CACHE_RATIO_AFTER_MOVE_DEFAULT = 0.95f;

Review Comment:
   Are the defaults for potentialCacheRatioAfterMove (0.95) and 
minFreeCacheSpaceFactor (1.0) based on observed data? They seem important 
enough to justify in comments or tests.



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java:
##########
@@ -552,25 +605,87 @@ static class CacheAwareCostFunction extends CostFunction {
         !isPersistentCache ? 0.0f : conf.getFloat(CACHE_COST_KEY, 
DEFAULT_CACHE_COST));
       bestCacheRatio = 0.0;
       cacheRatio = 0.0;
+      lowCacheRatioThreshold =
+        conf.getFloat(LOW_CACHE_RATIO_FOR_RELOCATION_KEY, 
LOW_CACHE_RATIO_FOR_RELOCATION_DEFAULT);
+      potentialCacheRatioAfterMove = Math.min(1.0f, conf
+        .getFloat(POTENTIAL_CACHE_RATIO_AFTER_MOVE_KEY, 
POTENTIAL_CACHE_RATIO_AFTER_MOVE_DEFAULT));
+      minFreeCacheSpaceFactor =
+        conf.getFloat(MIN_FREE_CACHE_SPACE_FACTOR_KEY, 
MIN_FREE_CACHE_SPACE_FACTOR_DEFAULT);
     }
 
     @Override
     void prepare(BalancerClusterState cluster) {
       super.prepare(cluster);
-      cacheRatio = 0.0;
-      bestCacheRatio = 0.0;
+      recomputeCacheRatio(cluster);
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("CacheAwareCostFunction: Cost: {}", 1 - cacheRatio);
+      }
+    }
 
+    private void recomputeCacheRatio(BalancerClusterState cluster) {
+      double[] currentWeighted = computeCurrentWeightedContributions(cluster);
+      double currentSum = 0.0;
+      double bestCacheSum = 0.0;
       for (int region = 0; region < cluster.numRegions; region++) {
-        cacheRatio += cluster.getOrComputeWeightedRegionCacheRatio(region,
-          cluster.regionIndexToServerIndex[region]);
-        bestCacheRatio += cluster.getOrComputeWeightedRegionCacheRatio(region,
-          getServerWithBestCacheRatioForRegion(region));
+        currentSum += currentWeighted[region];
+        // here we only get the server index where this region cache ratio is 
the highest
+        int serverIndexBestCache = 
cluster.getOrComputeServerWithBestRegionCachedRatio()[region];
+        double currentHighestCache =
+          cluster.getOrComputeWeightedRegionCacheRatio(region, 
serverIndexBestCache);
+        // Get a hypothetical best cache ratio for this region if any server 
has enough free cache
+        // to host it.
+        double potentialHighestCache =
+          potentialBestWeightedFromFreeCache(cluster, region, 
currentHighestCache);
+        double actualHighest = Math.max(currentHighestCache, 
potentialHighestCache);
+        bestCacheSum += actualHighest;
       }
+      bestCacheRatio = bestCacheSum;
+      if (bestCacheSum <= 0.0) {
+        cacheRatio = cluster.numRegions == 0 ? 1.0 : 0.0;
+      } else {
+        cacheRatio = Math.min(1.0, currentSum / bestCacheSum);
+      }
+    }
 
-      cacheRatio = bestCacheRatio == 0 ? 1.0 : cacheRatio / bestCacheRatio;
-      if (LOG.isDebugEnabled()) {
-        LOG.debug("CacheAwareCostFunction: Cost: {}", 1 - cacheRatio);
+    private double[] computeCurrentWeightedContributions(BalancerClusterState 
cluster) {
+      int totalRegions = cluster.numRegions;
+      double[] contrib = new double[totalRegions];
+      for (int r = 0; r < totalRegions; r++) {
+        int s = cluster.regionIndexToServerIndex[r];
+        int sizeMb = cluster.getTotalRegionHFileSizeMB(r);
+        if (sizeMb <= 0) {
+          contrib[r] = 0.0;
+          continue;
+        }
+        contrib[r] = cluster.getOrComputeWeightedRegionCacheRatio(r, s);
+      }
+      return contrib;
+    }
+
+    /*
+     * If this region is cold in metrics and at least one RS (including its 
current host) reports
+     * enough free block cache to hold it, return an optimistic weighted cache 
score ({@link
+     * #potentialCacheRatioAfterMove} * region MB) so placement is not 
considered optimal solely
+     * from low ratios when capacity exists somewhere in the cluster.
+     */
+    private double potentialBestWeightedFromFreeCache(BalancerClusterState 
cluster, int region,

Review Comment:
   The required free cache appears to be based on total region HFile size. 
Should this instead use cacheable/hot data size?



##########
hbase-client/src/main/java/org/apache/hadoop/hbase/ServerMetrics.java:
##########
@@ -106,4 +107,21 @@ default String getVersion() {
    *         rounded to MB
    */
   Map<String, Integer> getRegionCachedInfo();
+
+  /**
+   * The available cache space on this region server (bytes), if reported in 
the server load.
+   */
+  default long getCacheFreeSize() {

Review Comment:
   ServerMetrics#getCacheFreeSize() defaults to 0L, and getRegionColdDataSize() 
defaults to empty map.   That is safe for compatibility, but it means missing 
metrics are indistinguishable from “no free cache” or “no cold data.” Should 
missing cache-free-size be represented differently from actual zero? Would 0L 
make old / partially upgraded servers look like they have no free cache and 
bias balancing?



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java:
##########
@@ -207,8 +249,16 @@ private void updateRegionLoad() {
           if (!ServerName.isSameAddress(currentServer, sn)) {
             int regionSizeMB =
               
regionCacheRatioOnCurrentServerMap.get(regionEncodedName).getSecond();
+            // The coldDataSize accounts for data size classified as "cold" by 
DataTieringManager,
+            // which should be kept out of cache. We sum cold region size in 
the cache ratio, as we
+            // don't want to move regions with low cache ratio due to data 
classified as cold.
             float regionCacheRatioOnOldServer =
-              regionSizeMB == 0 ? 0.0f : (float) regionSizeInCache / 
regionSizeMB;
+              regionSizeMB

Review Comment:
   I think adding cold data size to the numerator makes the value less clear. 
It turns the metric into an “effective satisfaction ratio” rather than an 
actual cache ratio.
   
   Would it be better to exclude cold data from the denominator instead?
   
   For example:
   
   cacheRatio = cachedDataSize / (totalRegionSize - coldDataSize)
   
   That would measure how much of the cache-relevant/hot data is actually 
cached. Cold data is intentionally not expected to be in block cache, so it 
should not make the region look poorly cached, but it also should not be 
counted as cached.
   
   We may also need to clamp the result to [0, 1] and handle the case where 
totalRegionSize <= coldDataSize.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to