VladRodionov commented on code in PR #8218:
URL: https://github.com/apache/hbase/pull/8218#discussion_r3220934070
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java:
##########
Review Comment:
potentialBestWeightedFromFreeCache(...) seems to consider any server with
enough free cache as evidence that the region has a better possible cache
score. Should this be limited to servers that are valid/reasonable candidate
destinations for this specific region, considering locality/rack/table
skew/other constraints?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java:
##########
@@ -167,6 +192,23 @@ public void updateClusterMetrics(ClusterMetrics
clusterMetrics) {
updateRegionLoad();
}
+ protected Map<ServerName, Long> getServerBlockCacheFreeBytes() {
Review Comment:
Free cache capacity does not necessarily mean the moved region will become
warm. What mechanism is expected to actually populate the cache after the move
— prefetch, time-based priority, HBASE-30135, or normal client reads?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java:
##########
@@ -61,6 +62,30 @@ public class CacheAwareLoadBalancer extends
StochasticLoadBalancer {
"hbase.master.balancer.stochastic.throttling.cacheRatio";
public static final float CACHE_RATIO_THRESHOLD_DEFAULT = 0.8f;
+ /**
+ * Below this cache ratio on the current host, a move may be considered for
the free-space
+ * heuristic.
+ */
+ public static final String LOW_CACHE_RATIO_FOR_RELOCATION_KEY =
+ "hbase.master.balancer.cacheaware.lowCacheRatioThreshold";
+ public static final float LOW_CACHE_RATIO_FOR_RELOCATION_DEFAULT = 0.35f;
+
+ /**
+ * Optimistic region cache ratio assumed for cost purposes when a better
host has free cache space
+ * (actual warmup is not modeled).
+ */
+ public static final String POTENTIAL_CACHE_RATIO_AFTER_MOVE_KEY =
+ "hbase.master.balancer.cacheaware.potentialCacheRatioAfterMove";
+ public static final float POTENTIAL_CACHE_RATIO_AFTER_MOVE_DEFAULT = 0.95f;
Review Comment:
Are the defaults for potentialCacheRatioAfterMove (0.95) and
minFreeCacheSpaceFactor (1.0) based on observed data? They seem important
enough to justify in comments or tests.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java:
##########
@@ -552,25 +605,87 @@ static class CacheAwareCostFunction extends CostFunction {
!isPersistentCache ? 0.0f : conf.getFloat(CACHE_COST_KEY,
DEFAULT_CACHE_COST));
bestCacheRatio = 0.0;
cacheRatio = 0.0;
+ lowCacheRatioThreshold =
+ conf.getFloat(LOW_CACHE_RATIO_FOR_RELOCATION_KEY,
LOW_CACHE_RATIO_FOR_RELOCATION_DEFAULT);
+ potentialCacheRatioAfterMove = Math.min(1.0f, conf
+ .getFloat(POTENTIAL_CACHE_RATIO_AFTER_MOVE_KEY,
POTENTIAL_CACHE_RATIO_AFTER_MOVE_DEFAULT));
+ minFreeCacheSpaceFactor =
+ conf.getFloat(MIN_FREE_CACHE_SPACE_FACTOR_KEY,
MIN_FREE_CACHE_SPACE_FACTOR_DEFAULT);
}
@Override
void prepare(BalancerClusterState cluster) {
super.prepare(cluster);
- cacheRatio = 0.0;
- bestCacheRatio = 0.0;
+ recomputeCacheRatio(cluster);
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("CacheAwareCostFunction: Cost: {}", 1 - cacheRatio);
+ }
+ }
+ private void recomputeCacheRatio(BalancerClusterState cluster) {
+ double[] currentWeighted = computeCurrentWeightedContributions(cluster);
+ double currentSum = 0.0;
+ double bestCacheSum = 0.0;
for (int region = 0; region < cluster.numRegions; region++) {
- cacheRatio += cluster.getOrComputeWeightedRegionCacheRatio(region,
- cluster.regionIndexToServerIndex[region]);
- bestCacheRatio += cluster.getOrComputeWeightedRegionCacheRatio(region,
- getServerWithBestCacheRatioForRegion(region));
+ currentSum += currentWeighted[region];
+ // here we only get the server index where this region cache ratio is
the highest
+ int serverIndexBestCache =
cluster.getOrComputeServerWithBestRegionCachedRatio()[region];
+ double currentHighestCache =
+ cluster.getOrComputeWeightedRegionCacheRatio(region,
serverIndexBestCache);
+ // Get a hypothetical best cache ratio for this region if any server
has enough free cache
+ // to host it.
+ double potentialHighestCache =
+ potentialBestWeightedFromFreeCache(cluster, region,
currentHighestCache);
+ double actualHighest = Math.max(currentHighestCache,
potentialHighestCache);
+ bestCacheSum += actualHighest;
}
+ bestCacheRatio = bestCacheSum;
+ if (bestCacheSum <= 0.0) {
+ cacheRatio = cluster.numRegions == 0 ? 1.0 : 0.0;
+ } else {
+ cacheRatio = Math.min(1.0, currentSum / bestCacheSum);
+ }
+ }
- cacheRatio = bestCacheRatio == 0 ? 1.0 : cacheRatio / bestCacheRatio;
- if (LOG.isDebugEnabled()) {
- LOG.debug("CacheAwareCostFunction: Cost: {}", 1 - cacheRatio);
+ private double[] computeCurrentWeightedContributions(BalancerClusterState
cluster) {
+ int totalRegions = cluster.numRegions;
+ double[] contrib = new double[totalRegions];
+ for (int r = 0; r < totalRegions; r++) {
+ int s = cluster.regionIndexToServerIndex[r];
+ int sizeMb = cluster.getTotalRegionHFileSizeMB(r);
+ if (sizeMb <= 0) {
+ contrib[r] = 0.0;
+ continue;
+ }
+ contrib[r] = cluster.getOrComputeWeightedRegionCacheRatio(r, s);
+ }
+ return contrib;
+ }
+
+ /*
+ * If this region is cold in metrics and at least one RS (including its
current host) reports
+ * enough free block cache to hold it, return an optimistic weighted cache
score ({@link
+ * #potentialCacheRatioAfterMove} * region MB) so placement is not
considered optimal solely
+ * from low ratios when capacity exists somewhere in the cluster.
+ */
+ private double potentialBestWeightedFromFreeCache(BalancerClusterState
cluster, int region,
Review Comment:
The required free cache appears to be based on total region HFile size.
Should this instead use cacheable/hot data size?
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/ServerMetrics.java:
##########
@@ -106,4 +107,21 @@ default String getVersion() {
* rounded to MB
*/
Map<String, Integer> getRegionCachedInfo();
+
+ /**
+ * The available cache space on this region server (bytes), if reported in
the server load.
+ */
+ default long getCacheFreeSize() {
Review Comment:
ServerMetrics#getCacheFreeSize() defaults to 0L, and getRegionColdDataSize()
defaults to empty map. That is safe for compatibility, but it means missing
metrics are indistinguishable from “no free cache” or “no cold data.” Should
missing cache-free-size be represented differently from actual zero? Would 0L
make old / partially upgraded servers look like they have no free cache and
bias balancing?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/CacheAwareLoadBalancer.java:
##########
@@ -207,8 +249,16 @@ private void updateRegionLoad() {
if (!ServerName.isSameAddress(currentServer, sn)) {
int regionSizeMB =
regionCacheRatioOnCurrentServerMap.get(regionEncodedName).getSecond();
+ // The coldDataSize accounts for data size classified as "cold" by
DataTieringManager,
+ // which should be kept out of cache. We sum cold region size in
the cache ratio, as we
+ // don't want to move regions with low cache ratio due to data
classified as cold.
float regionCacheRatioOnOldServer =
- regionSizeMB == 0 ? 0.0f : (float) regionSizeInCache /
regionSizeMB;
+ regionSizeMB
Review Comment:
I think adding cold data size to the numerator makes the value less clear.
It turns the metric into an “effective satisfaction ratio” rather than an
actual cache ratio.
Would it be better to exclude cold data from the denominator instead?
For example:
cacheRatio = cachedDataSize / (totalRegionSize - coldDataSize)
That would measure how much of the cache-relevant/hot data is actually
cached. Cold data is intentionally not expected to be in block cache, so it
should not make the region look poorly cached, but it also should not be
counted as cached.
We may also need to clamp the result to [0, 1] and handle the case where
totalRegionSize <= coldDataSize.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]