pustota2009 commented on a change in pull request #2934:
URL: https://github.com/apache/hbase/pull/2934#discussion_r573206052



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruAdaptiveBlockCache.java
##########
@@ -0,0 +1,1442 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.io.hfile;
+
+import static java.util.Objects.requireNonNull;
+
+import java.lang.ref.WeakReference;
+import java.util.EnumMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.PriorityQueue;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.concurrent.locks.ReentrantLock;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.io.HeapSize;
+import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
+import org.apache.hadoop.hbase.util.ClassSize;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.com.google.common.base.MoreObjects;
+import org.apache.hbase.thirdparty.com.google.common.base.Objects;
+import org.apache.hbase.thirdparty.com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * <b>This implementation improves performance of the classical LRU cache by up to 3 times,
+ * by reducing the garbage-collection load.</b>
+ * <p/>
+ * The classical block cache implementation that is memory-aware using {@link HeapSize},
+ * memory-bound using an LRU eviction algorithm, and concurrent: backed by a
+ * {@link ConcurrentHashMap} and with a non-blocking eviction thread giving constant-time
+ * {@link #cacheBlock} and {@link #getBlock} operations.
+ * <p/>
+ * Contains three levels of block priority to allow for scan-resistance and in-memory families
+ * {@link org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder#setInMemory(boolean)} (An
+ * in-memory column family is a column family that should be served from memory if possible):
+ * single-access, multiple-access, and in-memory priority. A block is added with an in-memory
+ * priority flag if {@link org.apache.hadoop.hbase.client.ColumnFamilyDescriptor#isInMemory()};
+ * otherwise a block becomes a single-access priority block the first time it is read into this
+ * block cache. If a block is accessed again while in cache, it is marked as a multiple-access
+ * priority block. This delineation of blocks is used to prevent scans from thrashing the cache,
+ * adding a least-frequently-used element to the eviction algorithm.
+ * <p/>
+ * Each priority is given its own chunk of the total cache to ensure fairness during eviction.
+ * Each priority will retain close to its maximum size; however, if any priority is not using
+ * its entire chunk the others are able to grow beyond their chunk size.
+ * <p/>
+ * Instantiated at a minimum with the total size and average block size. All sizes are in bytes.
+ * The block size is not especially important as this cache is fully dynamic in its sizing of
+ * blocks. It is only used for pre-allocating data structures and in initial heap estimation of
+ * the map.
+ * <p/>
+ * The detailed constructor defines the sizes for the three priorities (they should total to the
+ * <code>maximum size</code> defined). It also sets the levels that trigger and control the
+ * eviction thread.
+ * <p/>
+ * The <code>acceptable size</code> is the cache size level which triggers the eviction process
+ * to start. It evicts enough blocks to get the size below the minimum size specified.
+ * <p/>
+ * Eviction happens in a separate thread and involves a single full scan of the map. It
+ * determines how many bytes must be freed to reach the minimum size, and then while scanning
+ * determines the fewest least-recently-used blocks necessary from each of the three priorities
+ * (would be 3 times bytes to free). It then uses the priority chunk sizes to evict fairly
+ * according to the relative sizes and usage.
+ * <p/>
+ * The adaptive LRU cache improves performance when we read much more data than can fit into
+ * the BlockCache, which causes a high rate of evictions. That in turn leads to heavy garbage
+ * collector work: a lot of blocks are put into the BlockCache but never read, yet a lot of CPU
+ * resources are spent cleaning them up. We can avoid this situation via the following
+ * parameters:
+ * <p/>
+ * <b>hbase.lru.cache.heavy.eviction.count.limit</b> - sets how many times the eviction process
+ * has to run before we start avoiding putting data into the BlockCache. By default it is
+ * 2147483647 (Integer.MAX_VALUE), which effectively disables the feature. If the workload mixes
+ * occasional short reads of the same data with long-term reads, this parameter can separate
+ * them. For example, if we know that our short reads take about 1 minute, we can set the
+ * parameter to about 10 so the feature is enabled only for long, massive reads (after ~100
+ * seconds). So when we use short reads and want all of them in the cache, we will have them
+ * there (barring evictions, of course). When we use long-term heavy reads, the feature will be
+ * enabled after some time and bring better performance.
+ * <p/>
+ * <b>hbase.lru.cache.heavy.eviction.mb.size.limit</b> - sets how many bytes we would like to be
+ * put into the BlockCache (and evicted from it) per ~10 seconds. The feature will try to reach
+ * this value and maintain it. Don't set it too small, because that leads to a premature exit
+ * from this mode. For powerful CPUs (about 20-40 physical cores) it could be about 400-500 MB;
+ * for an average system (~10 cores), 200-300 MB; some weak systems (2-5 cores) may be fine with
+ * 50-100 MB.
+ * How it works: we set the limit, and every ~10 seconds we calculate how many bytes were freed:
+ * Overhead = Freed Bytes Sum (MB) * 100 / Limit (MB) - 100;
+ * For example, if we set the limit = 500 and 2000 MB were evicted, the overhead is:
+ * 2000 * 100 / 500 - 100 = 300%
+ * The feature will then reduce the percentage of cached data blocks to bring the evicted bytes
+ * closer to 100% (500 MB) - a kind of auto-scaling.
+ * If fewer bytes were freed than the limit, the overhead is negative.
+ * For example, if 200 MB were freed:
+ * 200 * 100 / 500 - 100 = -60%
+ * The feature will then increase the percentage of cached blocks,
+ * which again brings the evicted bytes closer to 100% (500 MB).
+ * The current situation can be found in the RegionServer log:
+ * BlockCache evicted (MB): 0, overhead (%): -100, heavy eviction counter: 0, current caching
+ * DataBlock (%): 100 - means no eviction; 100% of blocks are cached.
+ * BlockCache evicted (MB): 2000, overhead (%): 300, heavy eviction counter: 1, current caching
+ * DataBlock (%): 97 - means eviction began, and caching was reduced by 3%.
+ * This helps to tune the system and find out which value is best to set. Don't try to reach 0%
+ * overhead; it is impossible. An overhead of 50-100% is quite good, as it prevents a premature
+ * exit from this mode.
+ * <p/>
+ * <b>hbase.lru.cache.heavy.eviction.overhead.coefficient</b> - sets how fast we want to get the
+ * result. If we know that our reads are heavy for a long time, we don't want to wait and can
+ * increase the coefficient to get good performance sooner. But if we aren't sure, we can change
+ * it slowly, which helps prevent a premature exit from this mode. So, when the coefficient is
+ * higher, we get better performance when heavy reading is stable; but when the read pattern
+ * changes, we can adapt to it by setting the coefficient to a lower value.
+ * For example, if we set the coefficient = 0.01, the overhead (see above) will be multiplied
+ * by 0.01 and the result is the percentage by which caching of blocks is reduced. For example,
+ * if the overhead = 300% and the coefficient = 0.01, then the percentage of cached blocks will
+ * be reduced by 3%.
+ * Similar logic applies when the overhead is negative (overshooting). Maybe it is just a
+ * short-term fluctuation, and we will try to stay in this mode; this helps avoid a premature
+ * exit during short-term fluctuations. The backpressure has simple logic: the more
+ * overshooting, the more blocks are cached.
+ * <p/>
+ * Find more information about this improvement at
+ * https://issues.apache.org/jira/browse/HBASE-23887
+ */
+@InterfaceAudience.Private
+public class LruAdaptiveBlockCache implements FirstLevelBlockCache {
+
+  private static final Logger LOG = LoggerFactory.getLogger(LruAdaptiveBlockCache.class);
+
+  /**
+   * Percentage of total size that eviction will evict until; e.g. if set to .8, then we will
+   * keep evicting during an eviction run till the cache size is down to 80% of the total.
+   */
+  private static final String LRU_MIN_FACTOR_CONFIG_NAME = "hbase.lru.blockcache.min.factor";
+
+  /**
+   * Acceptable size of cache (no evictions if size < acceptable)
+   */
+  private static final String LRU_ACCEPTABLE_FACTOR_CONFIG_NAME =
+    "hbase.lru.blockcache.acceptable.factor";
+
+  /**
+   * Hard capacity limit of cache, will reject any put if size > this * acceptable
+   */
+  static final String LRU_HARD_CAPACITY_LIMIT_FACTOR_CONFIG_NAME =
+    "hbase.lru.blockcache.hard.capacity.limit.factor";
+  private static final String LRU_SINGLE_PERCENTAGE_CONFIG_NAME =
+    "hbase.lru.blockcache.single.percentage";
+  private static final String LRU_MULTI_PERCENTAGE_CONFIG_NAME =
+    "hbase.lru.blockcache.multi.percentage";
+  private static final String LRU_MEMORY_PERCENTAGE_CONFIG_NAME =
+    "hbase.lru.blockcache.memory.percentage";
+
+  /**
+   * Configuration key to force data blocks to always be cached in memory for in-memory hfiles
+   * (unless in-memory blocks take up too much of the cache). Unlike inMemory, which is a
+   * column-family configuration, inMemoryForceMode is a cluster-wide configuration.
+   */
+  private static final String LRU_IN_MEMORY_FORCE_MODE_CONFIG_NAME =
+    "hbase.lru.rs.inmemoryforcemode";
+
+  /* Default Configuration Parameters */
+
+  /* Backing Concurrent Map Configuration */
+  static final float DEFAULT_LOAD_FACTOR = 0.75f;
+  static final int DEFAULT_CONCURRENCY_LEVEL = 16;
+
+  /* Eviction thresholds */
+  private static final float DEFAULT_MIN_FACTOR = 0.95f;
+  static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;
+
+  /* Priority buckets */
+  private static final float DEFAULT_SINGLE_FACTOR = 0.25f;
+  private static final float DEFAULT_MULTI_FACTOR = 0.50f;
+  private static final float DEFAULT_MEMORY_FACTOR = 0.25f;
+
+  private static final float DEFAULT_HARD_CAPACITY_LIMIT_FACTOR = 1.2f;
+
+  private static final boolean DEFAULT_IN_MEMORY_FORCE_MODE = false;
+
+  /* Statistics thread */
+  private static final int STAT_THREAD_PERIOD = 60 * 5;
+  private static final String LRU_MAX_BLOCK_SIZE = "hbase.lru.max.block.size";
+  private static final long DEFAULT_MAX_BLOCK_SIZE = 16L * 1024L * 1024L;
+
+  private static final String LRU_CACHE_HEAVY_EVICTION_COUNT_LIMIT
+    = "hbase.lru.cache.heavy.eviction.count.limit";
+  // The default value effectively disables this performance-improving feature,
+  // because reaching 2147483647 eviction runs would take ~680 years.
+  // We can set it to 0-10 and get the benefit right away
+  // (see details at https://issues.apache.org/jira/browse/HBASE-23887).
+  private static final int DEFAULT_LRU_CACHE_HEAVY_EVICTION_COUNT_LIMIT = Integer.MAX_VALUE;
+
+  private static final String LRU_CACHE_HEAVY_EVICTION_MB_SIZE_LIMIT
+    = "hbase.lru.cache.heavy.eviction.mb.size.limit";
+  private static final long DEFAULT_LRU_CACHE_HEAVY_EVICTION_MB_SIZE_LIMIT = 500;
+
+  private static final String LRU_CACHE_HEAVY_EVICTION_OVERHEAD_COEFFICIENT
+    = "hbase.lru.cache.heavy.eviction.overhead.coefficient";
+  private static final float DEFAULT_LRU_CACHE_HEAVY_EVICTION_OVERHEAD_COEFFICIENT = 0.01f;
+
+  /**
+   * Defined the cache map as {@link ConcurrentHashMap} here, because in
+   * {@link LruAdaptiveBlockCache#getBlock}, we need to guarantee the atomicity
+   * of map#computeIfPresent(key, func). Besides, the func method must execute exactly once,
+   * only when the key is present and under the lock context; otherwise the reference count
+   * will be messed up. Notice that
+   * {@link java.util.concurrent.ConcurrentSkipListMap} can not guarantee that.
+   */
+  private transient final ConcurrentHashMap<BlockCacheKey, LruCachedBlock> map;
+
+  /** Eviction lock (locked when eviction in process) */
+  private transient final ReentrantLock evictionLock = new ReentrantLock(true);
+
+  private final long maxBlockSize;
+
+  /** Volatile boolean to track if we are in an eviction process or not */
+  private volatile boolean evictionInProgress = false;
+
+  /** Eviction thread */
+  private transient final EvictionThread evictionThread;
+
+  /** Statistics thread schedule pool (for heavy debugging, could remove) */
+  private transient final ScheduledExecutorService scheduleThreadPool =
+    Executors.newScheduledThreadPool(1, new ThreadFactoryBuilder()
+      .setNameFormat("LruAdaptiveBlockCacheStatsExecutor").setDaemon(true).build());
+
+  /** Current size of cache */
+  private final AtomicLong size;
+
+  /** Current size of data blocks */
+  private final LongAdder dataBlockSize;
+
+  /** Current number of cached elements */
+  private final AtomicLong elements;
+
+  /** Current number of cached data block elements */
+  private final LongAdder dataBlockElements;
+
+  /** Cache access count (sequential ID) */
+  private final AtomicLong count;
+
+  /** hard capacity limit */
+  private float hardCapacityLimitFactor;
+
+  /** Cache statistics */
+  private final CacheStats stats;
+
+  /** Maximum allowable size of cache (block put if size > max, evict) */
+  private long maxSize;
+
+  /** Approximate block size */
+  private long blockSize;

Review comment:
       done

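For reference, here is a minimal, self-contained sketch of the auto-scaling arithmetic the
class comment above describes (overhead percent, heavy-eviction counter, and the coefficient
backpressure). The class and member names below are illustrative, not the PR's actual fields;
only the formula and the log format come from the javadoc. COUNT_LIMIT is set to 0 here so the
feature engages immediately, matching the javadoc's example log lines.

    // Sketch only: names are hypothetical; the formula follows the class comment.
    public class AdaptiveCachingSketch {
      static final long MB_SIZE_LIMIT = 500;  // hbase.lru.cache.heavy.eviction.mb.size.limit
      static final float COEFFICIENT = 0.01f; // hbase.lru.cache.heavy.eviction.overhead.coefficient
      static final int COUNT_LIMIT = 0;       // hbase.lru.cache.heavy.eviction.count.limit

      int cacheDataBlockPercent = 100; // percent of data blocks we still cache
      int heavyEvictionCount = 0;      // consecutive periods with positive overhead

      /** Called once per ~10-second period with the MB freed by eviction in that period. */
      void onEvictionPeriod(long freedMb) {
        // Overhead = Freed Bytes Sum (MB) * 100 / Limit (MB) - 100
        long overheadPercent = freedMb * 100 / MB_SIZE_LIMIT - 100;
        if (overheadPercent > 0) {
          heavyEvictionCount++;
          if (heavyEvictionCount > COUNT_LIMIT) {
            // e.g. overhead 300% * 0.01 -> cache 3% fewer data blocks
            int delta = (int) (overheadPercent * COEFFICIENT);
            cacheDataBlockPercent = Math.max(1, cacheDataBlockPercent - delta);
          }
        } else {
          // Negative overhead (overshooting): simple backpressure, the more we
          // overshoot the more blocks we return to caching.
          int delta = (int) (-overheadPercent * COEFFICIENT);
          cacheDataBlockPercent = Math.min(100, cacheDataBlockPercent + delta);
          heavyEvictionCount = 0;
        }
        System.out.printf("BlockCache evicted (MB): %d, overhead (%%): %d, "
            + "heavy eviction counter: %d, current caching DataBlock (%%): %d%n",
          freedMb, overheadPercent, heavyEvictionCount, cacheDataBlockPercent);
      }

      public static void main(String[] args) {
        AdaptiveCachingSketch sketch = new AdaptiveCachingSketch();
        sketch.onEvictionPeriod(2000); // 2000 * 100 / 500 - 100 = 300% -> caching drops to 97%
        sketch.onEvictionPeriod(200);  // 200 * 100 / 500 - 100 = -60% -> counter resets
      }
    }
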
##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruAdaptiveBlockCache.java
##########
@@ -0,0 +1,1442 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.io.hfile;
+
+import static java.util.Objects.requireNonNull;
+
+import java.lang.ref.WeakReference;
+import java.util.EnumMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.PriorityQueue;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.concurrent.atomic.LongAdder;
+import java.util.concurrent.locks.ReentrantLock;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.io.HeapSize;
+import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
+import org.apache.hadoop.hbase.util.ClassSize;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.com.google.common.base.MoreObjects;
+import org.apache.hbase.thirdparty.com.google.common.base.Objects;
+import org.apache.hbase.thirdparty.com.google.common.util.concurrent.ThreadFactoryBuilder;
+
+/**
+ * <b>This implementation improves performance of the classical LRU cache by up to 3 times,
+ * by reducing the garbage-collection load.</b>
+ * <p/>
+ * The classical block cache implementation that is memory-aware using {@link HeapSize},
+ * memory-bound using an LRU eviction algorithm, and concurrent: backed by a
+ * {@link ConcurrentHashMap} and with a non-blocking eviction thread giving constant-time
+ * {@link #cacheBlock} and {@link #getBlock} operations.
+ * <p/>
+ * Contains three levels of block priority to allow for scan-resistance and in-memory families
+ * {@link org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder#setInMemory(boolean)} (An
+ * in-memory column family is a column family that should be served from memory if possible):
+ * single-access, multiple-access, and in-memory priority. A block is added with an in-memory
+ * priority flag if {@link org.apache.hadoop.hbase.client.ColumnFamilyDescriptor#isInMemory()};
+ * otherwise a block becomes a single-access priority block the first time it is read into this
+ * block cache. If a block is accessed again while in cache, it is marked as a multiple-access
+ * priority block. This delineation of blocks is used to prevent scans from thrashing the cache,
+ * adding a least-frequently-used element to the eviction algorithm.
+ * <p/>
+ * Each priority is given its own chunk of the total cache to ensure fairness during eviction.
+ * Each priority will retain close to its maximum size; however, if any priority is not using
+ * its entire chunk the others are able to grow beyond their chunk size.
+ * <p/>
+ * Instantiated at a minimum with the total size and average block size. All sizes are in bytes.
+ * The block size is not especially important as this cache is fully dynamic in its sizing of
+ * blocks. It is only used for pre-allocating data structures and in initial heap estimation of
+ * the map.
+ * <p/>
+ * The detailed constructor defines the sizes for the three priorities (they should total to the
+ * <code>maximum size</code> defined). It also sets the levels that trigger and control the
+ * eviction thread.
+ * <p/>
+ * The <code>acceptable size</code> is the cache size level which triggers the eviction process
+ * to start. It evicts enough blocks to get the size below the minimum size specified.
+ * <p/>
+ * Eviction happens in a separate thread and involves a single full scan of the map. It
+ * determines how many bytes must be freed to reach the minimum size, and then while scanning
+ * determines the fewest least-recently-used blocks necessary from each of the three priorities
+ * (would be 3 times bytes to free). It then uses the priority chunk sizes to evict fairly
+ * according to the relative sizes and usage.
+ * <p/>
+ * The adaptive LRU cache improves performance when we read much more data than can fit into
+ * the BlockCache, which causes a high rate of evictions. That in turn leads to heavy garbage
+ * collector work: a lot of blocks are put into the BlockCache but never read, yet a lot of CPU
+ * resources are spent cleaning them up. We can avoid this situation via the following
+ * parameters:
+ * <p/>
+ * <b>hbase.lru.cache.heavy.eviction.count.limit</b> - sets how many times the eviction process
+ * has to run before we start avoiding putting data into the BlockCache. By default it is
+ * 2147483647 (Integer.MAX_VALUE), which effectively disables the feature. If the workload mixes
+ * occasional short reads of the same data with long-term reads, this parameter can separate
+ * them. For example, if we know that our short reads take about 1 minute, we can set the
+ * parameter to about 10 so the feature is enabled only for long, massive reads (after ~100
+ * seconds). So when we use short reads and want all of them in the cache, we will have them
+ * there (barring evictions, of course). When we use long-term heavy reads, the feature will be
+ * enabled after some time and bring better performance.
+ * <p/>
+ * <b>hbase.lru.cache.heavy.eviction.mb.size.limit</b> - sets how many bytes we would like to be
+ * put into the BlockCache (and evicted from it) per ~10 seconds. The feature will try to reach
+ * this value and maintain it. Don't set it too small, because that leads to a premature exit
+ * from this mode. For powerful CPUs (about 20-40 physical cores) it could be about 400-500 MB;
+ * for an average system (~10 cores), 200-300 MB; some weak systems (2-5 cores) may be fine with
+ * 50-100 MB.
+ * How it works: we set the limit, and every ~10 seconds we calculate how many bytes were freed:
+ * Overhead = Freed Bytes Sum (MB) * 100 / Limit (MB) - 100;
+ * For example, if we set the limit = 500 and 2000 MB were evicted, the overhead is:
+ * 2000 * 100 / 500 - 100 = 300%
+ * The feature will then reduce the percentage of cached data blocks to bring the evicted bytes
+ * closer to 100% (500 MB) - a kind of auto-scaling.
+ * If fewer bytes were freed than the limit, the overhead is negative.
+ * For example, if 200 MB were freed:
+ * 200 * 100 / 500 - 100 = -60%
+ * The feature will then increase the percentage of cached blocks,
+ * which again brings the evicted bytes closer to 100% (500 MB).
+ * The current situation can be found in the RegionServer log:
+ * BlockCache evicted (MB): 0, overhead (%): -100, heavy eviction counter: 0, current caching
+ * DataBlock (%): 100 - means no eviction; 100% of blocks are cached.
+ * BlockCache evicted (MB): 2000, overhead (%): 300, heavy eviction counter: 1, current caching
+ * DataBlock (%): 97 - means eviction began, and caching was reduced by 3%.
+ * This helps to tune the system and find out which value is best to set. Don't try to reach 0%
+ * overhead; it is impossible. An overhead of 50-100% is quite good, as it prevents a premature
+ * exit from this mode.
+ * <p/>
+ * <b>hbase.lru.cache.heavy.eviction.overhead.coefficient</b> - sets how fast we want to get the
+ * result. If we know that our reads are heavy for a long time, we don't want to wait and can
+ * increase the coefficient and get good performance sooner. But if we aren't sure, we can change
+ * it slowly, which helps prevent a premature exit from this mode. So, when the coefficient is
+ * higher, we get better performance when heavy reading is stable; but when the read pattern
+ * changes, we can adapt to it by setting the coefficient to a lower value.
+ * For example, if we set the coefficient = 0.01, the overhead (see above) will be multiplied
+ * by 0.01 and the result is the percentage by which caching of blocks is reduced. For example,
+ * if the overhead = 300% and the coefficient = 0.01, then the percentage of cached blocks will
+ * be reduced by 3%.
+ * Similar logic applies when the overhead is negative (overshooting). Maybe it is just a
+ * short-term fluctuation, and we will try to stay in this mode; this helps avoid a premature
+ * exit during short-term fluctuations. The backpressure has simple logic: the more
+ * overshooting, the more blocks are cached.
+ * <p/>
+ * Find more information about this improvement at
+ * https://issues.apache.org/jira/browse/HBASE-23887
+ */
+@InterfaceAudience.Private
+public class LruAdaptiveBlockCache implements FirstLevelBlockCache {
+
+  private static final Logger LOG = LoggerFactory.getLogger(LruAdaptiveBlockCache.class);
+
+  /**
+   * Percentage of total size that eviction will evict until; e.g. if set to .8, then we will
+   * keep evicting during an eviction run till the cache size is down to 80% of the total.
+   */
+  private static final String LRU_MIN_FACTOR_CONFIG_NAME = "hbase.lru.blockcache.min.factor";
+
+  /**
+   * Acceptable size of cache (no evictions if size < acceptable)
+   */
+  private static final String LRU_ACCEPTABLE_FACTOR_CONFIG_NAME =
+    "hbase.lru.blockcache.acceptable.factor";
+
+  /**
+   * Hard capacity limit of cache, will reject any put if size > this * acceptable
+   */
+  static final String LRU_HARD_CAPACITY_LIMIT_FACTOR_CONFIG_NAME =
+    "hbase.lru.blockcache.hard.capacity.limit.factor";
+  private static final String LRU_SINGLE_PERCENTAGE_CONFIG_NAME =
+    "hbase.lru.blockcache.single.percentage";
+  private static final String LRU_MULTI_PERCENTAGE_CONFIG_NAME =
+    "hbase.lru.blockcache.multi.percentage";
+  private static final String LRU_MEMORY_PERCENTAGE_CONFIG_NAME =
+    "hbase.lru.blockcache.memory.percentage";
+
+  /**
+   * Configuration key to force data blocks to always be cached in memory for in-memory hfiles
+   * (unless in-memory blocks take up too much of the cache). Unlike inMemory, which is a
+   * column-family configuration, inMemoryForceMode is a cluster-wide configuration.
+   */
+  private static final String LRU_IN_MEMORY_FORCE_MODE_CONFIG_NAME =
+    "hbase.lru.rs.inmemoryforcemode";
+
+  /* Default Configuration Parameters */
+
+  /* Backing Concurrent Map Configuration */
+  static final float DEFAULT_LOAD_FACTOR = 0.75f;
+  static final int DEFAULT_CONCURRENCY_LEVEL = 16;
+
+  /* Eviction thresholds */
+  private static final float DEFAULT_MIN_FACTOR = 0.95f;
+  static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;
+
+  /* Priority buckets */
+  private static final float DEFAULT_SINGLE_FACTOR = 0.25f;
+  private static final float DEFAULT_MULTI_FACTOR = 0.50f;
+  private static final float DEFAULT_MEMORY_FACTOR = 0.25f;
+
+  private static final float DEFAULT_HARD_CAPACITY_LIMIT_FACTOR = 1.2f;
+
+  private static final boolean DEFAULT_IN_MEMORY_FORCE_MODE = false;
+
+  /* Statistics thread */
+  private static final int STAT_THREAD_PERIOD = 60 * 5;
+  private static final String LRU_MAX_BLOCK_SIZE = "hbase.lru.max.block.size";
+  private static final long DEFAULT_MAX_BLOCK_SIZE = 16L * 1024L * 1024L;
+
+  private static final String LRU_CACHE_HEAVY_EVICTION_COUNT_LIMIT
+    = "hbase.lru.cache.heavy.eviction.count.limit";
+  // The default value effectively disables this performance-improving feature,
+  // because reaching 2147483647 eviction runs would take ~680 years.
+  // We can set it to 0-10 and get the benefit right away
+  // (see details at https://issues.apache.org/jira/browse/HBASE-23887).
+  private static final int DEFAULT_LRU_CACHE_HEAVY_EVICTION_COUNT_LIMIT = Integer.MAX_VALUE;
+
+  private static final String LRU_CACHE_HEAVY_EVICTION_MB_SIZE_LIMIT
+    = "hbase.lru.cache.heavy.eviction.mb.size.limit";
+  private static final long DEFAULT_LRU_CACHE_HEAVY_EVICTION_MB_SIZE_LIMIT = 500;
+
+  private static final String LRU_CACHE_HEAVY_EVICTION_OVERHEAD_COEFFICIENT
+    = "hbase.lru.cache.heavy.eviction.overhead.coefficient";
+  private static final float DEFAULT_LRU_CACHE_HEAVY_EVICTION_OVERHEAD_COEFFICIENT = 0.01f;
+
+  /**
+   * Defined the cache map as {@link ConcurrentHashMap} here, because in
+   * {@link LruAdaptiveBlockCache#getBlock}, we need to guarantee the atomicity
+   * of map#computeIfPresent(key, func). Besides, the func method must execute exactly once,
+   * only when the key is present and under the lock context; otherwise the reference count
+   * will be messed up. Notice that
+   * {@link java.util.concurrent.ConcurrentSkipListMap} can not guarantee that.
+   */
+  private transient final ConcurrentHashMap<BlockCacheKey, LruCachedBlock> map;
+
+  /** Eviction lock (locked when eviction in process) */
+  private transient final ReentrantLock evictionLock = new ReentrantLock(true);
+
+  private final long maxBlockSize;
+
+  /** Volatile boolean to track if we are in an eviction process or not */
+  private volatile boolean evictionInProgress = false;
+
+  /** Eviction thread */
+  private transient final EvictionThread evictionThread;
+
+  /** Statistics thread schedule pool (for heavy debugging, could remove) */
+  private transient final ScheduledExecutorService scheduleThreadPool =
+    Executors.newScheduledThreadPool(1, new ThreadFactoryBuilder()
+      .setNameFormat("LruAdaptiveBlockCacheStatsExecutor").setDaemon(true).build());
+
+  /** Current size of cache */
+  private final AtomicLong size;
+
+  /** Current size of data blocks */
+  private final LongAdder dataBlockSize;
+
+  /** Current number of cached elements */
+  private final AtomicLong elements;
+
+  /** Current number of cached data block elements */
+  private final LongAdder dataBlockElements;
+
+  /** Cache access count (sequential ID) */
+  private final AtomicLong count;
+
+  /** hard capacity limit */
+  private float hardCapacityLimitFactor;
+
+  /** Cache statistics */
+  private final CacheStats stats;
+
+  /** Maximum allowable size of cache (block put if size > max, evict) */
+  private long maxSize;
+
+  /** Approximate block size */
+  private long blockSize;
+
+  /** Acceptable size of cache (no evictions if size < acceptable) */
+  private float acceptableFactor;
+
+  /** Minimum threshold of cache (when evicting, evict until size < min) */
+  private float minFactor;
+
+  /** Single access bucket size */
+  private float singleFactor;
+
+  /** Multiple access bucket size */
+  private float multiFactor;
+
+  /** In-memory bucket size */
+  private float memoryFactor;
+
+  /** Overhead of the structure itself */
+  private long overhead;
+
+  /** Whether in-memory hfile's data block has higher priority when evicting */
+  private boolean forceInMemory;
+
+  /**
+   * Where to send victims (blocks evicted/missing from the cache). This is used only when we
+   * use an external cache as L2.
+   * Note: See org.apache.hadoop.hbase.io.hfile.MemcachedBlockCache
+   */
+  private transient BlockCache victimHandler = null;
+
+  /** Percent of cached data blocks */
+  private volatile int cacheDataBlockPercent;
+
+  /** Number of eviction runs after which the cache starts to skip caching new blocks */
+  private final int heavyEvictionCountLimit;
+
+  /** Evicted-volume limit (MB) above which the cache starts to skip caching new blocks */
+  private final long heavyEvictionMbSizeLimit;
+
+  /** Adjust auto-scaling via overhead of eviction rate */
+  private final float heavyEvictionOverheadCoefficient;
+
+  /**
+   * Default constructor.  Specify maximum size and expected average block
+   * size (approximation is fine).
+   *
+   * <p>All other factors will be calculated based on defaults specified in
+   * this class.
+   *
+   * @param maxSize   maximum size of cache, in bytes
+   * @param blockSize approximate size of each block, in bytes
+   */
+  public LruAdaptiveBlockCache(long maxSize, long blockSize) {
+    this(maxSize, blockSize, true);
+  }
+
+  /**
+   * Constructor used for testing.  Allows disabling of the eviction thread.
+   */
+  public LruAdaptiveBlockCache(long maxSize, long blockSize, boolean evictionThread) {
+    this(maxSize, blockSize, evictionThread,
+      (int) Math.ceil(1.2 * maxSize / blockSize),
+      DEFAULT_LOAD_FACTOR, DEFAULT_CONCURRENCY_LEVEL,
+      DEFAULT_MIN_FACTOR, DEFAULT_ACCEPTABLE_FACTOR,
+      DEFAULT_SINGLE_FACTOR,
+      DEFAULT_MULTI_FACTOR,
+      DEFAULT_MEMORY_FACTOR,
+      DEFAULT_HARD_CAPACITY_LIMIT_FACTOR,
+      false,
+      DEFAULT_MAX_BLOCK_SIZE,
+      DEFAULT_LRU_CACHE_HEAVY_EVICTION_COUNT_LIMIT,
+      DEFAULT_LRU_CACHE_HEAVY_EVICTION_MB_SIZE_LIMIT,
+      DEFAULT_LRU_CACHE_HEAVY_EVICTION_OVERHEAD_COEFFICIENT);
+  }
+
+  public LruAdaptiveBlockCache(long maxSize, long blockSize,
+    boolean evictionThread, Configuration conf) {
+    this(maxSize, blockSize, evictionThread,
+      (int) Math.ceil(1.2 * maxSize / blockSize),
+      DEFAULT_LOAD_FACTOR,
+      DEFAULT_CONCURRENCY_LEVEL,
+      conf.getFloat(LRU_MIN_FACTOR_CONFIG_NAME, DEFAULT_MIN_FACTOR),
+      conf.getFloat(LRU_ACCEPTABLE_FACTOR_CONFIG_NAME, DEFAULT_ACCEPTABLE_FACTOR),
+      conf.getFloat(LRU_SINGLE_PERCENTAGE_CONFIG_NAME, DEFAULT_SINGLE_FACTOR),
+      conf.getFloat(LRU_MULTI_PERCENTAGE_CONFIG_NAME, DEFAULT_MULTI_FACTOR),
+      conf.getFloat(LRU_MEMORY_PERCENTAGE_CONFIG_NAME, DEFAULT_MEMORY_FACTOR),
+      conf.getFloat(LRU_HARD_CAPACITY_LIMIT_FACTOR_CONFIG_NAME,
+        DEFAULT_HARD_CAPACITY_LIMIT_FACTOR),
+      conf.getBoolean(LRU_IN_MEMORY_FORCE_MODE_CONFIG_NAME, DEFAULT_IN_MEMORY_FORCE_MODE),
+      conf.getLong(LRU_MAX_BLOCK_SIZE, DEFAULT_MAX_BLOCK_SIZE),
+      conf.getInt(LRU_CACHE_HEAVY_EVICTION_COUNT_LIMIT,
+        DEFAULT_LRU_CACHE_HEAVY_EVICTION_COUNT_LIMIT),
+      conf.getLong(LRU_CACHE_HEAVY_EVICTION_MB_SIZE_LIMIT,
+        DEFAULT_LRU_CACHE_HEAVY_EVICTION_MB_SIZE_LIMIT),
+      conf.getFloat(LRU_CACHE_HEAVY_EVICTION_OVERHEAD_COEFFICIENT,
+        DEFAULT_LRU_CACHE_HEAVY_EVICTION_OVERHEAD_COEFFICIENT));
+  }
+
+  public LruAdaptiveBlockCache(long maxSize, long blockSize, Configuration conf) {
+    this(maxSize, blockSize, true, conf);
+  }
+
+  /**
+   * Configurable constructor.  Use this constructor if not using defaults.
+   *
+   * @param maxSize             maximum size of this cache, in bytes
+   * @param blockSize           expected average size of blocks, in bytes
+   * @param evictionThread      whether to run evictions in a bg thread or not
+   * @param mapInitialSize      initial size of backing ConcurrentHashMap
+   * @param mapLoadFactor       initial load factor of backing ConcurrentHashMap
+   * @param mapConcurrencyLevel initial concurrency factor for backing CHM
+   * @param minFactor           percentage of total size that eviction will evict until
+   * @param acceptableFactor    percentage of total size that triggers eviction
+   * @param singleFactor        percentage of total size for single-access blocks
+   * @param multiFactor         percentage of total size for multiple-access blocks
+   * @param memoryFactor        percentage of total size for in-memory blocks
+   */
+  public LruAdaptiveBlockCache(long maxSize, long blockSize, boolean evictionThread,
+    int mapInitialSize, float mapLoadFactor, int mapConcurrencyLevel,
+    float minFactor, float acceptableFactor, float singleFactor,
+    float multiFactor, float memoryFactor, float hardLimitFactor,
+    boolean forceInMemory, long maxBlockSize,
+    int heavyEvictionCountLimit, long heavyEvictionMbSizeLimit,
+    float heavyEvictionOverheadCoefficient) {
+    this.maxBlockSize = maxBlockSize;
+    if (singleFactor + multiFactor + memoryFactor != 1 ||
+      singleFactor < 0 || multiFactor < 0 || memoryFactor < 0) {
+      throw new IllegalArgumentException("Single, multi, and memory factors " +
+        "should be non-negative and total 1.0");
+    }
+    if (minFactor >= acceptableFactor) {
+      throw new IllegalArgumentException("minFactor must be smaller than 
acceptableFactor");
+    }
+    if (minFactor >= 1.0f || acceptableFactor >= 1.0f) {
+      throw new IllegalArgumentException("all factors must be < 1");
+    }
+    this.maxSize = maxSize;
+    this.blockSize = blockSize;
+    this.forceInMemory = forceInMemory;
+    map = new ConcurrentHashMap<>(mapInitialSize, mapLoadFactor, mapConcurrencyLevel);
+    this.minFactor = minFactor;
+    this.acceptableFactor = acceptableFactor;
+    this.singleFactor = singleFactor;
+    this.multiFactor = multiFactor;
+    this.memoryFactor = memoryFactor;
+    this.stats = new CacheStats(this.getClass().getSimpleName());
+    this.count = new AtomicLong(0);
+    this.elements = new AtomicLong(0);
+    this.dataBlockElements = new LongAdder();
+    this.dataBlockSize = new LongAdder();
+    this.overhead = calculateOverhead(maxSize, blockSize, mapConcurrencyLevel);
+    this.size = new AtomicLong(this.overhead);
+    this.hardCapacityLimitFactor = hardLimitFactor;
+    if (evictionThread) {
+      this.evictionThread = new EvictionThread(this);
+      this.evictionThread.start(); // FindBugs SC_START_IN_CTOR
+    } else {
+      this.evictionThread = null;
+    }
+
+    // check the bounds
+    this.heavyEvictionCountLimit = heavyEvictionCountLimit < 0 ? 0 : heavyEvictionCountLimit;

Review comment:
       done

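As a footnote to the bounds check discussed in this hunk, here is a hedged sketch of reading
and clamping the three tuning parameters from a Hadoop Configuration. The clamp on the count
limit mirrors the reviewed line ("heavyEvictionCountLimit < 0 ? 0 : heavyEvictionCountLimit");
the other two clamps, and all names below, are illustrative assumptions rather than the PR's
actual code.

    // Sketch only: mirrors the reviewed negative-value clamp; other bounds are assumptions.
    import org.apache.hadoop.conf.Configuration;

    public class HeavyEvictionConfigSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInt("hbase.lru.cache.heavy.eviction.count.limit", -5); // deliberately invalid

        int countLimit =
          conf.getInt("hbase.lru.cache.heavy.eviction.count.limit", Integer.MAX_VALUE);
        // A negative run count makes no sense; clamp to 0 ("engage at once"),
        // exactly as the reviewed constructor line does.
        countLimit = countLimit < 0 ? 0 : countLimit;

        long mbSizeLimit = conf.getLong("hbase.lru.cache.heavy.eviction.mb.size.limit", 500);
        mbSizeLimit = Math.max(1L, mbSizeLimit); // assumption: at least 1 MB per period

        float coefficient =
          conf.getFloat("hbase.lru.cache.heavy.eviction.overhead.coefficient", 0.01f);
        coefficient = Math.min(1.0f, Math.max(0.001f, coefficient)); // assumption: sane range

        System.out.println("countLimit=" + countLimit
          + ", mbSizeLimit=" + mbSizeLimit + ", coefficient=" + coefficient);
      }
    }
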



----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

