[ https://issues.apache.org/jira/browse/HBASE-25698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337423#comment-17337423 ]
Viraj Jasani commented on HBASE-25698:
--------------------------------------

[~anoop.hbase] Thanks for this nice find, this sounds right to me. TinyLfu does not perform retain() for already cached blocks, unlike LruBlockCache, which does it this way (a standalone sketch of this retain-on-hit pattern follows at the end of this comment):
{code:java}
LruCachedBlock cb = map.computeIfPresent(cacheKey, (key, val) -> {
  // It will be referenced by RPC path, so increase here. NOTICE: Must do the retain inside
  // this block. because if retain outside the map#computeIfPresent, the evictBlock may remove
  // the block and release, then we're retaining a block with refCnt=0 which is disallowed.
  // see HBASE-22422.
  val.getBuffer().retain();
  return val;
});
{code}
Also, I am not sure about this particular test scenario, but AFAIK [~apurtell] has used TinyLfu many times in his testing. He can confirm further anyway.
{quote}
hbase.blockcache.use.external is not set true right. Then only we create CombinedBC with L2 as VictimCache for L1
{quote}
I was also not aware of this; I just looked at the relevant code:
{code:java}
public InclusiveCombinedBlockCache(FirstLevelBlockCache l1, BlockCache l2) {
  super(l1, l2);
  l1.setVictimCache(l2);
}
{code}
I hope the fix you are suggesting for TinyLfu is somewhat similar to this (note that evictBlock must null-check the removed value, since remove() returns null for an absent key):
{code:java}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/TinyLfuBlockCache.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/TinyLfuBlockCache.java
index a0dc30c524..1cb53dc6b6 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/TinyLfuBlockCache.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/TinyLfuBlockCache.java
@@ -170,6 +170,7 @@ public final class TinyLfuBlockCache implements FirstLevelBlockCache {
       value = victimCache.getBlock(cacheKey, caching, repeat, updateCacheMetrics);
       if ((value != null) && caching) {
         if ((value instanceof HFileBlock) && ((HFileBlock) value).isSharedMem()) {
+          value.retain();
           value = HFileBlock.deepCloneOnHeap((HFileBlock) value);
         }
         cacheBlock(cacheKey, value);
@@ -203,6 +204,9 @@
   @Override
   public boolean evictBlock(BlockCacheKey cacheKey) {
     Cacheable value = cache.asMap().remove(cacheKey);
+    if (value != null) {
+      value.release();
+    }
     return (value != null);
   }
{code}
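To see the retain-on-hit / release-on-evict pairing outside of HBase internals, here is a minimal standalone sketch against Caffeine's map view. This is only an illustration of the pattern, not the actual patch: the Block class below is a hypothetical stand-in for netty-style reference counting, and the class and key names are made up. The point is the same as in the LruBlockCache snippet above: the retain must happen inside computeIfPresent, so that eviction cannot drop the block to refCnt=0 between lookup and retain.
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public final class RetainOnHitSketch {

  /** Hypothetical stand-in for a ref-counted cache block (HBase really uses netty ref counting). */
  static final class Block {
    private final AtomicInteger refCnt = new AtomicInteger(1); // the cache owns one reference

    void retain() {
      if (refCnt.getAndIncrement() <= 0) {
        throw new IllegalStateException("retain on a block whose refCnt already hit 0");
      }
    }

    void release() {
      refCnt.decrementAndGet();
    }
  }

  public static void main(String[] args) {
    Cache<String, Block> cache = Caffeine.newBuilder().maximumSize(1024).build();
    cache.put("block-1", new Block());

    // Read path: retain inside computeIfPresent so a concurrent evictBlock cannot
    // release the block between lookup and retain (the HBASE-22422 race).
    Block hit = cache.asMap().computeIfPresent("block-1", (key, val) -> {
      val.retain();
      return val;
    });

    // Eviction path: remove the mapping first, then drop the cache's own reference.
    Block evicted = cache.asMap().remove("block-1");
    if (evicted != null) {
      evicted.release();
    }

    // RPC path drops its reference once the block has been shipped.
    if (hit != null) {
      hit.release();
    }
  }
}
{code}
Caffeine's asMap() view gives the per-key atomicity of ConcurrentHashMap, so the retain on the read path and the remove-then-release on the eviction path should not be able to interleave on the same key.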
> Persistent IllegalReferenceCountException at scanner open
> ---------------------------------------------------------
>
>                 Key: HBASE-25698
>                 URL: https://issues.apache.org/jira/browse/HBASE-25698
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile, Scanners
>    Affects Versions: 2.4.2
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
> Persistent scanner open failure with offheap read path enabled.
> Not sure how it happened. Test scenario was HBase 1 cluster replicating to HBase 2 cluster. ITBLL as data generator at source, calm policy only. Scanner open errors on sink HBase 2 cluster later during ITBLL verify phase. Sink schema settings bloom=ROW encoding=FAST_DIFF compression=NONE.
> {noformat}
> Caused by: org.apache.hbase.thirdparty.io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
>     at org.apache.hbase.thirdparty.io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:74)
>     at org.apache.hbase.thirdparty.io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:138)
>     at org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76)
>     at org.apache.hadoop.hbase.nio.ByteBuff.release(ByteBuff.java:79)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlock.release(HFileBlock.java:429)
>     at org.apache.hadoop.hbase.io.hfile.CompoundBloomFilter.contains(CompoundBloomFilter.java:109)
>     at org.apache.hadoop.hbase.regionserver.StoreFileReader.checkGeneralBloomFilter(StoreFileReader.java:433)
>     at org.apache.hadoop.hbase.regionserver.StoreFileReader.passesGeneralRowBloomFilter(StoreFileReader.java:322)
>     at org.apache.hadoop.hbase.regionserver.StoreFileReader.passesBloomFilter(StoreFileReader.java:251)
>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.shouldUseScanner(StoreFileScanner.java:491)
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.selectScannersFrom(StoreScanner.java:471)
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:249)
>     at org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2177)
>     at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2168)
>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:7172)
> {noformat}
> Bloom filter type on all files here is ROW, block encoding is FAST_DIFF:
> {noformat}
> hbase:017:0> describe "IntegrationTestBigLinkedList"
> Table IntegrationTestBigLinkedList is ENABLED
> IntegrationTestBigLinkedList
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'big', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
> {NAME => 'meta', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
> {NAME => 'tiny', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)