[ https://issues.apache.org/jira/browse/HBASE-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Heng Chen updated HBASE-16287:
------------------------------
    Resolution: Fixed
  Hadoop Flags: Reviewed
  Release Note: To keep the block cache size from exceeding the acceptable size by too much, we add one configuration, "hbase.lru.blockcache.hard.capacity.limit.factor", which decides whether a block may be put into the LruBlockCache. If the block cache size >= factor * acceptableSize, we reject the block from the cache.
        Status: Resolved  (was: Patch Available)

> LruBlockCache size should not exceed acceptableSize too many
> ------------------------------------------------------------
>
>                 Key: HBASE-16287
>                 URL: https://issues.apache.org/jira/browse/HBASE-16287
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache
>            Reporter: Yu Sun
>            Assignee: Yu Sun
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.3
>
>         Attachments: HBASE-16287-v1.patch, HBASE-16287-v2.patch, HBASE-16287-v3.patch, HBASE-16287-v4.patch, HBASE-16287-v5.patch, HBASE-16287-v6.patch, HBASE-16287-v7.patch, HBASE-16287-v8.patch, HBASE-16287-v9.patch
>
>
> Our regionserver has the following configuration:
> -Xmn4g -Xms32g -Xmx32g -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC
> We also use only the block cache, and set hfile.block.cache.size = 0.3 in hbase-site.xml, so under this configuration the LRU block cache size will be (32g - 1g) * 0.3 = 9.3g. But in some scenarios, some of the regionservers run continuous full GCs for hours and, most importantly, after a full GC most of the objects in the old generation are not collected.
> So we dumped the heap and analysed it with MAT, and we observed an obvious memory leak in LruBlockCache, which occupied about 16g of memory. We then set the LruBlockCache log level to TRACE and observed this in the log:
> {quote}
> 2016-07-22 12:17:58,158 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=15.29 GB, freeSize=-5.99 GB, max=9.30 GB, blockCount=628182, accesses=101799469125, hits=93517800259, hitRatio=91.86%, cachingAccesses=99462650031, cachingHits=93468334621, cachingHitsRatio=93.97%, evictions=238199, evicted=4776350518, evictedPerRun=20051.93359375
> {quote}
> We can see the block cache size has exceeded acceptableSize by far too much, which makes the full GC problem even more serious.
> After some investigation, I found this function:
> {code:borderStyle=solid}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory,
>     final boolean cacheDataInL1) {
> {code}
> No matter how much of the block cache is already used, it just puts the block into the cache. But if the eviction thread is not fast enough, the block cache size will grow significantly.
> So I think we should have a check here: for example, if the block cache size > 1.2 * acceptableSize(), just return and don't put the block into the cache until the block cache size is back under the watermark. If this is reasonable, I can make a small patch for this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
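The hard-capacity guard described in the release note can be sketched as below. This is a minimal illustration, not the actual HBase patch: the class and field names (HardCapacityCheck, acceptableSize, currentSize, hardCapacityLimitFactor) are assumptions for the example; in HBase the real check lives inside LruBlockCache.cacheBlock and the factor comes from "hbase.lru.blockcache.hard.capacity.limit.factor".

```java
/**
 * Minimal sketch of the hard-capacity check described in the release note.
 * If the cache is already at or past factor * acceptableSize, new blocks are
 * rejected so the eviction thread can catch up instead of the heap growing
 * unboundedly. Names are illustrative, not the actual HBase code.
 */
public class HardCapacityCheck {

    /** Mirrors "hbase.lru.blockcache.hard.capacity.limit.factor". */
    private final float hardCapacityLimitFactor;
    private final long acceptableSize;
    private long currentSize;

    public HardCapacityCheck(long acceptableSize, float hardCapacityLimitFactor) {
        this.acceptableSize = acceptableSize;
        this.hardCapacityLimitFactor = hardCapacityLimitFactor;
    }

    /**
     * Attempt to cache a block of the given size.
     * @return true if the block was accepted, false if it was rejected
     *         because the cache has hit the hard capacity limit.
     */
    public boolean cacheBlock(long blockSize) {
        if (currentSize >= (long) (hardCapacityLimitFactor * acceptableSize)) {
            return false; // cache too full: skip caching rather than overshoot
        }
        currentSize += blockSize;
        return true;
    }

    public long currentSize() {
        return currentSize;
    }

    public static void main(String[] args) {
        // Acceptable size 100 bytes, hard limit factor 1.2 -> hard limit 120.
        HardCapacityCheck cache = new HardCapacityCheck(100L, 1.2f);
        System.out.println(cache.cacheBlock(60)); // accepted, size becomes 60
        System.out.println(cache.cacheBlock(60)); // accepted, size becomes 120
        System.out.println(cache.cacheBlock(10)); // rejected: 120 >= 120
    }
}
```

With this kind of guard, the cache can still briefly exceed acceptableSize (eviction remains asynchronous), but it can no longer run away to 1.6x the target as in the reported log.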