[ https://issues.apache.org/jira/browse/HBASE-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861029#comment-15861029 ]
Anoop Sam John commented on HBASE-16630: ---------------------------------------- Looks good. Can we get this in now? Ping [~saint....@gmail.com]. > Fragmentation in long running Bucket Cache > ------------------------------------------ > > Key: HBASE-16630 > URL: https://issues.apache.org/jira/browse/HBASE-16630 > Project: HBase > Issue Type: Bug > Components: BucketCache > Affects Versions: 2.0.0, 1.1.6, 1.3.1, 1.2.3 > Reporter: deepankar > Assignee: deepankar > Priority: Critical > Attachments: 16630-v2-suggest.patch, 16630-v3-suggest.patch, > HBASE-16630.patch, HBASE-16630-v2.patch, HBASE-16630-v3.patch > > > As we are running bucket cache for a long time in our system, we are > observing cases where some nodes after some time does not fully utilize the > bucket cache, in some cases it is even worse in the sense they get stuck at a > value < 0.25 % of the bucket cache (DEFAULT_MEMORY_FACTOR as all our tables > are configured in-memory for simplicity sake). > We took a heap dump and analyzed what is happening and saw that is classic > case of fragmentation, current implementation of BucketCache (mainly > BucketAllocator) relies on the logic that fullyFreeBuckets are available for > switching/adjusting cache usage between different bucketSizes . But once a > compaction / bulkload happens and the blocks are evicted from a bucket size , > these are usually evicted from random places of the buckets of a bucketSize > and thus locking the number of buckets associated with a bucketSize and in > the worst case of the fragmentation we have seen some bucketSizes with > occupancy ratio of < 10 % But they dont have any completelyFreeBuckets to > share with the other bucketSize. > Currently the existing eviction logic helps in the cases where cache used is > more the MEMORY_FACTOR or MULTI_FACTOR and once those evictions are also > done, the eviction (freeSpace function) will not evict anything and the cache > utilization will be stuck at that value without any allocations for other > required sizes. > The fix for this we came up with is simple that we do deFragmentation ( > compaction) of the bucketSize and thus increasing the occupancy ratio and > also freeing up the buckets to be fullyFree, this logic itself is not > complicated as the bucketAllocator takes care of packing the blocks in the > buckets, we need evict and re-allocate the blocks for all the BucketSizes > that dont fit the criteria. > I am attaching an initial patch just to give an idea of what we are thinking > and I'll improve it based on the comments from the community. -- This message was sent by Atlassian JIRA (v6.3.15#6346)