[ https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260764#comment-17260764 ]

Viraj Jasani edited comment on HBASE-23887 at 1/8/21, 5:43 AM:
---------------------------------------------------------------

[~pustota] Let's just make this an extended LruCache rather than making changes 
in LruCache directly (as many reviewers suggested earlier).

This is what you can do:
 # In hbase-server, create a new class AdaptiveLruBlockCache (package 
org.apache.hadoop.hbase.io.hfile) that implements FirstLevelBlockCache, and 
keep it InterfaceAudience.Private (just like LruBlockCache).
 # Copy the entire code from LruBlockCache to AdaptiveLruBlockCache (just 
replace references to LruBlockCache with AdaptiveLruBlockCache).
 # Add Javadoc to the AdaptiveLruBlockCache class (provide all details about 
the perf improvement, how it is 300% faster, which kind of distribution should 
choose it, etc.).
 # BlockCacheFactory has the method createFirstLevelCache(); add one more 
option there for the adaptive block cache, with some value (say "adaptiveLRU"). 
In that method, the value "LRU" initializes LruBlockCache and "TinyLFU" 
initializes TinyLfuBlockCache; similarly, "adaptiveLRU" should initialize 
AdaptiveLruBlockCache.
 # Make all the changes you have made to LruBlockCache in the current PR#1257 
in AdaptiveLruBlockCache instead, and keep LruBlockCache unchanged.
 # Provide some documents in dev-support/design-docs, as Sean has mentioned 
above. You can also refer to those docs in the Javadoc of AdaptiveLruBlockCache.
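As a rough sketch of the dispatch in step 4 (the classes below are simplified stand-ins, not the real HBase types; the real createFirstLevelCache() also takes the Configuration and passes cache size, block size, etc. to the constructors):

{code:java}
// Simplified model of the string-based policy dispatch that
// BlockCacheFactory.createFirstLevelCache() would gain. The stub classes
// below stand in for the real org.apache.hadoop.hbase.io.hfile types.
interface FirstLevelBlockCache {}

class LruBlockCache implements FirstLevelBlockCache {}

class TinyLfuBlockCache implements FirstLevelBlockCache {}

class AdaptiveLruBlockCache implements FirstLevelBlockCache {}

public class BlockCacheFactorySketch {
  static FirstLevelBlockCache createFirstLevelCache(String policy) {
    if ("LRU".equalsIgnoreCase(policy)) {
      return new LruBlockCache();
    } else if ("TinyLFU".equalsIgnoreCase(policy)) {
      return new TinyLfuBlockCache();
    } else if ("adaptiveLRU".equalsIgnoreCase(policy)) { // the new option
      return new AdaptiveLruBlockCache();
    }
    throw new IllegalArgumentException("Unknown block cache policy: " + policy);
  }
}
{code}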

 

If you follow this, I don't think you will need to make any changes in 
CombinedBlockCache, InclusiveCombinedBlockCache, or CacheConfig.

Let's get your changes in as a new LRU BlockCache at least. If you feel certain 
configurable changes should also be made in the LruBlockCache class, we can 
consider that as part of a separate Jira.

We do not wish to block your changes. However, since this changes the way we 
cache (it is of course an improved version), it is better shipped as a 
configurable opt-in feature. With the changes mentioned in step 4 above, users 
can choose this new FirstLevelBlockCache implementation (an improved 
LruBlockCache) by providing the value "adaptiveLRU" for the config 
"hfile.block.cache.policy", and that's it.
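For example, a user could then opt in via hbase-site.xml (the property name is the one quoted above; "adaptiveLRU" is the proposed value):

{code:xml}
<property>
  <name>hfile.block.cache.policy</name>
  <value>adaptiveLRU</value>
</property>
{code}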

As an example, please take a look at how TinyLfuBlockCache is implemented and 
how it is instantiated (as a configurable cache). It does not require any 
changes in CombinedBlockCache or InclusiveCombinedBlockCache because it just 
provides a new L1 cache.

Let me know what you think. Thanks for working on this. I know it has taken a 
lot of time; let's get your changes in.



> BlockCache performance improve by reduce eviction rate
> ------------------------------------------------------
>
>                 Key: HBASE-23887
>                 URL: https://issues.apache.org/jira/browse/HBASE-23887
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache, Performance
>            Reporter: Danil Lipovoy
>            Assignee: Danil Lipovoy
>            Priority: Minor
>         Attachments: 1582787018434_rs_metrics.jpg, 
> 1582801838065_rs_metrics_new.png, BC_LongRun.png, 
> BlockCacheEvictionProcess.gif, BlockCacheEvictionProcess.gif, cmp.png, 
> evict_BC100_vs_BC23.png, eviction_100p.png, eviction_100p.png, 
> eviction_100p.png, gc_100p.png, graph.png, image-2020-06-07-08-11-11-929.png, 
> image-2020-06-07-08-19-00-922.png, image-2020-06-07-12-07-24-903.png, 
> image-2020-06-07-12-07-30-307.png, image-2020-06-08-17-38-45-159.png, 
> image-2020-06-08-17-38-52-579.png, image-2020-06-08-18-35-48-366.png, 
> image-2020-06-14-20-51-11-905.png, image-2020-06-22-05-57-45-578.png, 
> image-2020-09-23-09-48-59-714.png, image-2020-09-23-10-06-11-189.png, 
> ratio.png, ratio2.png, read_requests_100pBC_vs_23pBC.png, requests_100p.png, 
> requests_100p.png, requests_new2_100p.png, requests_new_100p.png, scan.png, 
> scan_and_gets.png, scan_and_gets2.png, wave.png, ycsb_logs.zip
>
>
> Hi!
> This is my first time here, please correct me if something is wrong.
> All the latest information is here:
> [https://docs.google.com/document/d/1X8jVnK_3lp9ibpX6lnISf_He-6xrHZL0jQQ7hoTV0-g/edit?usp=sharing]
> I want to propose how to improve performance when the data in HFiles is much 
> larger than the BlockCache (a usual story in Big Data). The idea is to cache 
> only part of the DATA blocks. This is good because LruBlockCache keeps 
> working and we save a huge amount of GC.
> Sometimes we have more data than can fit into the BlockCache, and that causes 
> a high rate of evictions. In this case we can skip caching block N and 
> instead cache block N+1. We would evict block N quite soon anyway, which is 
> why skipping it is good for performance.
> ---
> Some information below is no longer up to date
> ---
>  
>  
> Example:
> Imagine we have a little cache that can fit only 1 block, and we are trying 
> to read 3 blocks with offsets:
>  124
>  198
>  223
> Current way - we put block 124, then put 198 and evict 124, then put 223 and 
> evict 198. A lot of work (5 actions).
> With the feature - the last few digits are evenly distributed from 0 to 99. 
> When we take the offset modulo 100 we get:
>  124 -> 24
>  198 -> 98
>  223 -> 23
> This lets us sort them. Some part, for example those below 50 (if we set 
> *hbase.lru.cache.data.block.percent* = 50), goes into the cache, and we skip 
> the others. It means we will not try to handle block 198 and can save the CPU 
> for other work. As a result - we put block 124, then put 223 and evict 124 (3 
> actions).
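> The arithmetic above can be sketched as follows (a standalone illustration, 
> not the actual patch code):
> {code:java}
> // With hbase.lru.cache.data.block.percent = 50, only blocks whose offset
> // modulo 100 is below 50 are cached; the rest are skipped.
> public class ModulusFilterSketch {
>   static boolean shouldCache(long offset, int cacheDataBlockPercent) {
>     return offset % 100 < cacheDataBlockPercent;
>   }
>
>   public static void main(String[] args) {
>     for (long offset : new long[] {124, 198, 223}) {
>       System.out.println(offset + " -> " + (offset % 100)
>           + (shouldCache(offset, 50) ? " (cache)" : " (skip)"));
>     }
>   }
> }
> {code}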
> See the picture in the attachment with the test below. Requests per second 
> are higher, GC is lower.
>  
>  The key point of the code:
>  Added the parameter *hbase.lru.cache.data.block.percent*, which by default = 
> 100.
>   
>  But if we set it to 1-99, the following logic applies:
>   
>   
> {code:java}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
>   // Skip caching this data block when its offset falls outside the configured percent.
>   if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()
>       && cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
>     return;
>   }
>   ...
>   // the same code as usual
> }
> {code}
>  
> Other parameters help control when this logic is enabled, so that it only 
> works while heavy reading is going on.
> hbase.lru.cache.heavy.eviction.count.limit - how many times the eviction 
> process has to run before we start to skip putting data blocks into the 
> BlockCache.
>  hbase.lru.cache.heavy.eviction.bytes.size.limit - how many bytes have to be 
> evicted each time before we start to skip putting data blocks into the 
> BlockCache.
> By default: if the eviction process runs 10 times (100 seconds) and evicts 
> more than 10 MB each time, then we start to skip 50% of data blocks.
>  When the heavy eviction process ends, the new logic turns off and all blocks 
> are put into the BlockCache again.
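> For example, the behavior described above could be configured like this (a 
> sketch; the property names are taken from this description, and the values 
> are illustrative, mirroring the defaults described above and assuming the 
> size limit is expressed in bytes):
> {code:xml}
> <property>
>   <name>hbase.lru.cache.heavy.eviction.count.limit</name>
>   <value>10</value>
> </property>
> <property>
>   <name>hbase.lru.cache.heavy.eviction.bytes.size.limit</name>
>   <value>10485760</value> <!-- 10 MB per eviction run -->
> </property>
> {code}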
>   
> Description of the test:
> 4 nodes, E5-2698 v4 @ 2.20GHz, 700 GB memory.
> 4 RegionServers
> 4 tables by 64 regions by 1.88 GB of data each = 600 GB total (FAST_DIFF only)
> Total BlockCache size = 48 GB (8% of the data in HFiles)
> Random reads in 20 threads
>  
> I am going to make a Pull Request; I hope this is the right way to make a 
> contribution to this cool product.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
