[ https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120565#comment-17120565 ]

Danil Lipovoy edited comment on HBASE-23887 at 5/31/20, 8:44 PM:
-----------------------------------------------------------------

It seems our trouble with the servers will go on for a long time, so I've decided to 
install HBase on my home PC.

Another important point - I have implemented the algorithm that I posted above (I will 
add the changes to the PR quite soon). It handles a changing number of read requests 
well. The new approach seems to cope with a wide variety of situations (there are a 
lot of tests in the messages following the answers).

1. I'm not sure, but maybe it is because during the first few seconds, while the 
BlockCache is still empty, my old implementation prevented the BC from being populated 
effectively. It was skipping blocks even when eviction was not running, so a lot of 
blocks that could have been cached were lost. With the new approach the problem is 
gone. For example:

This is with 100% of the data cached (uniform distribution):

{noformat}
[OVERALL], RunTime(ms), 1506417
[OVERALL], Throughput(ops/sec), 33191.34077748724
[TOTAL_GCS_PS_Scavenge], Count, 8388
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 12146
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.8062840501667201
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 22
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.0014604189942094387
[TOTAL_GCs], Count, 8389
[TOTAL_GC_TIME], Time(ms), 12168
[TOTAL_GC_TIME_%], Time(%), 0.8077444691609296
[READ], Operations, 50000000
[READ], AverageLatency(us), 1503.45024378
[READ], MinLatency(us), 137
[READ], MaxLatency(us), 383999
[READ], 95thPercentileLatency(us), 2231
[READ], 99thPercentileLatency(us), 13503
[READ], Return=OK, 50000000
{noformat}

The same table with the patch:

{noformat}
[OVERALL], RunTime(ms), 1073257
[OVERALL], Throughput(ops/sec), 46587.1641181935
[TOTAL_GCS_PS_Scavenge], Count, 7201
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 9799
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.9130152423883563
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 23
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.002143009549436901
[TOTAL_GCs], Count, 7202
[TOTAL_GC_TIME], Time(ms), 9822
[TOTAL_GC_TIME_%], Time(%), 0.9151582519377931
[READ], Operations, 50000000
[READ], AverageLatency(us), 1070.52889804
[READ], MinLatency(us), 142
[READ], MaxLatency(us), 327167
[READ], 95thPercentileLatency(us), 2071
[READ], 99thPercentileLatency(us), 6539
[READ], Return=OK, 50000000
{noformat}

All the other tests show the same picture - you can see the details below.

2. It looks like the feature could have a negative effect if we set 
*hbase.lru.cache.heavy.eviction.count.limit*=0 and 
*hbase.lru.cache.heavy.eviction.mb.size.limit*=1 and sporadically do short reads of 
the same data. I mean a case where the BC holds 3 blocks and we read blocks 
1,2,3,4,3,4 ... 4,3,2,1,2,1 ... 1,2,3,4,3,4... In this scenario it is better to keep 
all the blocks, but these parameter values will skip blocks that we will need again 
quite soon. In my opinion the feature is extremely good for massive long-term reading 
on powerful servers; for short reads of small amounts of data, parameter values that 
are too small could be pathological.
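
Just to make that scenario concrete, here is a toy sketch (not HBase code - the 
3-block LRU, the cyclic access pattern and the 50% modulus cut-off are only 
assumptions that mimic the example above, reusing the offsets from the issue 
description plus a made-up 257):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: a 3-block LRU cache and a short cyclic working set, with and
// without a "cache only offsets where offset % 100 < 50" rule
// (i.e. a 50% data-block caching percentage applied unconditionally).
public class SkipCachingToy {

  static final class Lru extends LinkedHashMap<Long, Boolean> {
    Lru() { super(16, 0.75f, true); }                         // access order = LRU
    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
      return size() > 3;                                      // capacity: 3 blocks
    }
  }

  static double hitRate(long[] reads, boolean skipByModulus) {
    Lru cache = new Lru();
    int hits = 0;
    for (long offset : reads) {
      if (cache.containsKey(offset)) { hits++; continue; }
      // With skipping on, blocks whose offset % 100 >= 50 are never cached,
      // even though this small working set is read again very soon.
      if (!skipByModulus || offset % 100 < 50) {
        cache.put(offset, Boolean.TRUE);
      }
    }
    return (double) hits / reads.length;
  }

  public static void main(String[] args) {
    long[] reads = {124, 198, 223, 257, 223, 257,
                    257, 223, 198, 124, 198, 124,
                    124, 198, 223, 257, 223, 257};
    System.out.printf("cache everything: hit rate = %.2f%n", hitRate(reads, false));
    System.out.printf("skip by modulus:  hit rate = %.2f%n", hitRate(reads, true));
  }
}
{code}

With this pattern the plain LRU keeps nearly the whole working set and gets a much 
higher hit rate, which is exactly the pathological case I mean.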

3. If I understand you correctly, you mean that after compaction the real block 
offsets change. But when HFiles are compacted, all their blocks are removed from the 
BC anyway.

4. Now we have two parameters for tuning:

*hbase.lru.cache.heavy.eviction.count.limit* - controls how soon we want to see the 
eviction rate reduced. If we know that our load pattern is long-term reading only, we 
can set it to 0: it means that if we are reading, it is for a long time. But if we 
sometimes do short reads of the same data and sometimes long-term reads, we have to 
separate the two with this parameter. For example, if we know that our short reads 
usually last about 1 minute, we can set the parameter to about 10, and the feature 
will then kick in only for long, massive reads.

*hbase.lru.cache.heavy.eviction.mb.size.limit* - lets us control the point at which 
we are sure that GC will suffer. For a weak PC it could be about 50-100 MB; for 
powerful servers, 300-500 MB.
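
For example (just a sketch to make the values above concrete - the property names are 
the ones mentioned in this comment, but in a real cluster they would be set in 
hbase-site.xml on the RegionServers; setting them programmatically like this only 
makes sense for something like a mini-cluster test):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HeavyEvictionTuning {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // Mixed load: short reads (~1 min) should not trigger the feature,
    // so wait about 10 heavy-eviction periods before reducing caching.
    conf.setInt("hbase.lru.cache.heavy.eviction.count.limit", 10);

    // Powerful server: treat eviction as "heavy" only above ~300 MB per period.
    conf.setLong("hbase.lru.cache.heavy.eviction.mb.size.limit", 300L);

    System.out.println("count.limit   = "
        + conf.getInt("hbase.lru.cache.heavy.eviction.count.limit", 0));
    System.out.println("mb.size.limit = "
        + conf.getLong("hbase.lru.cache.heavy.eviction.mb.size.limit", 0L));
  }
}
{code}

For a pure long-term-scan workload the count limit would instead go to 0, as described 
above.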

I added some useful information to the logging:

{code:java}
LOG.info("BlockCache evicted (MB): {}, overhead (%) {}, " +
    "heavy eviction counter {}, " +
    "current caching DataBlock (%): {}",
  mbFreedSum, freedDataOverheadPercent,
  heavyEvictionCount, cache.cacheDataBlockPercent);
{code}

It will help to understand what kind of values we get and how to tune them.

5. I think it is a pretty good idea. Please give me some time to run the tests and 
see what happens.

Well, I will post information about the tests in the next message.

 



> BlockCache performance improve by reduce eviction rate
> ------------------------------------------------------
>
>                 Key: HBASE-23887
>                 URL: https://issues.apache.org/jira/browse/HBASE-23887
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache, Performance
>            Reporter: Danil Lipovoy
>            Priority: Minor
>         Attachments: 1582787018434_rs_metrics.jpg, 
> 1582801838065_rs_metrics_new.png, BC_LongRun.png, 
> BlockCacheEvictionProcess.gif, cmp.png, evict_BC100_vs_BC23.png, 
> eviction_100p.png, eviction_100p.png, eviction_100p.png, gc_100p.png, 
> read_requests_100pBC_vs_23pBC.png, requests_100p.png, requests_100p.png
>
>
> Hi!
> It's my first time here, so please correct me if something is wrong.
> I want to propose a way to improve performance when the data in HFiles is much 
> larger than the BlockCache (the usual story in BigData). The idea is to cache only 
> part of the DATA blocks. This is good because LruBlockCache starts to work properly 
> and saves a huge amount of GC.
> Sometimes we have more data than can fit into the BlockCache, and this causes a 
> high rate of evictions. In this case we can skip caching block N and instead cache 
> block N+1. We would evict block N quite soon anyway, and that is why the skipping 
> is good for performance.
> Example:
> Imagine we have a little cache that can fit only 1 block, and we are trying to 
> read 3 blocks with offsets:
> 124
> 198
> 223
> Current way - we put block 124, then put 198, evict 124, put 223, evict 198. A 
> lot of work (5 actions).
> With the feature - the last two digits of an offset are evenly distributed from 0 
> to 99. When we take the offset modulo 100 we get:
> 124 -> 24
> 198 -> 98
> 223 -> 23
> It helps to split them. One part, for example offsets below 50 (if we set 
> *hbase.lru.cache.data.block.percent* = 50), goes into the cache, and the others 
> are skipped. It means we will not try to handle block 198 and we save CPU for 
> other work. As a result we put block 124, then put 223, evict 124 (3 actions).
> See the picture from the test in the attachments below: requests per second are 
> higher and GC is lower.
>  
> The key point of the code:
> Added the parameter *hbase.lru.cache.data.block.percent*, which defaults to 100.
>  
> But if we set it to 1-99, then the following logic applies:
>  
>  
> {code:java}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
>   if (cacheDataBlockPercent != 100 && buf.getBlockType().isData())
>     if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent)
>       return;
>   // ... the same code as usual
> }
> {code}
>  
> Other parameters help to control when this logic is enabled, so it works only 
> while heavy reading is going on.
> hbase.lru.cache.heavy.eviction.count.limit - sets how many times the eviction 
> process has to run before we start to avoid putting data into the BlockCache.
> hbase.lru.cache.heavy.eviction.bytes.size.limit - sets how many bytes have to be 
> evicted each time before we start to avoid putting data into the BlockCache.
> By default: if 10 times in a row (100 seconds) more than 10 MB is evicted each 
> time, then we start to skip 50% of data blocks.
> When the heavy eviction process ends, the new logic switches off and all blocks 
> go into the BlockCache again.
>  
> Description of the test:
> 4 nodes: E5-2698 v4 @ 2.20GHz, 700 GB Mem
> 4 RegionServers
> 4 tables x 64 regions x 1.88 GB of data each = 600 GB total (FAST_DIFF only)
> Total BlockCache size = 48 GB (8% of the data in HFiles)
> Random reads in 20 threads
>  
> I am going to make a Pull Request; I hope this is the right way to make a 
> contribution to this cool product.
>  


