[ 
https://issues.apache.org/jira/browse/HDFS-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141366#comment-17141366
 ] 

Danil Lipovoy commented on HDFS-15409:
--------------------------------------

I think you are absolutely right that the real blockId is completely 
unpredictable. On the other hand, we don't need to predict it, right? As I 
understand it, we only need to know how the values are distributed. If the 
distribution is even, all is OK. 


So, I ran two tests:

1. Logged the last digit of the index:

public ShortCircuitCache getShortCircuitCache(long idx) {
  LOG.info("Last digit: " + idx % 10);
  return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}

Ran a read against an HBase table and collected the distribution:

cat /var/log/hbase/hbase-cmf-hbase-REGIONSERVER-home.com.log.out \
  | grep "Last digit" | awk '{print $7}' | sort | uniq -c | sort -nr \
  | awk '{printf "%-8s%s\n", $2, $1}' | sort
0 157128
1 171082
2 171019
3 171143
4 171421
5 170665
6 171525
7 167854
8 167641
9 157015

The difference between the least and most loaded slots is a bit less than 9%.
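
For reference, here is a minimal sketch of how that figure can be reproduced from the counts above. It assumes the spread is measured relative to the most loaded slot; the class and method names are only illustrative and are not part of HDFS.

```java
public class SlotSpread {
    // Spread between the least and most loaded slot, as a percentage of the
    // most loaded slot. Assumption: this is the measure used in this comment.
    public static double spreadPercent(long[] counts) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return 100.0 * (max - min) / max;
    }

    public static void main(String[] args) {
        // Counts for last digits 0..9, copied from the log summary above.
        long[] lastDigit = {157128, 171082, 171019, 171143, 171421,
                            170665, 171525, 167854, 167641, 157015};
        System.out.println("spread, %: " + spreadPercent(lastDigit));
    }
}
```

The same helper can be applied to any other set of per-slot counts to compare strategies on equal terms.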

2. Added a CRC32 hash:

public ShortCircuitCache getShortCircuitCache(Long idx) {
  CRC32 crc = new CRC32();
  crc.reset();
  crc.update(idx.byteValue());
  idx = crc.getValue();
  LOG.info("Last crc digit: " + idx % 10);
  return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}

Ran the same test and checked the distribution:

cat /var/log/hbase/hbase-cmf-hbase-REGIONSERVER-home.com.log.out \
  | grep "Last crc digit" | awk '{print $8}' | sort | uniq -c | sort -nr \
  | awk '{printf "%-8s%s\n", $2, $1}' | sort
0 140883
1 212124
2 152218
3 152024
4 141270
5 157903
6 182202
7 152417
8 209427
9 152268

The difference between the least and most loaded slots is about 33%.
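
One thing that may contribute to this skew: Long.byteValue() keeps only the low 8 bits of the id, and CRC32.update(int) consumes a single byte, so the hash in test 2 can take at most 256 distinct values. Below is a minimal sketch of hashing all 8 bytes of the id instead. The class and helper names are hypothetical, and I have not measured the distribution with this variant.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class BlockIdHash {
    // Hypothetical helper: CRC32 over all 8 bytes of the id,
    // rather than only its low byte.
    public static long crc32OfLong(long id) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(Long.BYTES).putLong(id).array());
        return crc.getValue(); // always in [0, 2^32), so never negative
    }

    // Hypothetical helper: map an id to one of numCaches slots.
    public static int slotFor(long id, int numCaches) {
        return (int) (crc32OfLong(id) % numCaches);
    }

    // CRC32 over the low byte only, as in the snippet from test 2.
    public static long crc32OfLowByte(long id) {
        CRC32 crc = new CRC32();
        crc.update(Long.valueOf(id).byteValue());
        return crc.getValue();
    }

    public static void main(String[] args) {
        long a = 1L;   // low byte 0x01
        long b = 257L; // low byte 0x01 as well, differs in the next byte
        System.out.println("low byte only: "
            + (crc32OfLowByte(a) == crc32OfLowByte(b) ? "same hash" : "different hash"));
        System.out.println("all 8 bytes:   "
            + (crc32OfLong(a) == crc32OfLong(b) ? "same hash" : "different hash"));
    }
}
```

Whether this actually evens out the slots would need the same HBase read test to confirm.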

Any ideas?

>  Optimization Strategy for choosing ShortCircuitCache
> -----------------------------------------------------
>
>                 Key: HDFS-15409
>                 URL: https://issues.apache.org/jira/browse/HDFS-15409
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Lisheng Sun
>            Priority: Major
>
> When clientShortCircuitNum is 10, the probability of falling into each 
> ShortCircuitCache is the same, while the probability of other 
> clientShortCircuitNum is different.
> For example if clientShortCircuitNum is 3, when a lot of blockids of SSR are 
> ***1, ***4, ***7, this situation will fall into a ShortCircuitCache.
> Since the real environment blockid is completely unpredictable, i think it is 
> need to design a strategy which is allocated to a specific ShortCircuitCache. 
> This should improve performance even more.



