[ https://issues.apache.org/jira/browse/HDFS-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141366#comment-17141366 ]
Danil Lipovoy commented on HDFS-15409:
--------------------------------------

I think you are absolutely right that the real blockId is completely unpredictable. On the other hand, we don't need to predict it, right? As I understand it, we only need to know its distribution. If it is even, all is OK.

So, I did 2 tests:

1. Added the last digit of the index to the log:

    public ShortCircuitCache getShortCircuitCache(long idx) {
      LOG.info("Last digit: " + idx % 10);
      return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
    }

Ran a read of an HBase table and collected the distribution:

    cat /var/log/hbase/hbase-cmf-hbase-REGIONSERVER-home.com.log.out | grep "Last digit" | awk '{print $7}' | sort | uniq -c | sort -nr | awk '{printf "%-8s%s\n", $2, $1}' | sort

    0       157128
    1       171082
    2       171019
    3       171143
    4       171421
    5       170665
    6       171525
    7       167854
    8       167641
    9       157015

The difference between the min and max slots is about 9%.

2. Added a CRC32 hash:

    public ShortCircuitCache getShortCircuitCache(Long idx) {
      CRC32 crc = new CRC32();
      crc.reset();
      crc.update(idx.byteValue());
      idx = crc.getValue();
      LOG.info("Last crc digit: " + idx % 10);
      return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
    }

Ran the same test and checked the distribution:

    cat /var/log/hbase/hbase-cmf-hbase-REGIONSERVER-home.com.log.out | grep "Last crc digit" | awk '{print $8}' | sort | uniq -c | sort -nr | awk '{printf "%-8s%s\n", $2, $1}' | sort

    0       140883
    1       212124
    2       152218
    3       152024
    4       141270
    5       157903
    6       182202
    7       152417
    8       209427
    9       152268

The difference between the min and max slots is about 33%.

Any ideas?

> Optimization Strategy for choosing ShortCircuitCache
> -----------------------------------------------------
>
>                 Key: HDFS-15409
>                 URL: https://issues.apache.org/jira/browse/HDFS-15409
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Lisheng Sun
>            Priority: Major
>
> When clientShortCircuitNum is 10, the probability of falling into each
> ShortCircuitCache is the same, while the probability of other
> clientShortCircuitNum is different.
> For example if clientShortCircuitNum is 3, when a lot of blockids of SSR are
> ***1, ***4, ***7, this situation will fall into a ShortCircuitCache.
> Since the real environment blockid is completely unpredictable, i think it is
> need to design a strategy which is allocated to a specific ShortCircuitCache.
> This should improve performance even more.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
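
A note on test 2 in the comment above, offered as a possible explanation and not verified against that HBase run: `Long.byteValue()` keeps only the low 8 bits of the value, and `CRC32.update(int)` consumes a single byte, so the checksum there can take at most 256 distinct values. Spreading 256 values over 10 digits may account for part of the 33% skew. The sketch below compares that with feeding all 8 bytes of the ID to CRC32. The class and method names (`CrcSlotSketch`, `crcOfLowByte`, `crcOfAllBytes`, `distinctValues`) are hypothetical and not part of Hadoop, and sequential IDs stand in for real block IDs.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongUnaryOperator;
import java.util.zip.CRC32;

class CrcSlotSketch {

    // Same input as test 2: only the low 8 bits of the ID reach the checksum.
    static long crcOfLowByte(long blockId) {
        CRC32 crc = new CRC32();
        crc.update((byte) blockId); // equivalent to Long.byteValue()
        return crc.getValue();
    }

    // Hypothetical variant: hash all 8 bytes of the ID.
    static long crcOfAllBytes(long blockId) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(Long.BYTES).putLong(blockId).array());
        return crc.getValue();
    }

    // Counts how many distinct hash values the IDs 0..n-1 produce.
    static int distinctValues(LongUnaryOperator hash, long n) {
        Set<Long> seen = new HashSet<>();
        for (long id = 0; id < n; id++) {
            seen.add(hash.applyAsLong(id));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        long n = 10_000; // assumed sample size, sequential IDs only
        System.out.println("distinct CRCs, low byte only: "
                + distinctValues(CrcSlotSketch::crcOfLowByte, n));
        System.out.println("distinct CRCs, all 8 bytes:   "
                + distinctValues(CrcSlotSketch::crcOfAllBytes, n));
    }
}
```

Whether the full-width hash distributes real block IDs more evenly than plain `idx % clientShortCircuitNum` would still need to be measured the same way as in tests 1 and 2.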