[ 
https://issues.apache.org/jira/browse/HDFS-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141366#comment-17141366
 ] 

Danil Lipovoy edited comment on HDFS-15409 at 6/21/20, 9:38 AM:
----------------------------------------------------------------

I think you are absolutely right that the real blockId is completely 
unpredictable. On the other hand, we don't need to predict it, right? As I 
understand it, we only need to know how the ids are distributed. If the 
distribution is even, all is OK.

So, I did 2 tests:

1. Added the last digit of the id to the log:

 
{code:java}
public ShortCircuitCache getShortCircuitCache(long idx)
{ 
  LOG.info("Last digit: " + idx % 10); 
  return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; 
}{code}
 

I read an HBase table and collected the distribution:
 cat /var/log/hbase/hbase-cmf-hbase-REGIONSERVER-home.com.log.out | grep "Last digit" | awk '{print $7}' | sort | uniq -c | sort -nr | awk '{printf "%-8s%s\n", $2, $1}' | sort
 0 157128
 1 171082
 2 171019
 3 171143
 4 171421
 5 170665
 6 171525
 7 167854
 8 167641
 9 157015

The difference between the smallest and largest slots is about 9%.
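
For reference, the spread figure can be recomputed from the counts above. This is only a sketch: the class and method names are hypothetical (not part of HDFS), and it assumes the figure is measured as (max - min) / max, which the comment does not state. With that definition the ten counts give roughly 8.5%; dividing by the minimum instead gives roughly 9.2%.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HDFS: measures how uneven a slot histogram is.
final class SlotSpread {
    // Relative gap between the fullest and the emptiest slot: (max - min) / max.
    // Assumption: this is the ratio meant by "difference between min-max slots".
    static double relativeSpread(long[] counts) {
        long min = Arrays.stream(counts).min().getAsLong();
        long max = Arrays.stream(counts).max().getAsLong();
        return (double) (max - min) / max;
    }

    public static void main(String[] args) {
        // Counts for last digits 0..9, copied from the log summary above.
        long[] lastDigitCounts = {157128, 171082, 171019, 171143, 171421,
                                  170665, 171525, 167854, 167641, 157015};
        System.out.printf("spread: %.1f%%%n", 100 * relativeSpread(lastDigitCounts));
    }
}
```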

2. Added CRC32 hash:

 
{code:java}
public ShortCircuitCache getShortCircuitCache(Long idx)
{ 
  CRC32 crc = new CRC32(); 
  crc.reset(); 
  crc.update(idx.byteValue()); 
  idx = crc.getValue(); 
  LOG.info("Last crc digit: " + idx % 10); 
  return shortCircuitCache[(int) (idx % clientShortCircuitNum)]; 
}{code}
 

I ran the same test and checked the distribution:

 cat /var/log/hbase/hbase-cmf-hbase-REGIONSERVER-home.com.log.out | grep "Last crc digit" | awk '{print $8}' | sort | uniq -c | sort -nr | awk '{printf "%-8s%s\n", $2, $1}' | sort
 0 140883
 1 212124
 2 152218
 3 152024
 4 141270
 5 157903
 6 182202
 7 152417
 8 209427
 9 152268

The difference between the smallest and largest slots is about 33%.

Any ideas?
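
One thing that may be worth checking: Long.byteValue() keeps only the low-order 8 bits of the id, and CRC32.update(int) likewise consumes a single byte, so the CRC32 variant can produce at most 256 distinct hash values however many block ids are read. The sketch below is not the HDFS code; the class, its methods and the starting id are hypothetical. It counts the distinct checksums for a run of consecutive ids, once hashing only the low byte and once hashing all eight bytes. Whether this accounts for the 33% spread is not confirmed here.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.CRC32;

// Hypothetical experiment class; none of these names exist in HDFS.
final class CrcSlotSketch {
    // Mirrors the test above: only the low-order byte of idx is checksummed.
    static long crcOfLowByte(long idx) {
        CRC32 crc = new CRC32();
        crc.update((byte) idx); // same value as Long.byteValue()
        return crc.getValue();
    }

    // Alternative for comparison: checksum all eight bytes of idx.
    static long crcOfAllBytes(long idx) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(Long.BYTES).putLong(idx).array());
        return crc.getValue();
    }

    // Counts distinct checksums over `count` consecutive ids starting at firstId.
    static int distinctHashes(long firstId, int count, boolean allBytes) {
        Set<Long> seen = new HashSet<>();
        for (long id = firstId; id < firstId + count; id++) {
            seen.add(allBytes ? crcOfAllBytes(id) : crcOfLowByte(id));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        long firstId = 1_073_741_825L; // arbitrary starting id, for illustration only
        System.out.println("low byte only: " + distinctHashes(firstId, 100_000, false));
        System.out.println("all 8 bytes:   " + distinctHashes(firstId, 100_000, true));
    }
}
```

With only 256 possible checksums, the slot counts depend on how those 256 values happen to fall under the modulo, so an uneven result would not be surprising.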



>  Optimization Strategy for choosing ShortCircuitCache
> -----------------------------------------------------
>
>                 Key: HDFS-15409
>                 URL: https://issues.apache.org/jira/browse/HDFS-15409
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Lisheng Sun
>            Priority: Major
>
> When clientShortCircuitNum is 10, the probability of falling into each 
> ShortCircuitCache is the same, while for other values of 
> clientShortCircuitNum the probabilities differ.
> For example, if clientShortCircuitNum is 3 and many SSR blockids are 
> ***1, ***4, ***7, they will all fall into one ShortCircuitCache.
> Since blockids in a real environment are completely unpredictable, I think we 
> need to design a strategy for allocating blocks to a specific 
> ShortCircuitCache. This should improve performance even more.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
