[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2022-02-07 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488302#comment-17488302
 ] 

Huaxiang Sun commented on HBASE-26590:
--

Sorry, I noticed that I had committed it to branch-2.3. In the Fix Version field,
I did not list any 2.3 releases.

> Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
> ---
>
> Key: HBASE-26590
> URL: https://issues.apache.org/jira/browse/HBASE-26590
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Affects Versions: 2.4.0, 2.5.0, 2.3.7, 2.6.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 2.5.0, 2.4.10
>
>
> One of our users complained of higher latency during application restarts after
> upgrading from the hbase-1.2 client (CDH-5.16.2) to the hbase-2.4.5 client with
> meta replica Load Balance mode enabled. I reproduced the regression with a meta
> lookup test.
> On my test cluster, the test table has 160k regions, so there are 160k entries
> in the meta region. I used one thread to do 1 million meta lookups against the
> meta region server.
>  
> ||Version||Meta Replica Load Balance Enabled||Time (ms)||
> |2.4.5-with-fixed|Yes|336458|
> |2.4.5-with-fixed|No|333253|
> |2.4.5|Yes|469980|
> |2.4.5|No|470515|
> |*cdh-5.16.2*|*No*|*323412*|
>  
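As a rough illustration of what the totals mean per lookup, the sketch below converts each total into an average latency and a slowdown relative to the cdh-5.16.2 baseline. `MetaLookupBench` and its methods are hypothetical helpers written for this note; the only inputs taken from the report are the totals in the table and the 1 million lookup count.

```java
// Illustrative arithmetic only: totals are copied from the table above;
// MetaLookupBench is a hypothetical helper, not an HBase class.
final class MetaLookupBench {
  static final long LOOKUPS = 1_000_000L;

  /** Average latency per lookup in microseconds, given a total in milliseconds. */
  static double avgMicros(long totalMs) {
    return totalMs * 1000.0 / LOOKUPS;
  }

  /** Percentage by which candidateMs is slower than baselineMs. */
  static double slowdownPercent(long candidateMs, long baselineMs) {
    return (candidateMs - baselineMs) * 100.0 / baselineMs;
  }

  public static void main(String[] args) {
    long cdh = 323412L;        // cdh-5.16.2, load balance disabled
    long v245 = 470515L;       // 2.4.5, load balance disabled
    long v245Fixed = 333253L;  // 2.4.5-with-fixed, load balance disabled
    System.out.printf("2.4.5:       %.1f us/lookup, %+.1f%% vs cdh%n",
        avgMicros(v245), slowdownPercent(v245, cdh));
    System.out.printf("2.4.5-fixed: %.1f us/lookup, %+.1f%% vs cdh%n",
        avgMicros(v245Fixed), slowdownPercent(v245Fixed, cdh));
  }
}
```

On these numbers, unpatched 2.4.5 averages roughly 470 µs per lookup, about 45% slower than cdh-5.16.2, while 2.4.5-with-fixed is about 3% slower.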



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2022-01-10 Thread Nick Dimiduk (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472372#comment-17472372
 ] 

Nick Dimiduk commented on HBASE-26590:
--

FYI branch-2.3 is EOL. We should not be committing changes to that branch.



[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2022-01-07 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470981#comment-17470981
 ] 

Hudson commented on HBASE-26590:


Results for branch branch-2.4
[build #270 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/270/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/270/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/270/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/270/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/270/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2022-01-07 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470832#comment-17470832
 ] 

Hudson commented on HBASE-26590:


Results for branch branch-2
[build #437 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/437/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/437/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/437/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/437/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/437/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2022-01-07 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470777#comment-17470777
 ] 

Hudson commented on HBASE-26590:


Results for branch branch-2.3
[build #322 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/322/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/322/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/322/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/322/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/322/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2022-01-07 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470697#comment-17470697
 ] 

Hudson commented on HBASE-26590:


Results for branch branch-2.5
[build #21 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/21/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/21/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/21/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/21/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/21/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2022-01-05 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469506#comment-17469506
 ] 

Huaxiang Sun commented on HBASE-26590:
--

I modified my test case to exclude connection setup/teardown from the measured
time. Here are the results for 1M random meta lookups. I also added an option to
use BlockingRpcClient for meta lookups instead of the default NettyRpcClient.
||Version||Meta Replica Load Balance Enabled||BlockingRpcClient||Time (ms)||
|2.4.5-with-fixed|No|No|370814|
|2.4.5-with-fixed|No|Yes|358931|
|2.4.5-with-fixed|Yes|Yes|349485|
|2.4.5|No|No|516640|
|2.4.5|Yes|Yes|497509|
|cdh-5.16.2|No|No|371540|

 

For the Table.get() test, it is hard to draw a solid conclusion because of the
key distribution: most of the randomly generated keys fall into the last region,
whose location is cached. The BlockingRpcClient/NettyRpcClient difference is
about 3% (not 5 ~ 10% as initially reported), so it is not a big concern here.

The difference here is not as big as what we observed on the production cluster.
I am going to put up the patch and will work with the team to see if it helps.



[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-23 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464814#comment-17464814
 ] 

Huaxiang Sun commented on HBASE-26590:
--

I am modifying my test code to exclude connection setup/teardown from the
reported time (it should not have been there in the first place). I will report
back when I have more test results.
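A minimal sketch of the measurement change described here, assuming a generic harness rather than the author's actual test: the session is opened before the clock starts and closed after it stops, so only the lookups are timed. `Session` and `LookupTimer` are hypothetical names; in a real test, `setup` would open an HBase connection and `lookup` would perform one meta lookup.

```java
import java.util.function.ObjIntConsumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for a client connection; close() throws no checked exception. */
interface Session extends AutoCloseable {
  @Override
  void close();
}

final class LookupTimer {
  private LookupTimer() {}

  /**
   * Runs {@code iterations} lookups on one session and returns the elapsed
   * nanoseconds for the lookups only; setup and teardown are not timed.
   */
  static <S extends Session> long timeLookupsNanos(
      Supplier<S> setup, ObjIntConsumer<S> lookup, int iterations) {
    try (S session = setup.get()) {       // setup happens before the clock starts
      long start = System.nanoTime();
      for (int i = 0; i < iterations; i++) {
        lookup.accept(session, i);        // only the lookups are measured
      }
      return System.nanoTime() - start;   // evaluated before close() runs
    }
  }
}
```

Because Java evaluates the return expression before try-with-resources calls `close()`, teardown falls outside the measured interval without any extra bookkeeping.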



[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-22 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464275#comment-17464275
 ] 

Duo Zhang commented on HBASE-26590:
---

{quote}
At my testing cluster, I can reproduce a bit regression with a RandomGet test 
with 2.4.5 NettyRpcClient. After changing to BlockingRpcClient, this regression 
is gone (5 ~ 10%). 
{quote}

Ah, good. Would you mind posting a more detailed analysis here? NettyRpcClient is
the suggested RPC implementation for 2.x, so we should try our best to fix any
performance issues with it.



[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-22 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464271#comment-17464271
 ] 

Huaxiang Sun commented on HBASE-26590:
--

By the way, the hbase-1 client app and the hbase-2 client app run against the
same hbase-2.4.5 cluster, so the only difference is the client module.



[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-22 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464196#comment-17464196
 ] 

Huaxiang Sun commented on HBASE-26590:
--

Thanks [~zhangduo].

For master, I think 10 is fine, as all returned results are stored in the meta
cache, so they are not wasted.

For hbase-2, the extra 4 results are not cached, which is a bit of a concern. The
issue happened during a job restart, when ~700 HBase clients started at the same
time with empty meta caches, causing a meta scan storm; there are ~300k regions
in the meta table. At this moment I am not sure this is the main factor, as my
test results show far less impact than what the production job observed.

Some background info:

The cluster is stable, with no region moves.

Meta replica Load Balance mode is enabled on the 2.4.5 client side. The meta
replica region server is fully synced with the primary region, as the cluster is
stable. In my tests, meta scans that go through the meta replica region do not
cause a performance regression.

On my test cluster, I can reproduce a slight regression (5 ~ 10%) with a
RandomGet test using the 2.4.5 NettyRpcClient. After switching to
BlockingRpcClient, the regression is gone.

I will submit this minor improvement patch and will work with the production 
team again to see if there is any improvement with the patch and the new 
BlockingRpcClient config. 

If the meta replica region is out of sync with the primary region, there will be
many stale region locations, resulting in NotServingRegionException, and the
client will retry against the primary meta region. This would cause serious
latency issues, but that is not the case here. Anyway, I will keep an eye on it
when we retry with the new 2.4.5 client.
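The fallback path described above can be modeled in a few lines. This is a simplified, hypothetical model, not HBase's client code: plain maps stand in for the meta replica, the primary meta region, and the regions each server actually hosts, and a server that does not host the requested region plays the role of a NotServingRegionException. It counts meta reads and region server calls to show why stale replica locations add round trips.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not HBase code: maps stand in for the meta replica, the
// primary meta region, and the regions each server actually hosts.
final class ReplicaFallbackLocator {
  private final Map<String, String> replicaMeta;  // region -> server, possibly stale
  private final Map<String, String> primaryMeta;  // region -> server, authoritative
  private final Map<String, Set<String>> hosted;  // server -> regions it really serves
  int metaReads = 0;                              // meta lookups issued
  int serverCalls = 0;                            // requests sent to region servers

  ReplicaFallbackLocator(Map<String, String> replicaMeta, Map<String, String> primaryMeta,
      Map<String, Set<String>> hosted) {
    this.replicaMeta = replicaMeta;
    this.primaryMeta = primaryMeta;
    this.hosted = hosted;
  }

  private boolean serves(String server, String region) {
    serverCalls++;
    return hosted.getOrDefault(server, Collections.emptySet()).contains(region);
  }

  /** Returns the server that finally served {@code region}. */
  String call(String region) {
    metaReads++;                                   // load-balanced read from a replica
    String server = replicaMeta.get(region);
    if (server != null && serves(server, region)) {
      return server;                               // replica location was fresh
    }
    // A stale location would surface as NotServingRegionException in the real
    // client; here the model re-reads the location from the primary meta region.
    metaReads++;
    String primary = primaryMeta.get(region);
    if (primary != null && serves(primary, region)) {
      return primary;
    }
    throw new IllegalStateException("region not online: " + region);
  }
}
```

With a fresh replica entry a request costs one meta read and one server call; a stale entry triples both, which is the kind of overhead that would show up as latency if the replica fell behind.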

 



[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-19 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462226#comment-17462226
 ] 

Duo Zhang commented on HBASE-26590:
---

{quote}
Update, put the fix into a test which does real Table#get(), and still shows 
there is performance regression, so this is not the only cause. Debugging.
{quote}

I guess that for a real workload the meta entries will soon be cached on the
client side, so this will not affect performance too much.

Have you considered the balancer? Is it possible that in 2.4.x the balancer keeps
moving regions and causes lots of cache misses, both in the client-side meta
location cache and in the server-side block cache?

Thanks.



[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-19 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462225#comment-17462225
 ] 

Duo Zhang commented on HBASE-26590:
---

I think the intention behind setting caching to 5 is that fetching 5 rows costs
almost the same as fetching 1 row, since the rows in the meta region are very
small.

On the master branch, the default prefetch limit is 10, but it is configurable.

So if this is the case, we should change the default on the master branch to 1
instead of 10.
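To make the trade-off in this discussion concrete, here is a toy model (hypothetical names; the real meta row-key encoding is ignored) in which each cache miss costs one reversed-scan RPC returning up to `caching` rows: the region containing the row first, then the regions that precede it. `cacheAllRows = true` approximates keeping every returned location in the client cache, as described for master; `false` keeps only the located region, as described for branch-2.

```java
import java.util.HashSet;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;

// Toy model, not HBase code: meta is a map from region start key to location,
// and each cache miss costs one reversed-scan RPC returning up to `caching` rows.
final class PrefetchModel {
  private PrefetchModel() {}

  /** Counts the meta RPCs needed to locate every row in {@code lookups}. */
  static int metaRpcs(NavigableMap<String, String> meta, List<String> lookups,
      int caching, boolean cacheAllRows) {
    Set<String> cached = new HashSet<>(); // start keys whose locations the client holds
    int rpcs = 0;
    for (String row : lookups) {
      String start = meta.floorKey(row); // containing region (assumes a "" start key exists)
      if (cached.contains(start)) {
        continue; // cache hit: no meta RPC
      }
      rpcs++;
      int returned = 0;
      // A reversed scan yields the containing region first, then earlier regions.
      for (String s : meta.headMap(row, true).descendingKeySet()) {
        if (returned == caching) {
          break;
        }
        if (returned == 0 || cacheAllRows) {
          cached.add(s); // discard mode keeps only the located region
        }
        returned++;
      }
    }
    return rpcs;
  }
}
```

In this model, with 10 regions and `caching = 5`, visiting the regions in descending key order takes 2 meta RPCs when all rows are cached versus 10 when the extras are discarded, while visiting them in ascending order takes 10 either way, since the prefetched rows all sort before the region just located.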



[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-16 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461122#comment-17461122
 ] 

Huaxiang Sun commented on HBASE-26590:
--

Update: I put the fix into a test that does real Table#get() calls, and it still
shows a performance regression, so this is not the only cause. Debugging.



[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-16 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461066#comment-17461066
 ] 

Huaxiang Sun commented on HBASE-26590:
--

It is hard to compare with the master branch, as it saves the fetched locations
into the client's meta cache.



[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-16 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461031#comment-17461031
 ] 

Huaxiang Sun commented on HBASE-26590:
--

I debugged the code and found that this regression is caused by the following 
line. The meta scan now asks the region server to return 5 rows per RPC 
(setCaching(5)), which takes the region server more time to process. This 
change was introduced in HBASE-20182; in most normal cases, the extra 4 rows 
returned are discarded. The proposed fix is to revert to the hbase-1 behavior, 
i.e., ask for 1 row in the meta scan. For the corner case fixed by HBASE-20182, 
the client will go back to the meta region server a couple more times to find 
the correct location.

 
{code:java}
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
index 9145c55c0a..6039387b6e 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
@@ -888,7 +888,7 @@ class ConnectionImplementation implements ClusterConnection, Closeable {
     byte[] metaStopKey =
       RegionInfo.createRegionName(tableName, HConstants.EMPTY_START_ROW, "", false);
     Scan s = new Scan().withStartRow(metaStartKey).withStopRow(metaStopKey, true)
-      .addFamily(HConstants.CATALOG_FAMILY).setReversed(true).setCaching(5)
+      .addFamily(HConstants.CATALOG_FAMILY).setReversed(true).setCaching(1)
       .setReadType(ReadType.PREAD);
 
     switch (this.metaReplicaMode) {
{code}
 

 

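The trade-off in the comment above can be illustrated with a toy model. This is a hypothetical sketch, not the ConnectionImplementation code: meta rows are reduced to region start keys, and the `skip` flag is an assumed stand-in for rows the client has to pass over in the corner case HBASE-20182 addressed (the model does not reproduce that issue's exact conditions). It counts scan RPCs and rows read per lookup for a given caching value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical model of a reversed meta scan (not HBase code). Counts how many
 * scan RPCs and fetched rows one lookup costs for a given caching value.
 */
public class MetaScanModel {
  /** Simplified meta row keyed by region start key; `skip` marks a row the client must pass over. */
  record MetaRow(String startKey, boolean skip) {}

  /** Cost of one lookup: scan RPCs issued and rows the server read and returned. */
  record Cost(int rpcs, int rowsFetched) {}

  static Cost locate(NavigableMap<String, MetaRow> meta, String row, int caching) {
    if (caching < 1) {
      throw new IllegalArgumentException("caching must be >= 1");
    }
    // Reversed scan: candidates are rows with start key <= row, nearest first.
    List<MetaRow> candidates = new ArrayList<>(meta.headMap(row, true).descendingMap().values());
    int rpcs = 0;
    int fetched = 0;
    for (int i = 0; i < candidates.size(); i += caching) {
      int end = Math.min(i + caching, candidates.size());
      rpcs++;             // one RPC returns up to `caching` rows
      fetched += end - i; // the whole batch is read, even if only the first row is used
      for (int j = i; j < end; j++) {
        if (!candidates.get(j).skip()) {
          return new Cost(rpcs, fetched);
        }
      }
    }
    throw new IllegalStateException("no usable region for row " + row);
  }

  public static void main(String[] args) {
    NavigableMap<String, MetaRow> meta = new TreeMap<>();
    for (String k : new String[] {"a", "b", "c", "d", "e", "f", "g"}) {
      meta.put(k, new MetaRow(k, false));
    }
    System.out.println("normal, caching=5: " + locate(meta, "gx", 5));
    System.out.println("normal, caching=1: " + locate(meta, "gx", 1));

    // Hypothetical corner case: the two nearest rows must be skipped.
    meta.put("g", new MetaRow("g", true));
    meta.put("f", new MetaRow("f", true));
    System.out.println("corner, caching=5: " + locate(meta, "gx", 5));
    System.out.println("corner, caching=1: " + locate(meta, "gx", 1));
  }
}
```

In this model, caching 5 reads 5 rows on every lookup, while caching 1 reads a single row in the normal case and pays 3 RPCs (3 rows) only in the corner case, which matches the shape of the trade-off described in the comment.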
> Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
> ---
>
> Key: HBASE-26590
> URL: https://issues.apache.org/jira/browse/HBASE-26590
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Affects Versions: 3.0.0-alpha-1, 2.3.7
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>
> One of our users complained of higher latency during application restarts 
> after upgrading from the hbase-1.2 client (CDH-5.16.2) to the hbase-2.4.5 
> client with meta replica Load Balance mode. I reproduced the regression with 
> a meta lookup test. 
> On my test cluster, the test table has 160k regions, so there are 160k 
> entries in the meta region. I used one thread to do 1 million meta lookups 
> against the meta region server.
>  
> ||Version||Meta Replica Load Balance Enabled||Time||
> |2.4.5-with-fixed|Yes|336458ms|
> |2.4.5-with-fixed|No|333253ms|
> |2.4.5|Yes|469980ms|
> |2.4.5|No|470515ms|
> |*cdh-5.16.2*|*No*|*323412ms*|
>  





[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-16 Thread Huaxiang Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461032#comment-17461032
 ] 

Huaxiang Sun commented on HBASE-26590:
--

2.4.5-with-fixed is the 2.4.5 client built with the proposed fix. With the fix, 
meta lookup time (333253ms to 336458ms) is close to cdh-5.16.2's (323412ms), 
compared with about 470000ms without it.
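To put the benchmark table in per-lookup terms, the arithmetic below divides each total by the 1 million lookups and compares it with the cdh-5.16.2 baseline. It only restates numbers already in the table; the class and method names are made up for illustration. Worked out, the unfixed 2.4.5 client is roughly 45% slower than cdh-5.16.2, and the fixed build roughly 3 to 4% slower.

```java
/** Per-lookup latency and slowdown derived from the benchmark table (1 million lookups, one thread). */
public class MetaLookupLatency {
  static final long LOOKUPS = 1_000_000L;

  /** Average time per lookup in microseconds, given the total run time in milliseconds. */
  static double avgMicros(long totalMillis) {
    return totalMillis * 1000.0 / LOOKUPS;
  }

  /** How much slower `candidateMillis` is than `baselineMillis`, in percent. */
  static double slowdownPercent(long candidateMillis, long baselineMillis) {
    return (candidateMillis - baselineMillis) * 100.0 / baselineMillis;
  }

  public static void main(String[] args) {
    long baseline = 323412L; // cdh-5.16.2, meta replica load balance disabled
    String[] labels = {
      "2.4.5 (LB on)", "2.4.5 (LB off)", "2.4.5-with-fixed (LB on)", "2.4.5-with-fixed (LB off)"
    };
    long[] totals = {469980L, 470515L, 336458L, 333253L};
    System.out.printf("%-28s %8.1f us/lookup%n", "cdh-5.16.2 (LB off)", avgMicros(baseline));
    for (int i = 0; i < totals.length; i++) {
      System.out.printf("%-28s %8.1f us/lookup, %+5.1f%% vs cdh%n",
          labels[i], avgMicros(totals[i]), slowdownPercent(totals[i], baseline));
    }
  }
}
```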
