[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488302#comment-17488302 ]

Huaxiang Sun commented on HBASE-26590:
--------------------------------------

Sorry, I noticed that I committed it to the branch. In the fix version part, I did not put any 2.3 releases.

> Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-26590
>                 URL: https://issues.apache.org/jira/browse/HBASE-26590
>             Project: HBase
>          Issue Type: Improvement
>          Components: meta
>    Affects Versions: 2.4.0, 2.5.0, 2.3.7, 2.6.0
>            Reporter: Huaxiang Sun
>            Assignee: Huaxiang Sun
>            Priority: Major
>             Fix For: 2.5.0, 2.4.10
>
>
> One of our users complained higher latency after application upgrades from
> hbase-1.2 client (CDH-5.16.2) to hbase-2.4.5 client with meta replica Load
> Balance mode during app restart. I reproduced the regression by a test for
> meta lookup.
> At my test cluster, there are 160k regions for the test table, so there are
> 160k entries in meta region. Used one thread to do 1 million meta lookup
> against the meta region server.
>
> ||Version||Meta Replica Load Balance Enabled||Time||
> |2.4.5-with-fixed|Yes|336458ms|
> |2.4.5-with-fixed|No|333253ms|
> |2.4.5|Yes|469980ms|
> |2.4.5|No|470515ms|
> |*cdh-5.16.2*|*No*|*323412ms*|

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472372#comment-17472372 ]

Nick Dimiduk commented on HBASE-26590:
--------------------------------------

FYI branch-2.3 is EOL. We should not be committing changes to that branch.
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470981#comment-17470981 ]

Hudson commented on HBASE-26590:
--------------------------------

Results for branch branch-2.4
	[build #270 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/270/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/270/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/270/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/270/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/270/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470832#comment-17470832 ]

Hudson commented on HBASE-26590:
--------------------------------

Results for branch branch-2
	[build #437 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/437/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/437/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/437/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/437/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/437/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470777#comment-17470777 ]

Hudson commented on HBASE-26590:
--------------------------------

Results for branch branch-2.3
	[build #322 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/322/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/322/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/322/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/322/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/322/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470697#comment-17470697 ]

Hudson commented on HBASE-26590:
--------------------------------

Results for branch branch-2.5
	[build #21 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/21/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/21/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/21/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/21/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/21/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469506#comment-17469506 ]

Huaxiang Sun commented on HBASE-26590:
--------------------------------------

I modified my test case to exclude connection setup/teardown from the measured time. Here are the results for 1 million random meta lookups. I also added an option to use BlockingRpcClient for meta lookups, to compare against the default NettyRpcClient.

||Version||Meta Replica Load Balance Enabled||BlockingRpcClient||Time (ms)||
|2.4.5-with-fixed|No|No|370814|
|2.4.5-with-fixed|No|Yes|358931|
|2.4.5-with-fixed|Yes|Yes|349485|
|2.4.5|No|No|516640|
|2.4.5|Yes|Yes|497509|
|cdh-5.16.2|No|No|371540|

For the Table.get() test, it is hard to draw a solid conclusion because of the key distribution: most of the randomly created keys fall into the last region, and it is cached. The BlockingRpcClient/NettyRpcClient difference is about 3% (not the 5 ~ 10% initially reported), so it is not a very big concern here.

The difference measured here is not as big as what we observed on the production cluster. I am going to put up the patch and will work with the team to see if it helps.
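The measurement change described in this comment (timing only the lookups, not connection setup/teardown) can be sketched roughly as below. This is a hypothetical harness, not the actual test code, which is not shown in this thread. `LookupTimer`, `timeLookupsNanos`, and its parameters are names invented for illustration, and the lookup body is a placeholder where a real test would call the HBase client.

```java
import java.util.function.IntConsumer;

public class LookupTimer {
    /**
     * Runs setup, then times only the lookup loop, then runs teardown.
     * Setup and teardown happen outside the measured window.
     * All names here are hypothetical; nothing is taken from the HBase code base.
     */
    public static long timeLookupsNanos(Runnable setup, IntConsumer lookup,
                                        Runnable teardown, int iterations) {
        setup.run();                               // e.g. open the connection (not timed)
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            lookup.accept(i);                      // one lookup per iteration (timed)
        }
        long elapsed = System.nanoTime() - start;
        teardown.run();                            // e.g. close the connection (not timed)
        return elapsed;
    }

    public static void main(String[] args) {
        long[] sink = new long[1];                 // keeps the placeholder work observable
        long nanos = timeLookupsNanos(
            () -> { /* placeholder: connection setup would go here */ },
            i -> sink[0] += Integer.hashCode(i),   // placeholder for one meta lookup
            () -> { /* placeholder: connection teardown would go here */ },
            1_000_000);
        System.out.printf("1m lookups took %d ms (sink=%d)%n", nanos / 1_000_000, sink[0]);
    }
}
```

Passing setup and teardown as separate callbacks makes it hard to include them in the timed region by accident, which was the problem with the first set of numbers.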
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464814#comment-17464814 ]

Huaxiang Sun commented on HBASE-26590:
--------------------------------------

I am modifying my test code to exclude the connection setup/teardown from the reported time (it should not have been there in the first place). Will report back when I have more test results.
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464275#comment-17464275 ]

Duo Zhang commented on HBASE-26590:
-----------------------------------

{quote}
At my testing cluster, I can reproduce a bit regression with a RandomGet test with 2.4.5 NettyRpcClient. After changing to BlockingRpcClient, this regression is gone (5 ~ 10%).
{quote}

Ah good, mind posting a more detailed analysis here? NettyRpcClient is the suggested rpc implementation for 2.x; we should try our best to fix any performance issues.
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464271#comment-17464271 ]

Huaxiang Sun commented on HBASE-26590:
--------------------------------------

By the way, the hbase-1 client app and the hbase-2 client app are working against the same hbase-2.4.5 cluster, so the only difference is the client module.
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464196#comment-17464196 ]

Huaxiang Sun commented on HBASE-26590:
--------------------------------------

Thanks [~zhangduo]. For master, I think 10 is fine, as all results are saved to the meta cache, so they are not wasted. For hbase-2, the extra 4 results are not cached, which is a bit of a concern.

The issue happened during a job restart, when ~700 hbase clients start at the same time with an empty meta cache, so there is a meta scan storm; there are ~300k regions in the meta table. I am not sure at this moment that this is the main factor, as my test results show far less impact than what the production job observed.

Some background info: The cluster is stable, with no region moves. Meta replica Load Balance mode is enabled on the 2.4.5 client side. The meta replica region server is fully synced with the primary region, as the cluster is stable. In my test, meta scans going through the meta replica region do not cause a performance regression.

On my test cluster, I can reproduce a small regression (5 ~ 10%) with a RandomGet test using the 2.4.5 NettyRpcClient. After changing to BlockingRpcClient, this regression is gone. I will submit this minor improvement patch and will work with the production team again to see if there is any improvement with the patch and the new BlockingRpcClient config.

If the meta replica region is out of sync with the primary region, there will be lots of stale region locations, which result in NotServingRegionException, and the client will retry against the primary meta region. That would cause a serious latency issue, but it is not the case here. Anyway, I will keep an eye on it when we retry with the new 2.4.5 client.
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462226#comment-17462226 ]

Duo Zhang commented on HBASE-26590:
-----------------------------------

{quote}
Update, put the fix into a test which does real Table#get(), and still shows there is performance regression, so this is not the only cause. Debugging.
{quote}

I guess for a real workload the meta entries will soon be cached at the client side, so it will not affect the performance too much.

Have you considered the balancer? Is it possible that for 2.4.x, the balancer keeps moving regions and causes lots of cache misses, both in the client side meta location cache and in the block cache at the server side?

Thanks.
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462225#comment-17462225 ]

Duo Zhang commented on HBASE-26590:
-----------------------------------

I think the intention of setting caching to 5 was that we considered fetching 5 rows to cost almost the same as fetching 1 row, since the rows in the meta region are very small.

On the master branch, the default prefetch limit is 10, but it is configurable. So if this is the case, for the master branch we should change the default value to 1 instead of 10.
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461122#comment-17461122 ]

Huaxiang Sun commented on HBASE-26590:
--------------------------------------

Update: I put the fix into a test that does real Table#get() calls, and it still shows a performance regression, so this is not the only cause. Debugging.
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461066#comment-17461066 ]

Huaxiang Sun commented on HBASE-26590:
--------------------------------------

It is hard to compare with the master branch, as it saves the fetched locations into the client's meta cache.
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461031#comment-17461031 ]

Huaxiang Sun commented on HBASE-26590:
--------------------------------------

I debugged the code and found that this regression is caused by the following line. The meta scan now asks the region server to return 5 rows, which takes more time for the region server to process. This change was introduced in HBASE-20182; in most normal cases, the extra 4 rows returned are discarded.

The proposed fix is to revert to the hbase-1 behavior, i.e., ask for 1 row in the meta scan. For the corner case fixed by HBASE-20182, the client will go back to the meta region server a couple more times for the correct location.

{code:java}
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
index 9145c55c0a..6039387b6e 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
@@ -888,7 +888,7 @@ class ConnectionImplementation implements ClusterConnection, Closeable {
     byte[] metaStopKey =
       RegionInfo.createRegionName(tableName, HConstants.EMPTY_START_ROW, "", false);
     Scan s = new Scan().withStartRow(metaStartKey).withStopRow(metaStopKey, true)
-      .addFamily(HConstants.CATALOG_FAMILY).setReversed(true).setCaching(5)
+      .addFamily(HConstants.CATALOG_FAMILY).setReversed(true).setCaching(1)
       .setReadType(ReadType.PREAD);
 
     switch (this.metaReplicaMode) {
{code}
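To illustrate why the extra rows are usually wasted, here is a toy model of the reversed meta scan. It is only a sketch under stated assumptions: a `TreeMap` stands in for the meta region, the key format and server names are invented, and `reversedScan` is a hypothetical helper, not an HBase API. It models how many rows come back per request, not the server-side cost of producing them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ReversedMetaScanSketch {
    /** Toy stand-in for the meta region: region start key -> hosting server (invented names). */
    public static NavigableMap<String, String> buildMeta(int regions) {
        NavigableMap<String, String> meta = new TreeMap<>();
        for (int i = 0; i < regions; i++) {
            meta.put(String.format("row-%06d", i * 100), "server-" + (i % 8));
        }
        return meta;
    }

    /**
     * Simulates one reversed-scan request: returns up to `caching` entries whose
     * start key is at or before `row`, in descending key order.
     */
    public static List<Map.Entry<String, String>> reversedScan(
            NavigableMap<String, String> meta, String row, int caching) {
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        for (Map.Entry<String, String> e : meta.headMap(row, true).descendingMap().entrySet()) {
            if (batch.size() == caching) {
                break;
            }
            batch.add(e);
        }
        return batch;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> meta = buildMeta(1000);
        String row = "row-012345";
        for (int caching : new int[] {1, 5}) {
            List<Map.Entry<String, String>> batch = reversedScan(meta, row, caching);
            // In the normal case only the first entry (the region containing `row`) is used.
            Map.Entry<String, String> hit = batch.get(0);
            System.out.printf("caching=%d: rows returned=%d, rows discarded=%d, region start=%s on %s%n",
                caching, batch.size(), batch.size() - 1, hit.getKey(), hit.getValue());
        }
    }
}
```

In this model both settings locate the same region; with a limit of 5 the other four entries are fetched and then thrown away, which matches the behavior the comment describes for the hbase-2 client.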
[jira] [Commented] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
[ https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461032#comment-17461032 ]

Huaxiang Sun commented on HBASE-26590:
--------------------------------------

2.4.5-with-fixed is the release with the proposed fix. With that, meta lookup time is similar to cdh-5.16.2's.
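To put the timings from the issue description's table in relative terms, the following small Java sketch (not part of the patch) computes how much slower each run is than the cdh-5.16.2 baseline. The class and method names are made up for this note; the millisecond values are copied from the table.

```java
public class MetaLookupSlowdown {
    /** Percentage by which measuredMs exceeds baselineMs. */
    public static double slowdownPercent(long measuredMs, long baselineMs) {
        return 100.0 * (measuredMs - baselineMs) / baselineMs;
    }

    public static void main(String[] args) {
        long baselineMs = 323_412L; // cdh-5.16.2, load balance disabled
        String[] labels = {
            "2.4.5-with-fixed (LB on)",
            "2.4.5-with-fixed (LB off)",
            "2.4.5 (LB on)",
            "2.4.5 (LB off)",
        };
        long[] timesMs = {336_458L, 333_253L, 469_980L, 470_515L};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-26s %7d ms  %+.1f%% vs baseline%n",
                labels[i], timesMs[i], slowdownPercent(timesMs[i], baselineMs));
        }
    }
}
```

On these figures the unpatched 2.4.5 client is roughly 45% slower than the baseline for 1 million lookups, while the patched build is within about 3 to 4% of it.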