[ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464196#comment-17464196
 ] 

Huaxiang Sun commented on HBASE-26590:
--------------------------------------

Thanks [~zhangduo].

For master, I think 10 is fine as all results are cached to the meta cache, so 
they are not wasted.

For hbase-2, the extra 4 results are not cached, so that is a bit of a concern. 
The issue happened during the job restart, when ~700 hbase clients start at the 
same time with an empty meta cache, which causes a meta scan storm; there are 
~300k regions in the meta table. I am not sure at this moment that this is the 
main factor, as my testing shows far less impact than what was observed in the 
production job. 

Some background info:

The cluster is stable, with no region moves. 

Meta replica Load Balance mode is enabled on the 2.4.5 client side. The meta 
replica region servers are fully synced with the primary region, as the cluster 
is stable. During my test, meta scans going through the meta replica regions did 
not cause a performance regression. 

On my test cluster, I can reproduce a small regression with a RandomGet test 
using the 2.4.5 NettyRpcClient. After changing to BlockingRpcClient, this 
regression (5 ~ 10%) is gone. 
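
For reference, a minimal sketch of the client-side override for switching the RPC client implementation. It assumes the `hbase.rpc.client.impl` key that HBase's `RpcClientFactory` reads; the class and method names below are hypothetical, and the plain `Map` stands in for an HBase `Configuration` so the snippet runs without the HBase jars.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClientRpcOverrides {
    // Key used to select the RPC client implementation (assumed from RpcClientFactory).
    static final String RPC_CLIENT_IMPL_KEY = "hbase.rpc.client.impl";
    // Fully qualified class name of the blocking RPC client.
    static final String BLOCKING_RPC_CLIENT =
        "org.apache.hadoop.hbase.ipc.BlockingRpcClient";

    // Builds the overrides that switch a client from the default Netty-based
    // RPC client to BlockingRpcClient.
    public static Map<String, String> blockingRpcClientOverrides() {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put(RPC_CLIENT_IMPL_KEY, BLOCKING_RPC_CLIENT);
        return overrides;
    }

    public static void main(String[] args) {
        // In a real client, each entry would be applied with
        // Configuration#set(key, value) before the Connection is created.
        blockingRpcClientOverrides()
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same key/value pair could equally be placed in the client's hbase-site.xml.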

I will submit this minor improvement patch and will work with the production 
team again to see if there is any improvement with the patch and the new 
BlockingRpcClient config. 

If the meta replica region is out of sync with the primary region, there will 
be lots of stale region locations, which results in NotServingRegionException, 
and the client will retry against the primary meta region. This would cause a 
serious latency issue, but that is not the case here. Anyway, I will keep an eye 
on it when we retry with the new 2.4.5 client.

 

> Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-26590
>                 URL: https://issues.apache.org/jira/browse/HBASE-26590
>             Project: HBase
>          Issue Type: Improvement
>          Components: meta
>    Affects Versions: 2.4.0, 2.5.0, 2.3.7, 2.6.0
>            Reporter: Huaxiang Sun
>            Assignee: Huaxiang Sun
>            Priority: Major
>
> One of our users complained about higher latency during app restart after the 
> application upgraded from the hbase-1.2 client (CDH-5.16.2) to the hbase-2.4.5 
> client with meta replica Load Balance mode. I reproduced the regression with a 
> test for meta lookup. 
> On my test cluster, there are 160k regions for the test table, so there are 
> 160k entries in the meta region. I used one thread to do 1 million meta lookups 
> against the meta region server.
>  
> ||Version||Meta Replica Load Balance Enabled||Time||
> |2.4.5-with-fixed|Yes|336458ms|
> |2.4.5-with-fixed|No|333253ms|
> |2.4.5|Yes|469980ms|
> |2.4.5|No|470515ms|
> |*cdh-5.16.2*|*No*|*323412ms*|
>  
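
To put the quoted timings in perspective, here is a rough sketch that converts the totals into per-lookup latency and slowdown relative to the cdh-5.16.2 baseline. It assumes each total covers the 1 million single-threaded lookups described in the issue; the class and method names are hypothetical.

```java
public class MetaLookupTimings {
    // Number of meta lookups per test run, as stated in the issue description.
    static final long LOOKUPS = 1_000_000L;

    // Average latency of one meta lookup, in milliseconds.
    public static double perLookupMillis(long totalMillis) {
        return (double) totalMillis / LOOKUPS;
    }

    // Slowdown of a run relative to a baseline run, in percent.
    public static double slowdownPercent(long totalMillis, long baselineMillis) {
        return 100.0 * (totalMillis - baselineMillis) / baselineMillis;
    }

    public static void main(String[] args) {
        long baseline = 323_412L; // cdh-5.16.2, load balance disabled
        String[] labels = {
            "2.4.5-with-fixed (LB on)", "2.4.5-with-fixed (LB off)",
            "2.4.5 (LB on)", "2.4.5 (LB off)"
        };
        long[] totals = {336_458L, 333_253L, 469_980L, 470_515L};

        for (int i = 0; i < totals.length; i++) {
            System.out.printf("%-26s %.3f ms/lookup, %.1f%% slower than cdh-5.16.2%n",
                labels[i], perLookupMillis(totals[i]),
                slowdownPercent(totals[i], baseline));
        }
    }
}
```

By this arithmetic, unpatched 2.4.5 is roughly 45% slower than the hbase-1 client, while the patched build is within about 3 to 4%, with or without meta replica Load Balance mode.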



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
