[ 
https://issues.apache.org/jira/browse/HBASE-21464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709075#comment-16709075
 ] 

Andrew Purtell commented on HBASE-21464:
----------------------------------------

{quote} * What if we have 1000's of clients? Would there now be 100's of META 
lookups per second?{quote}
 

This patch doesn't change that scenario. If we have 1000 newly initialized 
clients, like before they have no state, so have to go to zookeeper. No change.

If we have an initialized client with 1000s of threads trying to look up meta, 
only one wins the race on the lock, and the result is cached for 10 seconds, 
and it is very likely all threads waiting on the lock return with the newly 
cached value. This preserves the intent of the above mentioned jiras (the 
serialized lookup, the use of cache of meta location)

There are two changes: 
 # Don't use the region cache to cache the location of meta. Use a simple field 
instead.
 # At most hold on to the cached location of meta for 10 seconds.

I would like to revisit this whole area later.

[~lhofhansl]

 

 

> Splitting blocked with meta NSRE during split transaction
> ---------------------------------------------------------
>
>                 Key: HBASE-21464
>                 URL: https://issues.apache.org/jira/browse/HBASE-21464
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.8, 1.4.7
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.9
>
>         Attachments: HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch, 
> HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch
>
>
> Splitting is blocked during split transaction. The split worker is trying to 
> update meta but isn't able to relocate it after NSRE:
> {noformat}
> 2018-11-09 17:50:45,277 INFO  
> [regionserver/ip-172-31-5-92.us-west-2.compute.internal/172.31.5.92:8120-splits-1541785709434]
>  client.RpcRetryingCaller: Call exception, tries=13, retries=350, 
> started=88590 ms ago, cancelled=false, 
> msg=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 
> is not online on ip-172-31-13-83.us-west-2.compute.internal,8120,1541785618832
>      at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3088)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2198)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2396)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)row 
> 'test,,1541785709452.5ba6596f0050c2dab969d152829227c6.44' on table 
> 'hbase:meta' at region=hbase:meta,1.1588230740, 
> hostname=ip-172-31-15-225.us-west-2.compute.internal,8120,1541785640586, 
> seqNum=0{noformat}
> Clients, in this case YCSB, are hung with part of the keyspace missing:
> {noformat}
> 2018-11-09 17:51:06,033 DEBUG [hconnection-0x5739e567-shared--pool1-t165] 
> client.ConnectionManager$HConnectionImplementation: locateRegionInMeta 
> parentTable=hbase:meta, metaLocation=, attempt=14 of 35 failed; retrying 
> after sleep of 20158 because: No server address listed in hbase:meta for 
> region 
> test,user307326104267982763,1541785754600.ef90030b05cb02305b75e9bfbc3ee081. 
> containing row user3301635648728421323{noformat}
> Balancing cannot run indefinitely because the split transaction is stuck
> {noformat}
> 2018-11-09 17:49:55,478 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=8100] master.HMaster: 
> Not running balancer because 3 region(s) in transition: 
> [{ef90030b05cb02305b75e9bfbc3ee081 state=SPLITTING_NEW, ts=1541785754606, 
> server=ip-172-31-5-92.us-west-2.compute.internal,8120,1541785626417}, 
> {5ba6596f0050c2dab969d152829227c6 state=SPLITTING, ts=1541785754606, 
> server=ip-172-31-5-92.us-west-2.compute....{noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to