[ https://issues.apache.org/jira/browse/HBASE-21464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902808#comment-16902808 ]
wenbang commented on HBASE-21464:
---------------------------------

Hi [~apurtell],

Have you found the root cause? I am also confused by this issue. metaCache.clearCache has been executed, and the log shows that the meta location has been updated.
{code:java}
LOG.info("Call exception, tries=" + tries + ", retries=" + retries + ", started=" +
    (EnvironmentEdgeManager.currentTime() - this.globalStartTime) + " ms ago, " +
    "cancelled=" + cancelled.get() + ", msg=" + t.getMessage() + " " +
    callable.getExceptionMessageAdditionalDetail());
{code}
callable.getExceptionMessageAdditionalDetail() prints the correct meta location, so why do the retries still go to the wrong location?

> Splitting blocked with meta NSRE during split transaction
> ---------------------------------------------------------
>
> Key: HBASE-21464
> URL: https://issues.apache.org/jira/browse/HBASE-21464
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.5.0, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.8, 1.4.7
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Blocker
> Fix For: 1.4.9
>
> Attachments: HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch,
> HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch
>
>
> Splitting is blocked during split transaction.
> The split worker is trying to update meta but isn't able to relocate it after NSRE:
> {noformat}
> 2018-11-09 17:50:45,277 INFO [regionserver/ip-172-31-5-92.us-west-2.compute.internal/172.31.5.92:8120-splits-1541785709434] client.RpcRetryingCaller: Call exception, tries=13, retries=350, started=88590 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on ip-172-31-13-83.us-west-2.compute.internal,8120,1541785618832
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3088)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2198)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2396)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)row 'test,,1541785709452.5ba6596f0050c2dab969d152829227c6.44' on table 'hbase:meta' at region=hbase:meta,1.1588230740, hostname=ip-172-31-15-225.us-west-2.compute.internal,8120,1541785640586, seqNum=0
> {noformat}
> Clients, in this case YCSB, are hung with part of the keyspace missing:
> {noformat}
> 2018-11-09 17:51:06,033 DEBUG [hconnection-0x5739e567-shared--pool1-t165] client.ConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=14 of 35 failed; retrying after sleep of 20158 because: No server address listed in hbase:meta for region test,user307326104267982763,1541785754600.ef90030b05cb02305b75e9bfbc3ee081. containing row user3301635648728421323
> {noformat}
> Balancing cannot run indefinitely because the split transaction is stuck
> {noformat}
> 2018-11-09 17:49:55,478 DEBUG [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=8100] master.HMaster: Not running balancer because 3 region(s) in transition: [{ef90030b05cb02305b75e9bfbc3ee081 state=SPLITTING_NEW, ts=1541785754606, server=ip-172-31-5-92.us-west-2.compute.internal,8120,1541785626417}, {5ba6596f0050c2dab969d152829227c6 state=SPLITTING, ts=1541785754606, server=ip-172-31-5-92.us-west-2.compute....
> {noformat}
>

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)