[ https://issues.apache.org/jira/browse/HBASE-23261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani updated HBASE-23261:
---------------------------------
    Attachment: HBASE-23261.branch-1.3.000.patch

> Region stuck in transition while splitting
> ------------------------------------------
>
>                 Key: HBASE-23261
>                 URL: https://issues.apache.org/jira/browse/HBASE-23261
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.5
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 1.6.0, 1.4.12, 1.3.7, 1.5.1
>
>         Attachments: HBASE-23261.branch-1.3.000.patch, HBASE-23261.branch-1.3.000.patch
>
>
> While splitting, some region gets stuck in transition. After RegionServer
> initiates split, ZK has the region marked in RIT ZNode. However, RegionServer
> encounters KeeperException.BadVersion for
> /hbase/region-in-transition/\{region-name} while transitioning node to
> RS_ZK_REQUEST_REGION_SPLIT and hence, it runs rollback/cleanup of failed
> split of the region. Even after successful rollback, region stays in
> transition sometimes.
>
>
> {code:java}
> 2019-11-05 04:07:17,711 INFO [splits-1572926837064] regionserver.SplitRequest - Successful rollback of failed split of TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
> {code}
> {code:java}
> 2019-11-05 04:07:17,688 INFO [splits-1572926837064] regionserver.SplitRequest - Running rollback/cleanup of failed split of TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.; Failed getting SPLITTING znode on TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
> java.io.IOException: Failed getting SPLITTING znode on TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
>     at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.waitForSplitTransaction(ZKSplitTransactionCoordination.java:203)
>     at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:383)
>     at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
>     at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
>     at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
>     at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:153)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Failed transition of splitting node TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
>     at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.transitionSplittingNode(ZKSplitTransactionCoordination.java:132)
>     at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.waitForSplitTransaction(ZKSplitTransactionCoordination.java:161)
>     ... 8 more
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/region-in-transition/257ff8985e7a169af0514208b3b0b430
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1336)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:442)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:818)
>     at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:871)
>     at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.transitionSplittingNode(ZKSplitTransactionCoordination.java:128)
>     ... 9 more
> {code}
> {code:java}
> 2019-11-05 04:07:17,688 INFO [.Worker-pool3-t26826] master.RegionStates - Transition {257ff8985e7a169af0514208b3b0b430 state=OPEN, ts=1572923178845, server=rsserver.net,60020,1572890688075} to {257ff8985e7a169af0514208b3b0b430 state=SPLITTING, ts=1572926837688, server=rsserver.net,60020,1572890688075}
> {code}
> {code:java}
> 2019-11-05 04:07:17,680 INFO [myid:5] [ead(sid:5 cport:-1):] server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x36dd5dc94536a3e type:setData cxid:0x8f8a zxid:0x304fd98ef txntype:-1 reqpath:n/a Error Path:/hbase/region-in-transition/257ff8985e7a169af0514208b3b0b430 Error:KeeperErrorCode = BadVersion for /hbase/region-in-transition/257ff8985e7a169af0514208b3b0b430
> {code}
> {code:java}
> 2019-11-05 04:07:17,668 DEBUG [.Worker-pool3-t26826] master.AssignmentManager - Handling RS_ZK_REQUEST_REGION_SPLIT, server=rsserver.net,60020,1572890688075, region=257ff8985e7a169af0514208b3b0b430, current_state={257ff8985e7a169af0514208b3b0b430 state=OPEN, ts=1572923178845, server=rsserver.net,60020,1572890688075}
> {code}
> {code:java}
> 2019-11-05 04:07:17,661 DEBUG [splits-1572926837064] coordination.ZKSplitTransactionCoordination - Still waiting for master to process the pending_split for 257ff8985e7a169af0514208b3b0b430
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)