[ 
https://issues.apache.org/jira/browse/HBASE-23261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated HBASE-23261:
---------------------------------
    Description: 
While splitting, a region can get stuck in transition. After the RegionServer 
initiates the split, ZK has the region marked in the RIT znode. However, HMaster 
hits a KeeperException with BadVersion for 
/hbase/region-in-transition/\{region-name}, and hence rollback/cleanup of the 
failed split of the region is run. Even after a successful rollback, the region 
remains in transition.
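
For context, ZooKeeper's setData takes an expected version and fails with 
BadVersion when the znode has been modified since that version was read. The 
sketch below is a minimal in-memory model of that check, not the ZooKeeper or 
HBase API: the VersionedNode class, the /rit/region-a path, and the 
interleaving of two writers are all hypothetical, chosen only to illustrate how 
a stale version could produce the error reported here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory model; this is NOT the ZooKeeper or HBase API.
class VersionedNodeSketch {

    /** Stand-in for a version-mismatch error such as ZooKeeper's BadVersion. */
    static class BadVersionException extends Exception {
        BadVersionException(String path, int expected, int actual) {
            super("BadVersion for " + path
                + " (expected " + expected + ", actual " + actual + ")");
        }
    }

    /** A node holding data plus a version that increments on every write. */
    static class VersionedNode {
        private final String path;
        private byte[] data;
        private int version = 0;

        VersionedNode(String path, String initialData) {
            this.path = path;
            this.data = initialData.getBytes(StandardCharsets.UTF_8);
        }

        synchronized int version() {
            return version;
        }

        synchronized String data() {
            return new String(data, StandardCharsets.UTF_8);
        }

        /** Writes only if expectedVersion matches; returns the new version. */
        synchronized int setData(String newData, int expectedVersion)
                throws BadVersionException {
            if (expectedVersion != version) {
                throw new BadVersionException(path, expectedVersion, version);
            }
            data = newData.getBytes(StandardCharsets.UTF_8);
            version++;
            return version;
        }
    }

    /** Replays one assumed interleaving: both writers read version 0, then write. */
    static String simulate() {
        VersionedNode node = new VersionedNode("/rit/region-a", "PENDING_SPLIT");
        int seenByRegionServer = node.version(); // reads version 0
        int seenByMaster = node.version();       // also reads version 0
        try {
            // First writer succeeds and bumps the version to 1.
            node.setData("SPLITTING", seenByMaster);
            // Second writer still holds version 0, so the write is rejected.
            node.setData("PENDING_SPLIT", seenByRegionServer);
            return "unexpected: stale write succeeded";
        } catch (BadVersionException e) {
            return e.getMessage() + "; node=" + node.data() + " v" + node.version();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

Running main prints the BadVersion message followed by the node state left by 
the first writer. Whether this exact interleaving is what happened on the 
cluster is not established by the logs.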

 

 
{code:java}
2019-11-05 04:07:17,711 INFO [splits-1572926837064] regionserver.SplitRequest - 
Successful rollback of failed split of 
TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
{code}
{code:java}
2019-11-05 04:07:17,688 INFO [splits-1572926837064] regionserver.SplitRequest - 
Running rollback/cleanup of failed split of 
TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.; Failed getting 
SPLITTING znode on TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
java.io.IOException: Failed getting SPLITTING znode on TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
    at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.waitForSplitTransaction(ZKSplitTransactionCoordination.java:203)
    at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:383)
    at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
    at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
    at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
    at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:153)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed transition of splitting node TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
    at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.transitionSplittingNode(ZKSplitTransactionCoordination.java:132)
    at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.waitForSplitTransaction(ZKSplitTransactionCoordination.java:161)
    ... 8 more
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/region-in-transition/257ff8985e7a169af0514208b3b0b430
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1336)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:442)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:818)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:871)
    at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.transitionSplittingNode(ZKSplitTransactionCoordination.java:128)
    ... 9 more
{code}
{code:java}
2019-11-05 04:07:17,688 INFO [.Worker-pool3-t26826] master.RegionStates - 
Transition {257ff8985e7a169af0514208b3b0b430 state=OPEN, ts=1572923178845, 
server=rsserver.net,60020,1572890688075} to {257ff8985e7a169af0514208b3b0b430 
state=SPLITTING, ts=1572926837688, server=rsserver.net,60020,1572890688075}
{code}
{code:java}
2019-11-05 04:07:17,680 INFO [myid:5] [ead(sid:5 cport:-1):] 
server.PrepRequestProcessor - Got user-level KeeperException when processing 
sessionid:0x36dd5dc94536a3e type:setData cxid:0x8f8a zxid:0x304fd98ef 
txntype:-1 reqpath:n/a Error 
Path:/hbase/region-in-transition/257ff8985e7a169af0514208b3b0b430 
Error:KeeperErrorCode = BadVersion for 
/hbase/region-in-transition/257ff8985e7a169af0514208b3b0b430
{code}
{code:java}
2019-11-05 04:07:17,668 DEBUG [.Worker-pool3-t26826] master.AssignmentManager - 
Handling RS_ZK_REQUEST_REGION_SPLIT, server=rsserver.net,60020,1572890688075, 
region=257ff8985e7a169af0514208b3b0b430, 
current_state={257ff8985e7a169af0514208b3b0b430 state=OPEN, ts=1572923178845, 
server=rsserver.net,60020,1572890688075}
{code}
{code:java}
2019-11-05 04:07:17,661 DEBUG [splits-1572926837064] 
coordination.ZKSplitTransactionCoordination - Still waiting for master to 
process the pending_split for 257ff8985e7a169af0514208b3b0b430
{code}
 

 

  was:
While splitting, a region can get stuck in transition. After the RegionServer 
initiates the split, ZK has the region marked in the RIT znode. However, HMaster 
hits a KeeperException with BadVersion for 
/hbase/region-in-transition/\{region-name}, and hence rollback/cleanup of the 
failed split of the region is run. Even after a successful rollback, the region 
remains in transition.

 

 
{code:java}
2019-11-05 04:07:17,711 INFO [splits-1572926837064] regionserver.SplitRequest - 
Successful rollback of failed split of 
TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
{code}
{code:java}
2019-11-05 04:07:17,688 INFO [splits-1572926837064] regionserver.SplitRequest - 
Running rollback/cleanup of failed split of 
TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.; Failed getting 
SPLITTING znode on TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
{code}
{code:java}
 
java.io.IOException: Failed getting SPLITTING znode on TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
    at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.waitForSplitTransaction(ZKSplitTransactionCoordination.java:203)
    at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:383)
    at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
    at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
    at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
    at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:153)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed transition of splitting node TABLE1,1572894157455.257ff8985e7a169af0514208b3b0b430.
    at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.transitionSplittingNode(ZKSplitTransactionCoordination.java:132)
    at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.waitForSplitTransaction(ZKSplitTransactionCoordination.java:161)
    ... 8 more
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/region-in-transition/257ff8985e7a169af0514208b3b0b430
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1336)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:442)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:818)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:871)
    at org.apache.hadoop.hbase.coordination.ZKSplitTransactionCoordination.transitionSplittingNode(ZKSplitTransactionCoordination.java:128)
    ... 9 more
{code}
{code:java}
2019-11-05 04:07:17,688 INFO [.Worker-pool3-t26826] master.RegionStates - 
Transition {257ff8985e7a169af0514208b3b0b430 state=OPEN, ts=1572923178845, 
server=rsserver.net,60020,1572890688075} to {257ff8985e7a169af0514208b3b0b430 
state=SPLITTING, ts=1572926837688, server=rsserver.net,60020,1572890688075}
{code}
{code:java}
2019-11-05 04:07:17,680 INFO [myid:5] [ead(sid:5 cport:-1):] 
server.PrepRequestProcessor - Got user-level KeeperException when processing 
sessionid:0x36dd5dc94536a3e type:setData cxid:0x8f8a zxid:0x304fd98ef 
txntype:-1 reqpath:n/a Error 
Path:/hbase/region-in-transition/257ff8985e7a169af0514208b3b0b430 
Error:KeeperErrorCode = BadVersion for 
/hbase/region-in-transition/257ff8985e7a169af0514208b3b0b430
{code}
{code:java}
2019-11-05 04:07:17,668 DEBUG [.Worker-pool3-t26826] master.AssignmentManager - 
Handling RS_ZK_REQUEST_REGION_SPLIT, server=rsserver.net,60020,1572890688075, 
region=257ff8985e7a169af0514208b3b0b430, 
current_state={257ff8985e7a169af0514208b3b0b430 state=OPEN, ts=1572923178845, 
server=rsserver.net,60020,1572890688075}
{code}
{code:java}
2019-11-05 04:07:17,661 DEBUG [splits-1572926837064] 
coordination.ZKSplitTransactionCoordination - Still waiting for master to 
process the pending_split for 257ff8985e7a169af0514208b3b0b430
{code}
 

 


> Region stuck in transition while splitting
> ------------------------------------------
>
>                 Key: HBASE-23261
>                 URL: https://issues.apache.org/jira/browse/HBASE-23261
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.5
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
