[ https://issues.apache.org/jira/browse/HBASE-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917718#comment-13917718 ]
Enis Soztutar commented on HBASE-10632:
---------------------------------------

bq. Will there be patches here or should we have backport issues?
Attached patch applies to trunk, 0.98 and 0.96. Will commit it tomorrow.

> Region lost in limbo after ArrayIndexOutOfBoundsException during assignment
> ---------------------------------------------------------------------------
>
> Key: HBASE-10632
> URL: https://issues.apache.org/jira/browse/HBASE-10632
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: hbase-10070
> Reporter: Nick Dimiduk
> Assignee: Enis Soztutar
> Fix For: 0.96.2, 0.98.1, 0.99.0, hbase-10070
>
> Attachments: hbase-10632_v1.patch
>
>
> Discovered while running IntegrationTestBigLinkedList. Region
> 24d68aa7239824e42390a77b7212fcbf is scheduled for move from hor13n19 to
> hor13n13. During the process, an exception is thrown while the master
> processes the M_SERVER_SHUTDOWN event for hor13n19.
> {noformat}
> 2014-02-25 15:30:42,613 INFO [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] master.RegionStates: Transitioning {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} will be handled by SSH for hor13n19.gq1.ygridcore.net,60020,1393341563552
> 2014-02-25 15:30:42,613 INFO [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] handler.ServerShutdownHandler: Reassigning 7 region(s) that hor13n19.gq1.ygridcore.net,60020,1393341563552 was carrying (and 0 regions(s) that were opening on this server)
> 2014-02-25 15:30:42,613 INFO [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] handler.ServerShutdownHandler: Reassigning region with rs = {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} and deleting zk node if exists
> 2014-02-25 15:30:42,623 INFO [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] master.RegionStates: Transitioned {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} to {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
> 2014-02-25 15:30:42,623 DEBUG [AM.ZK.Worker-pool2-t46] master.AssignmentManager: Znode IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. deleted, state: {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
> ...
> 2014-02-25 15:30:43,993 ERROR [MASTER_SERVER_OPERATIONS-hor13n12:60000-4] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
> java.lang.ArrayIndexOutOfBoundsException: 0
> 	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:250)
> 	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:921)
> 	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.roundRobinAssignment(BaseLoadBalancer.java:860)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2482)
> 	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:282)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:722)
> {noformat}
> After that, the region is left in limbo and is never reassigned.
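To make the ordering problem in the log above concrete, here is a minimal toy model in Java. Everything in it is hypothetical (`RegionLimboSketch`, `assign`, `handleServerShutdown` and `balancerCanRun` are not HBase APIs and this is not the attached patch). Simulating the exception by indexing an empty server array is an assumption for illustration, not a diagnosis of BaseLoadBalancer.java:250. The sketch only reproduces the sequence the log shows: the region is transitioned to OFFLINE before assignment, the assignment throws, the handler just logs the throwable, and the OFFLINE entry is never cleared, so any check that requires "no regions in transition" stays blocked.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the failure pattern in the log. All names here are
// hypothetical; this is NOT HBase code and NOT hbase-10632_v1.patch.
class RegionLimboSketch {

    // Stand-in for the master's regions-in-transition map (region -> state).
    final Map<String, String> regionsInTransition = new LinkedHashMap<>();

    // Stand-in for the master's region -> server assignment table.
    final Map<String, String> assignments = new LinkedHashMap<>();

    // Hypothetical assignment step. Indexing an empty server array is only an
    // ASSUMED way to reproduce an ArrayIndexOutOfBoundsException at index 0.
    void assign(String region, String[] servers) {
        String target = servers[0]; // throws when servers.length == 0
        assignments.put(region, target);
        regionsInTransition.remove(region); // pretend the open on target succeeded
    }

    // Same ordering as the shutdown handling in the log:
    // mark the region OFFLINE (in transition) first, then try to assign it.
    void handleServerShutdown(String region, String[] servers) {
        regionsInTransition.put(region, "OFFLINE");
        try {
            assign(region, servers);
        } catch (RuntimeException e) {
            // Like "Caught throwable while processing event": the error is
            // logged, but the OFFLINE entry is neither rolled back nor retried.
            System.err.println("Caught throwable while processing event: " + e);
        }
    }

    // Mirrors "Not running balancer because N region(s) in transition".
    boolean balancerCanRun() {
        return regionsInTransition.isEmpty();
    }

    public static void main(String[] args) {
        RegionLimboSketch master = new RegionLimboSketch();
        // No candidate servers: assign() throws and the region stays OFFLINE.
        master.handleServerShutdown("24d68aa7239824e42390a77b7212fcbf", new String[0]);
        System.out.println("in transition: " + master.regionsInTransition);
        System.out.println("balancer can run: " + master.balancerCanRun());
    }
}
```

Running `main` prints the region still mapped to `OFFLINE` and `balancer can run: false`; with a non-empty server array the same handler assigns the region and clears the entry. The sketch deliberately has no retry or rollback path, matching the gap visible in the log; it says nothing about how the attached patch addresses that gap.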
> {noformat}
> 2014-02-25 15:35:11,581 INFO [FifoRpcScheduler.handler1-thread-6] master.HMaster: Client=hrt_qa//68.142.246.29 move hri=IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf., src=hor13n19.gq1.ygridcore.net,60020,1393341563552, dest=hor13n13.gq1.ygridcore.net,60020,1393342222275, running balancer
> 2014-02-25 15:35:11,581 INFO [FifoRpcScheduler.handler1-thread-6] master.AssignmentManager: Ignored moving region not assigned: {ENCODED => 24d68aa7239824e42390a77b7212fcbf, NAME => 'IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf.', STARTKEY => '\x80\x06\x1A', ENDKEY => ''}, {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
> ...
> 2014-02-25 15:35:26,586 DEBUG [hor13n12.gq1.ygridcore.net,60000,1393341917402-BalancerChore] master.HMaster: Not running balancer because 1 region(s) in transition: {24d68aa7239824e42390a77b7212fcbf={24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}}
> ...
> 2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.HMaster: Client=hrt_qa//68.142.246.29 unassign IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. in current location if it is online and reassign.force=false
> 2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.AssignmentManager: Starting unassign of IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. (offlining), current state: {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}
> 2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.AssignmentManager: Attempting to unassign IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. but it is already in transition (OFFLINE, force=false)
> ...
> 2014-02-25 15:40:26,587 DEBUG [hor13n12.gq1.ygridcore.net,60000,1393341917402-BalancerChore] master.HMaster: Not running balancer because 1 region(s) in transition: {24d68aa7239824e42390a77b7212fcbf={24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}}
> {noformat}
>
> Spoke with [~enis] about it earlier; assigning to him.

--
This message was sent by Atlassian JIRA
(v6.2#6252)