[ https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-14207:
--------------------------
    Status: Patch Available  (was: Open)

> Region was hijacked and remained in transition when RS failed to open a region and later regionplan changed to new RS on retry
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14207
>                 URL: https://issues.apache.org/jira/browse/HBASE-14207
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.98.6
>            Reporter: Pankaj Kumar
>            Assignee: Pankaj Kumar
>            Priority: Critical
>             Fix For: 0.98.15
>
>         Attachments: HBASE-14207-0.98-V2.patch, HBASE-14207-0.98.patch
>
>
> In a production environment, the following events happened:
> 1. The master tried to assign a region to an RS, but the RS failed to open the region because of KeeperException$SessionExpiredException.
> The RS log showed multiple WARN entries related to KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for /hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b
> > Unable to get data of znode /hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b
> 2. The master retried the assignment to the same RS, but the RS failed again.
> 3. On the second retry a new plan was formed, this time with a different destination RS, so the master sent the open request to the new RS. The new RS also failed to open the region, because the server name recorded in the znode did not match the expected current server name.
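The failure in step 3 can be illustrated with a small model. This is a minimal sketch in Python, not HBase code: `UnassignedZNode`, `transition_node`, and `simulate` are hypothetical names, and the server-name check is inferred only from the RS warning in the logs ("the server that tried to transition was ... not the expected ..."). The `refresh_znode_on_new_plan` flag shows one conceivable remedy for comparison; it is an assumption and not necessarily what the attached patch does.

```python
from dataclasses import dataclass

# Values taken from the report; everything else here is a toy model.
RS1 = "T101PC03VM13,21302,1436816690692"
RS2 = "T101PC03VM14,21302,1436816997967"
REGION = "08f1935d652e5dbdac09b423b8f9401b"


@dataclass
class UnassignedZNode:
    """Hypothetical stand-in for /hbase/region-in-transition/<region>."""
    region: str
    state: str   # e.g. "M_ZK_REGION_OFFLINE"
    server: str  # server name recorded in the znode


def transition_node(znode, server, begin_state, end_state):
    """Move the znode to end_state only if it is in begin_state and
    records the calling server (the check implied by the RS warning)."""
    if znode.state != begin_state:
        return False
    if znode.server != server:
        return False  # "... not the expected <znode.server>"
    znode.state = end_state
    return True


def simulate(refresh_znode_on_new_plan):
    """Replay the reported sequence; return (opened, final znode state)."""
    # Master creates the OFFLINE znode for the first plan (destination RS1).
    znode = UnassignedZNode(REGION, "M_ZK_REGION_OFFLINE", RS1)

    # Tries 1 and 2 against RS1 fail without changing the znode.
    # The next plan picks RS2 as the destination.
    if refresh_znode_on_new_plan:
        # Assumed remedy: record the new destination in the znode.
        znode.server = RS2

    opened = transition_node(
        znode, RS2, "M_ZK_REGION_OFFLINE", "RS_ZK_REGION_OPENING")
    if not opened:
        # RS2 then tries to report the failure, which hits the same check.
        transition_node(
            znode, RS2, "M_ZK_REGION_OFFLINE", "RS_ZK_REGION_FAILED_OPEN")
    return opened, znode.state


if __name__ == "__main__":
    print(simulate(refresh_znode_on_new_plan=False))
    print(simulate(refresh_znode_on_new_plan=True))
```

In this model the stale server name blocks both the OPENING and the FAILED_OPEN transitions, so the znode never changes. That would be consistent with the master log, where the region is still PENDING_OPEN on the new RS at 03:51:09.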
> Logs Snippet:
> {noformat}
> HM
> 2015-07-14 03:50:29,759 | INFO | master:T101PC03VM13:21300 | Processing 08f1935d652e5dbdac09b423b8f9401b in state: M_ZK_REGION_OFFLINE | org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:644)
> 2015-07-14 03:50:29,759 | INFO | master:T101PC03VM13:21300 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817029679, server=null} to {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817029759, server=T101PC03VM13,21302,1436816690692} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:50:29,760 | INFO | master:T101PC03VM13:21300 | Processed region 08f1935d652e5dbdac09b423b8f9401b in state M_ZK_REGION_OFFLINE, on server: T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:768)
> 2015-07-14 03:50:29,800 | INFO | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:29,801 | WARN | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Failed assignment of INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692, trying to assign elsewhere instead; try=1 of 10 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2077)
> 2015-07-14 03:50:29,802 | INFO | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Trying to re-assign INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to the same failed server. | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2123)
> 2015-07-14 03:50:31,804 | INFO | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:31,806 | WARN | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Failed assignment of INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM13,21302,1436816690692, trying to assign elsewhere instead; try=2 of 10 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2077)
> 2015-07-14 03:50:31,807 | INFO | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031804, server=T101PC03VM13,21302,1436816690692} to {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817031807, server=T101PC03VM13,21302,1436816690692} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:50:31,807 | INFO | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to T101PC03VM14,21302,1436816997967 | org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:31,807 | INFO | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Transitioned {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817031807, server=T101PC03VM13,21302,1436816690692} to {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031807, server=T101PC03VM14,21302,1436816997967} | org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:51:09,501 | INFO | MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-4 | Skip assigning region in transition on other server{08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031807, server=T101PC03VM14,21302,1436816997967} | org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:250)
> {noformat}
> {noformat}
> RS - T101PC03VM14
> 2015-07-14 03:50:31,809 | INFO | PriorityRpcServer.handler=2,queue=0,port=21302 | Open INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. | org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:3671)
> 2015-07-14 03:50:31,830 | WARN | RS_OPEN_REGION-T101PC03VM14:21302-2 | regionserver:21302-0xe4e88f6f1b70002, quorum=t101pc03vm12:24002,t101pc03vm13:24002,t101pc03vm14:24002, baseZNode=/hbase Attempt to transition the unassigned node for 08f1935d652e5dbdac09b423b8f9401b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried to transition was T101PC03VM14,21302,1436816997967 not the expected T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:875)
> 2015-07-14 03:50:31,830 | WARN | RS_OPEN_REGION-T101PC03VM14:21302-2 | Failed transition from OFFLINE to OPENING for region=08f1935d652e5dbdac09b423b8f9401b | org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:539)
> 2015-07-14 03:50:31,831 | WARN | RS_OPEN_REGION-T101PC03VM14:21302-2 | Region was hijacked? Opening cancelled for encodedName=08f1935d652e5dbdac09b423b8f9401b | org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:132)
> 2015-07-14 03:50:31,831 | INFO | RS_OPEN_REGION-T101PC03VM14:21302-2 | Opening of region {ENCODED => 08f1935d652e5dbdac09b423b8f9401b, NAME => 'INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b.', STARTKEY => '', ENDKEY => '200'} failed, transitioning from OFFLINE to FAILED_OPEN in ZK, expecting version -1 | org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tryTransitionFromOfflineToFailedOpen(OpenRegionHandler.java:436)
> 2015-07-14 03:50:31,834 | WARN | RS_OPEN_REGION-T101PC03VM14:21302-2 | regionserver:21302-0xe4e88f6f1b70002, quorum=t101pc03vm12:24002,t101pc03vm13:24002,t101pc03vm14:24002, baseZNode=/hbase Attempt to transition the unassigned node for 08f1935d652e5dbdac09b423b8f9401b from M_ZK_REGION_OFFLINE to RS_ZK_REGION_FAILED_OPEN failed, the server that tried to transition was T101PC03VM14,21302,1436816997967 not the expected T101PC03VM13,21302,1436816690692 | org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:875)
> 2015-07-14 03:50:31,834 | WARN | RS_OPEN_REGION-T101PC03VM14:21302-2 | Unable to mark region {ENCODED => 08f1935d652e5dbdac09b423b8f9401b, NAME => 'INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b.', STARTKEY => '', ENDKEY => '200'} as FAILED_OPEN. It's likely that the master already timed out this open attempt, and thus another RS already has the region. | org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tryTransitionFromOfflineToFailedOpen(OpenRegionHandler.java:444)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)