[ https://issues.apache.org/jira/browse/HBASE-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951558#comment-14951558 ]
Stephen Yuan Jiang commented on HBASE-14536:
--------------------------------------------

Sorry, the attached patch is only for branch-1.1 - the MetaServerShutdownHandler.java code was refactored in branch-1. Rename the patch and re-submit.

> Balancer & SSH interfering with each other leading to unavailability
> --------------------------------------------------------------------
>
>                 Key: HBASE-14536
>                 URL: https://issues.apache.org/jira/browse/HBASE-14536
>             Project: HBase
>          Issue Type: Bug
>          Components: master, Region Assignment
>    Affects Versions: 1.1.2
>            Reporter: Devaraj Das
>            Assignee: Stephen Yuan Jiang
>             Fix For: 1.1.4
>
>         Attachments: master-log.tgz
>
>
> Came across this in our cluster:
> 1. The meta was assigned to a server 10.0.0.149,16020,1443507203340
> {noformat}
> 2015-09-29 06:16:22,472 DEBUG [AM.ZK.Worker-pool2-t56]
> master.RegionStates: Onlined 1588230740 on
> 10.0.0.149,16020,1443507203340 {ENCODED => 1588230740, NAME =>
> 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
> {noformat}
> 2. The server dies at some point:
> {noformat}
> 2015-09-29 06:18:25,952 INFO [main-EventThread]
> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> processing expiration [10.0.0.149,16020,1443507203340]
> 2015-09-29 06:18:25,955 DEBUG [main-EventThread] master.AssignmentManager:
> based on AM, current
> region=hbase:meta,,1.1588230740 is on server=10.0.0.149,16020,1443507203340
> server being checked:
> 10.0.0.149,16020,1443507203340
> {noformat}
> 3. The balancer had computed a plan that contained a move for the meta:
> {noformat}
> 2015-09-29 06:18:26,833 INFO
> [B.defaultRpcServer.handler=12,queue=0,port=16000] master.HMaster:
> balance hri=hbase:meta,,1.1588230740,
> src=10.0.0.149,16020,1443507203340, dest=10.0.0.205,16020,1443507257905
> {noformat}
> 4. The following ensues after this, leading to the meta remaining unassigned:
> {noformat}
> 2015-09-29 06:18:26,859 DEBUG
> [B.defaultRpcServer.handler=12,queue=0,port=16000]
> master.AssignmentManager: Offline hbase:meta,,1.1588230740, no need to
> unassign since it's on a dead server: 10.0.0.149,16020,1443507203340
> ......................
> 2015-09-29 06:18:26,899 INFO
> [B.defaultRpcServer.handler=12,queue=0,port=16000] master.RegionStates:
> Offlined 1588230740 from 10.0.0.149,16020,1443507203340
> .....................
> 2015-09-29 06:18:26,914 INFO
> [B.defaultRpcServer.handler=12,queue=0,port=16000]
> master.AssignmentManager: Skip assigning hbase:meta,,1.1588230740, it is
> on a dead but not processed yet server: 10.0.0.149,16020,1443507203340
> ....................
> 2015-09-29 06:18:26,915 DEBUG [AM.ZK.Worker-pool2-t58]
> master.AssignmentManager: Znode hbase:meta,,1.1588230740 deleted,
> state: {1588230740 state=OFFLINE, ts=1443507506914,
> server=10.0.0.149,16020,1443507203340}
> ....................
> 2015-09-29 06:18:29,447 DEBUG
> [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] master.AssignmentManager:
> based on AM, current
> region=hbase:meta,,1.1588230740 is on server=null server being checked:
> 10.0.0.149,16020,1443507203340
> 2015-09-29 06:18:29,451 INFO
> [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2]
> handler.MetaServerShutdownHandler: META has been
> assigned to otherwhere, skip assigning.
> 2015-09-29 06:18:29,452 DEBUG
> [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2]
> master.DeadServer: Finished processing 10.0.0.149,16020,1443507203340
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
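
The interleaving in steps 1-4 of the quoted report can be replayed with a toy model. This is only a sketch for reasoning about the ordering, not HBase code: the class `MetaRaceSketch`, its fields, and all of its method names are hypothetical, and the real AssignmentManager and MetaServerShutdownHandler logic is considerably more involved. The model assumes that the balancer move offlines the region when the source server is dead, that assignment is skipped while that server's shutdown handling is still pending, and that the shutdown handler reassigns meta only when meta is still recorded on the dead server.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the event ordering in the report. Hypothetical names; not HBase code. */
public class MetaRaceSketch {
    static final String META = "hbase:meta,,1.1588230740";

    // region -> server recorded as hosting it (absent means "server=null")
    private final Map<String, String> regionToServer = new HashMap<>();
    // servers whose expiration was seen but whose shutdown handling has not finished
    private final Set<String> deadNotProcessed = new HashSet<>();

    /** Step 1: the region is onlined on a server. */
    void online(String region, String server) {
        regionToServer.put(region, server);
    }

    /** Step 2: the ephemeral node is deleted; shutdown handling is queued but not run. */
    void expireServer(String server) {
        deadNotProcessed.add(server);
    }

    /** Steps 3-4: the balancer executes its move. Returns true if dest got the region. */
    boolean balancerMove(String region, String src, String dest) {
        if (deadNotProcessed.contains(src)) {
            // "no need to unassign since it's on a dead server": the region is offlined
            regionToServer.remove(region);
            // "Skip assigning ..., it is on a dead but not processed yet server"
            return false;
        }
        regionToServer.put(region, dest);
        return true;
    }

    /** Shutdown handling for the dead server. Returns true if it reassigned meta. */
    boolean metaShutdownHandler(String deadServer, String fallback) {
        boolean reassigned = false;
        if (deadServer.equals(regionToServer.get(META))) {
            regionToServer.put(META, fallback);
            reassigned = true;
        }
        // otherwise: "META has been assigned to otherwhere, skip assigning."
        deadNotProcessed.remove(deadServer); // "Finished processing ..."
        return reassigned;
    }

    boolean isMetaAssigned() {
        return regionToServer.containsKey(META);
    }

    /** Replays the events in the order shown by the log excerpts. */
    static MetaRaceSketch replay() {
        String dead = "10.0.0.149,16020,1443507203340";
        String dest = "10.0.0.205,16020,1443507257905";
        MetaRaceSketch s = new MetaRaceSketch();
        s.online(META, dead);             // 06:16:22 meta onlined
        s.expireServer(dead);             // 06:18:25 server expires
        s.balancerMove(META, dead, dest); // 06:18:26 balancer move offlines meta
        s.metaShutdownHandler(dead, dest); // 06:18:29 handler sees server=null
        return s;
    }

    public static void main(String[] args) {
        // prints "meta assigned: false"
        System.out.println("meta assigned: " + replay().isMetaAssigned());
    }
}
```

In this model neither actor assigns meta: the balancer defers to the pending shutdown handling, and the shutdown handler treats `server=null` as evidence that meta already lives somewhere else.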