[ https://issues.apache.org/jira/browse/HBASE-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961820#comment-14961820 ]
Hudson commented on HBASE-14536:
--------------------------------

FAILURE: Integrated in HBase-1.3 #277 (See [https://builds.apache.org/job/HBase-1.3/277/])
HBASE-14536 Balancer & SSH interfering with each other leading to (syuanjiangdev: rev 9bdb88a572ac30fb51fcc44284f51543d2b4568f)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestServerCrashProcedure.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java


> Balancer & SSH interfering with each other leading to unavailability
> --------------------------------------------------------------------
>
> Key: HBASE-14536
> URL: https://issues.apache.org/jira/browse/HBASE-14536
> Project: HBase
> Issue Type: Bug
> Components: master, Region Assignment
> Affects Versions: 1.1.2
> Reporter: Devaraj Das
> Assignee: Stephen Yuan Jiang
> Fix For: 1.2.0, 1.3.0, 1.1.3
>
> Attachments: HBASE-14536.v1-branch-1.1.patch, HBASE-14536.v1-branch-1.patch,
> HBASE-14536.v2-branch-1.1.patch, HBASE-14536.v3-branch-1.1.patch, master-log.tgz
>
>
> Came across this in our cluster:
> 1. The meta was assigned to a server 10.0.0.149,16020,1443507203340
> {noformat}
> 2015-09-29 06:16:22,472 DEBUG [AM.ZK.Worker-pool2-t56] master.RegionStates: Onlined 1588230740 on 10.0.0.149,16020,1443507203340 {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
> {noformat}
> 2. The server dies at some point:
> {noformat}
> 2015-09-29 06:18:25,952 INFO [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [10.0.0.149,16020,1443507203340]
> 2015-09-29 06:18:25,955 DEBUG [main-EventThread] master.AssignmentManager: based on AM, current region=hbase:meta,,1.1588230740 is on server=10.0.0.149,16020,1443507203340 server being checked: 10.0.0.149,16020,1443507203340
> {noformat}
> 3. The balancer had computed a plan that contained a move for the meta:
> {noformat}
> 2015-09-29 06:18:26,833 INFO [B.defaultRpcServer.handler=12,queue=0,port=16000] master.HMaster: balance hri=hbase:meta,,1.1588230740, src=10.0.0.149,16020,1443507203340, dest=10.0.0.205,16020,1443507257905
> {noformat}
> 4. The following ensues after this, leading to the meta remaining unassigned:
> {noformat}
> 2015-09-29 06:18:26,859 DEBUG [B.defaultRpcServer.handler=12,queue=0,port=16000] master.AssignmentManager: Offline hbase:meta,,1.1588230740, no need to unassign since it's on a dead server: 10.0.0.149,16020,1443507203340
> ......................
> 2015-09-29 06:18:26,899 INFO [B.defaultRpcServer.handler=12,queue=0,port=16000] master.RegionStates: Offlined 1588230740 from 10.0.0.149,16020,1443507203340
> .....................
> 2015-09-29 06:18:26,914 INFO [B.defaultRpcServer.handler=12,queue=0,port=16000] master.AssignmentManager: Skip assigning hbase:meta,,1.1588230740, it is on a dead but not processed yet server: 10.0.0.149,16020,1443507203340
> ....................
> 2015-09-29 06:18:26,915 DEBUG [AM.ZK.Worker-pool2-t58] master.AssignmentManager: Znode hbase:meta,,1.1588230740 deleted, state: {1588230740 state=OFFLINE, ts=1443507506914, server=10.0.0.149,16020,1443507203340}
> ....................
> 2015-09-29 06:18:29,447 DEBUG [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] master.AssignmentManager: based on AM, current region=hbase:meta,,1.1588230740 is on server=null server being checked: 10.0.0.149,16020,1443507203340
> 2015-09-29 06:18:29,451 INFO [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] handler.MetaServerShutdownHandler: META has been assigned to otherwhere, skip assigning.
> 2015-09-29 06:18:29,452 DEBUG [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] master.DeadServer: Finished processing 10.0.0.149,16020,1443507203340
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
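As a reading aid for the quoted report, the interleaving in steps 3 and 4 can be reduced to a toy Java model. This is a hypothetical sketch and not HBase code. The class `MetaRaceModel` and its methods `balancerMove` and `shutdownHandler` are invented names that only mirror the log messages above. The model also assumes two simplifications:

* The balancer offlines a region on a dead-but-unprocessed server and then skips assigning it.
* The shutdown handler reassigns meta only if the master still records meta on the dead server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the HBASE-14536 interleaving; NOT HBase code.
// Names and rules are simplifications inferred from the quoted master log.
final class MetaRaceModel {
    static final String META = "hbase:meta,,1.1588230740";
    static final String DEAD = "10.0.0.149,16020,1443507203340";
    static final String DEST = "10.0.0.205,16020,1443507257905";

    // region -> hosting server; a missing entry means the region is offline
    final Map<String, String> regionToServer = new HashMap<>();
    // servers whose ZK node expired but whose shutdown handling has not finished
    final Set<String> deadNotProcessed = new HashSet<>();
    final List<String> log = new ArrayList<>();

    MetaRaceModel() {
        regionToServer.put(META, DEAD); // step 1: meta online on .149
        deadNotProcessed.add(DEAD);     // step 2: .149 expires
    }

    // Balancer executing a plan computed before the server died (steps 3-4).
    void balancerMove(String region, String dest) {
        String src = regionToServer.get(region);
        if (src != null && deadNotProcessed.contains(src)) {
            log.add("Offline " + region + ", no need to unassign since it's on a dead server");
            regionToServer.remove(region); // offlined: the master forgets the location
            log.add("Skip assigning " + region + ", it is on a dead but not processed yet server");
            return; // leaves the assignment to the shutdown handler
        }
        regionToServer.put(region, dest);
        log.add("Moved " + region + " to " + dest);
    }

    // Simplified meta shutdown handler for the dead server.
    void shutdownHandler(String deadServer, String target) {
        String current = regionToServer.get(META);
        if (deadServer.equals(current)) {
            regionToServer.put(META, target);
            log.add("Assigned META to " + target);
        } else {
            // After the balancer offlined meta, current is null, so this branch runs
            log.add("META has been assigned to otherwhere, skip assigning.");
        }
        deadNotProcessed.remove(deadServer);
        log.add("Finished processing " + deadServer);
    }

    // Returns whether meta ends up on some server for the given ordering.
    static boolean metaAssignedAfter(boolean balancerFirst) {
        MetaRaceModel m = new MetaRaceModel();
        if (balancerFirst) {
            m.balancerMove(META, DEST);
            m.shutdownHandler(DEAD, DEST);
        } else {
            m.shutdownHandler(DEAD, DEST);
            m.balancerMove(META, DEST);
        }
        return m.regionToServer.get(META) != null;
    }

    public static void main(String[] args) {
        System.out.println("balancer first: meta assigned = " + metaAssignedAfter(true));
        System.out.println("SSH first:      meta assigned = " + metaAssignedAfter(false));
    }
}
```

Running `main` prints `false` for the balancer-first ordering and `true` for the SSH-first ordering. The first result matches the unavailability in the report: each code path assumes the other one will assign meta. The model says nothing about how the committed patch resolves the race.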