[ https://issues.apache.org/jira/browse/HBASE-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988728#comment-14988728 ]
Hadoop QA commented on HBASE-14664:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12770432/HBASE-14664.patch
  against master branch at commit 090fbd3ec862f8c85aa511172dd8e591b3b79332.
  ATTACHMENT ID: 12770432

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
  Please justify why no new tests are needed for this patch.
  Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1)

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:
  org.apache.hadoop.hbase.security.token.TestGenerateDelegationToken

{color:red}-1 core zombie tests{color}.
There are possible 1 zombie test(s):

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16372//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16372//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16372//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16372//console

This message is automatically generated.

> Master failover issue: Backup master is unable to start if active master is killed and started in short time interval
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14664
>                 URL: https://issues.apache.org/jira/browse/HBASE-14664
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.0.0
>            Reporter: Samir Ahmic
>            Assignee: Samir Ahmic
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14664.patch, HBASE-14664.patch
>
>
> I noticed this issue while running IntegrationTestDDLMasterFailover. It can be reproduced simply by executing the following on the active master (tested on a cluster with two masters and three region servers):
> {code}
> $ kill -9 master_pid; hbase-daemon.sh start master
> {code}
> The logs show that the new active master tries to locate the hbase:meta table on the restarted active master:
> {code}
> 2015-10-21 19:28:20,804 INFO [hnode2:16000.activeMasterManager] zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=hnode1,16000,1445447051681, exception=org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1092)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1330)
>         at org.apache.hadoop.hbase.master.MasterRpcServices.getRegionInfo(MasterRpcServices.java:1525)
>         at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22233)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-10-21 19:28:20,805 INFO [hnode2:16000.activeMasterManager] master.HMaster: Meta was in transition on hnode1,16000,1445447051681
> 2015-10-21 19:28:20,805 INFO [hnode2:16000.activeMasterManager] master.AssignmentManager: Processing {1588230740 state=OPEN, ts=1445448500598, server=hnode1,16000,1445447051681
> {code}
> Because of this, the master is unable to read the hbase:meta table:
> {code}
> 2015-10-21 19:28:49,429 INFO [hconnection-0x6e9cebcc-shared--pool6-t1] client.AsyncProcess: #2, table=hbase:meta, attempt=10/351 failed=1ops, last exception: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1092)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2083)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32462)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> As a result, the master is unable to complete startup.
> I have also noticed that in this case the value of the /hbase/meta-region-server znode always points to the restarted active master (hnode1 in my cluster).
> I was able to work around this issue by repeating the same scenario with the following:
> {code}
> $ kill -9 master_pid; hbase zkcli rmr /hbase/meta-region-server; hbase-daemon.sh start master
> {code}
> So the issue is probably caused by a stale value in the /hbase/meta-region-server znode. I will try to create a patch based on the above.
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
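
For illustration only, the stale-value hypothesis in the issue description can be sketched as a comparison of HBase server names, which have the form host,port,startcode (for example hnode1,16000,1445447051681 in the logs). The helper below is hypothetical and is not part of HBase or the attached patch. It assumes that a restarted master comes back on the same host and port with a newer start code, and the second server name in the usage lines is made up for the example.

```shell
# Hypothetical helper, not part of HBase or the attached patch.
# Succeeds (exit 0) when the recorded location has the same host,port as the
# live server but an older start code, i.e. it names a previous incarnation.
is_stale_location() {
  recorded=$1                      # server name recorded as the meta location
  live=$2                          # server name of the currently running process
  recorded_hostport=${recorded%,*} # drop the trailing ",startcode"
  live_hostport=${live%,*}
  recorded_code=${recorded##*,}    # keep only the start code
  live_code=${live##*,}
  [ "$recorded_hostport" = "$live_hostport" ] && [ "$recorded_code" -lt "$live_code" ]
}

# Usage: the first name is taken from the logs; the second is an assumed
# server name for the restarted master.
if is_stale_location "hnode1,16000,1445447051681" "hnode1,16000,1445448490000"; then
  echo "recorded meta location is stale"
fi
```

A check of this kind only detects the mismatch; whether the recorded location should be cleared, as the manual workaround does, or handled differently is left to the actual patch.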