[jira] [Commented] (HBASE-14664) Master failover issue: Backup master is unable to start if active master is killed and started in short time interval

Hadoop QA (JIRA) Tue, 03 Nov 2015 18:07:15 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988728#comment-14988728
 ]


Hadoop QA commented on HBASE-14664:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12770432/HBASE-14664.patch
  against master branch at commit 090fbd3ec862f8c85aa511172dd8e591b3b79332.
  ATTACHMENT ID: 12770432

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

    {color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       
org.apache.hadoop.hbase.security.token.TestGenerateDelegationToken

      {color:red}-1 core zombie tests{color}.  There are possible 1 zombie 
test(s): 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16372//testReport/
Release Findbugs (version 2.0.3)        warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16372//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16372//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16372//console

This message is automatically generated.

> Master failover issue: Backup master is unable to start if active master is 
> killed and started in short time interval
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14664
>                 URL: https://issues.apache.org/jira/browse/HBASE-14664
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.0.0
>            Reporter: Samir Ahmic
>            Assignee: Samir Ahmic
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14664.patch, HBASE-14664.patch
>
>
> I noticed this issue while running IntegrationTestDDLMasterFailover. It can be 
> reproduced by running the following on the active master (tested on a cluster 
> with two masters and 3 region servers):
> {code}
> $ kill -9 master_pid; hbase-daemon.sh  start master
> {code} 
> Logs show that the new active master is trying to locate the hbase:meta table 
> on the restarted active master:
> {code}
> 2015-10-21 19:28:20,804 INFO  [hnode2:16000.activeMasterManager] zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=hnode1,16000,1445447051681, exception=org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1092)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1330)
>         at org.apache.hadoop.hbase.master.MasterRpcServices.getRegionInfo(MasterRpcServices.java:1525)
>         at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22233)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-10-21 19:28:20,805 INFO  [hnode2:16000.activeMasterManager] master.HMaster: Meta was in transition on hnode1,16000,1445447051681
> 2015-10-21 19:28:20,805 INFO  [hnode2:16000.activeMasterManager] master.AssignmentManager: Processing {1588230740 state=OPEN, ts=1445448500598, server=hnode1,16000,1445447051681
> {code}
> Because of this, the master is unable to read the hbase:meta table:
> {code}
> 2015-10-21 19:28:49,429 INFO  [hconnection-0x6e9cebcc-shared--pool6-t1] client.AsyncProcess: #2, table=hbase:meta, attempt=10/351 failed=1ops, last exception: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1092)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2083)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32462)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> As a result, the master is unable to complete startup. 
> I have also noticed that in this case the value of the 
> /hbase/meta-region-server znode always points to the restarted active master 
> (hnode1 in my cluster).
> I was able to work around this issue by repeating the same scenario with the 
> following:
> {code}
> $ kill -9 master_pid; hbase zkcli rmr /hbase/meta-region-server; hbase-daemon.sh start master
> {code}
> So the issue is probably caused by a stale value in the 
> /hbase/meta-region-server znode. I will try to create a patch based on the 
> above.
>  
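
The stale-znode hypothesis above can be illustrated with a small, self-contained sketch. It is not the logic of the attached patch and does not call any HBase or ZooKeeper API. It only relies on HBase server names having the form host,port,startcode, and it assumes (not confirmed in the report) that the start code recorded in the znode belongs to the killed master process. The function names and the "live" start codes in the usage note are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional


class ServerName(NamedTuple):
    """One HBase server instance, identified as host,port,startcode."""
    host: str
    port: int
    start_code: int


def parse_server_name(text: str) -> ServerName:
    """Parse a string such as 'hnode1,16000,1445447051681'."""
    host, port, start_code = text.strip().split(",")
    return ServerName(host, int(port), int(start_code))


def is_stale_location(recorded: str, live_servers: Iterable[str]) -> bool:
    """Return True when the recorded meta location names no live server instance.

    A restarted process keeps its host and port but gets a new start code,
    so a location recorded before the restart no longer matches exactly.
    """
    live = {parse_server_name(s) for s in live_servers}
    return parse_server_name(recorded) not in live


def restarted_instance(recorded: str, live_servers: Iterable[str]) -> Optional[ServerName]:
    """Return the live server on the same host and port with a newer start code, if any."""
    rec = parse_server_name(recorded)
    for name in map(parse_server_name, live_servers):
        if (name.host, name.port) == (rec.host, rec.port) and name.start_code > rec.start_code:
            return name
    return None
```

For example, with the recorded location hnode1,16000,1445447051681 from the logs and a hypothetical live set of hnode1,16000,1445448490000 and hnode2,16000,1445447060000, is_stale_location returns True and restarted_instance returns the newer hnode1 entry. That is the situation in which removing the znode, as in the workaround above, lets the new active master stop waiting on the old location.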



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
