[jira] [Commented] (HBASE-14664) Master failover issue: Backup master is unable to start if active master is killed and started in short time interval

2015-11-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031344#comment-15031344
 ] 

stack commented on HBASE-14664:
---

Nice, [~asamir]. Thanks for figuring out the root cause.

> Master failover issue: Backup master is unable to start if active master is 
> killed and started in short time interval
> -
>
> Key: HBASE-14664
> URL: https://issues.apache.org/jira/browse/HBASE-14664
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Samir Ahmic
>Assignee: Samir Ahmic
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-14664.patch, HBASE-14664.patch
>
>
> I noticed this issue while running IntegrationTestDDLMasterFailover. It can be
> reproduced simply by executing the following on the active master (tested on a
> cluster with two masters and 3 region servers):
> {code}
> $ kill -9 master_pid; hbase-daemon.sh  start master
> {code} 
> The logs show that the new active master is trying to locate the hbase:meta table
> on the restarted active master:
> {code}
> 2015-10-21 19:28:20,804 INFO  [hnode2:16000.activeMasterManager] zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=hnode1,16000,1445447051681, exception=org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1092)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1330)
>     at org.apache.hadoop.hbase.master.MasterRpcServices.getRegionInfo(MasterRpcServices.java:1525)
>     at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22233)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-10-21 19:28:20,805 INFO  [hnode2:16000.activeMasterManager] master.HMaster: Meta was in transition on hnode1,16000,1445447051681
> 2015-10-21 19:28:20,805 INFO  [hnode2:16000.activeMasterManager] master.AssignmentManager: Processing {1588230740 state=OPEN, ts=1445448500598, server=hnode1,16000,1445447051681
> {code}
> Because of this, the master is unable to read the hbase:meta table:
> {code}
> 2015-10-21 19:28:49,429 INFO  [hconnection-0x6e9cebcc-shared--pool6-t1] client.AsyncProcess: #2, table=hbase:meta, attempt=10/351 failed=1ops, last exception: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1092)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2083)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32462)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> This prevents the master from completing startup.
> I also noticed that in this case the value of the /hbase/meta-region-server
> znode always points to the restarted active master (hnode1 in my cluster).
> I was able to work around this issue by repeating the same scenario with the following:
> {code}
> $ kill -9 master_pid; hbase zkcli rmr /hbase/meta-region-server; 
> hbase-daemon.sh start master
> {code}
> So the issue is probably caused by a stale value in the /hbase/meta-region-server
> znode. I will try to create a patch based on the above.
>  
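The stale-znode hypothesis can be illustrated with a small, self-contained model. The logs identify servers as `host,port,startcode` (for example `hnode1,16000,1445447051681`), and a restarted process comes back on the same host and port with a new start code. The sketch below flags a recorded hbase:meta location as stale when it matches a live server's host and port but not its start code. `StaleMetaLocationCheck`, `ServerId`, and `isStaleLocation` are hypothetical names, not HBase's API, and this is an illustration of the idea rather than the attached patch.

```java
import java.util.Objects;

public class StaleMetaLocationCheck {

  /** Hypothetical server identity in the "host,port,startcode" form used in the logs. */
  static final class ServerId {
    final String host;
    final int port;
    final long startCode;

    ServerId(String host, int port, long startCode) {
      this.host = Objects.requireNonNull(host);
      this.port = port;
      this.startCode = startCode;
    }

    /** Parses strings such as "hnode1,16000,1445447051681". */
    static ServerId parse(String s) {
      String[] parts = s.split(",");
      if (parts.length != 3) {
        throw new IllegalArgumentException("expected host,port,startcode but got: " + s);
      }
      return new ServerId(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    boolean sameHostAndPort(ServerId other) {
      return host.equals(other.host) && port == other.port;
    }
  }

  /**
   * True if the recorded meta location has the same host and port as a live server
   * but a different start code, i.e. it names an earlier incarnation of that process.
   */
  static boolean isStaleLocation(ServerId recordedMetaLocation, ServerId liveServer) {
    return recordedMetaLocation.sameHostAndPort(liveServer)
        && recordedMetaLocation.startCode != liveServer.startCode;
  }

  public static void main(String[] args) {
    // Value left in /hbase/meta-region-server by the killed master (from the logs above).
    ServerId recorded = ServerId.parse("hnode1,16000,1445447051681");
    // Hypothetical start code of the master restarted on hnode1.
    ServerId restarted = ServerId.parse("hnode1,16000,1445448600000");

    System.out.println(isStaleLocation(recorded, restarted)); // prints "true"
    System.out.println(isStaleLocation(recorded, recorded));  // prints "false"
  }
}
```

Comparing only host and port would not be enough, because the restarted master legitimately listens on the same address. The start code is what distinguishes the killed incarnation from the new one.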



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14664) Master failover issue: Backup master is unable to start if active master is killed and started in short time interval

2015-11-26 Thread Samir Ahmic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028644#comment-15028644
 ] 

Samir Ahmic commented on HBASE-14664:
-

Now that proper cleanup is done in
-[HBASE-14861|https://issues.apache.org/jira/browse/HBASE-14861]-, it looks like
this issue is gone. I will run a few more tests on the cluster to confirm and then
close this issue as fixed.



[jira] [Commented] (HBASE-14664) Master failover issue: Backup master is unable to start if active master is killed and started in short time interval

2015-11-05 Thread Samir Ahmic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991456#comment-14991456
 ] 

Samir Ahmic commented on HBASE-14664:
-

Thanks for the review, [~stack].
Regarding a unit test, what do you have in mind?
ActiveMasterManager#handleMasterNodeChange() is covered in
TestActiveMasterManager and is also exercised in TestMasterFailover as part of the
failover process.
Should we create a test covering this scenario, killing and restarting the master to
reproduce the issue, or focus on the aftermath of removing the /hbase/meta-region-server znode?
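The second option (asserting on ZooKeeper state after takeover) could look roughly like the plain-Java model below. It is not an HBase test: an in-memory map stands in for ZooKeeper, and `MetaZnodeCleanupModel` and `cleanUpMetaLocationOf` are hypothetical names. The cleanup rule (delete the meta location only if it names the previous active master) is an assumption that mirrors the manual `hbase zkcli rmr /hbase/meta-region-server` workaround from the description, not necessarily what the attached patch does. A real test would more likely build on TestActiveMasterManager or TestMasterFailover.

```java
import java.util.HashMap;
import java.util.Map;

public class MetaZnodeCleanupModel {

  static final String META_ZNODE = "/hbase/meta-region-server";

  // In-memory stand-in for ZooKeeper: znode path -> data.
  private final Map<String, String> znodes = new HashMap<>();

  void setData(String path, String data) {
    znodes.put(path, data);
  }

  String getData(String path) {
    return znodes.get(path);
  }

  /**
   * Hypothetical cleanup run by the master that takes over: if the meta location
   * still names the previous active master's exact server (host,port,startcode),
   * delete it, like the manual "hbase zkcli rmr" workaround.
   */
  void cleanUpMetaLocationOf(String previousActiveMaster) {
    if (previousActiveMaster.equals(znodes.get(META_ZNODE))) {
      znodes.remove(META_ZNODE);
    }
  }

  public static void main(String[] args) {
    MetaZnodeCleanupModel zk = new MetaZnodeCleanupModel();
    String killedMaster = "hnode1,16000,1445447051681";
    zk.setData(META_ZNODE, killedMaster);

    // The backup master takes over after "kill -9" on the active master.
    zk.cleanUpMetaLocationOf(killedMaster);
    if (zk.getData(META_ZNODE) != null) {
      throw new AssertionError("stale meta location was not removed");
    }

    // A location naming some other (hypothetical) server must be left alone.
    zk.setData(META_ZNODE, "hnode3,16020,1445447000000");
    zk.cleanUpMetaLocationOf(killedMaster);
    if (!"hnode3,16020,1445447000000".equals(zk.getData(META_ZNODE))) {
      throw new AssertionError("unrelated meta location was removed");
    }
    System.out.println("ok");
  }
}
```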




[jira] [Commented] (HBASE-14664) Master failover issue: Backup master is unable to start if active master is killed and started in short time interval

2015-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988728#comment-14988728
 ] 

Hadoop QA commented on HBASE-14664:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12770432/HBASE-14664.patch
  against master branch at commit 090fbd3ec862f8c85aa511172dd8e591b3b79332.
  ATTACHMENT ID: 12770432

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.security.token.TestGenerateDelegationToken

  {color:red}-1 core zombie tests{color}.  There are possible 1 zombie 
test(s): 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16372//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16372//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16372//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16372//console

This message is automatically generated.


[jira] [Commented] (HBASE-14664) Master failover issue: Backup master is unable to start if active master is killed and started in short time interval

2015-10-29 Thread Samir Ahmic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980322#comment-14980322
 ] 

Samir Ahmic commented on HBASE-14664:
-

This test failure is unrelated to the changes introduced by this patch. I ran
TestRegionMover a few times on two different machines, and it passed every time.



[jira] [Commented] (HBASE-14664) Master failover issue: Backup master is unable to start if active master is killed and started in short time interval

2015-10-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979411#comment-14979411
 ] 

Hadoop QA commented on HBASE-14664:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12769309/HBASE-14664.patch
  against master branch at commit 4b018d2a3988a70d98e2388d0013e63857c5e193.
  ATTACHMENT ID: 12769309

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestRegionMover

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16268//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16268//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16268//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16268//console

This message is automatically generated.
