[jira] [Commented] (HBASE-19838) Can not shutdown backup master cleanly when it has already tried to become the active master

2018-01-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335498#comment-16335498
 ] 

Hudson commented on HBASE-19838:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4452 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4452/])
HBASE-19838 Can not shutdown backup master cleanly when it has already (zhangduo: rev 970636c5afbd1a12a998af3e8b0825f806bedeca)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestShutdownBackupMaster.java


> Can not shutdown backup master cleanly when it has already tried to become 
> the active master
> 
>
> Key: HBASE-19838
> URL: https://issues.apache.org/jira/browse/HBASE-19838
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Duo Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19838-UT.patch, HBASE-19838.master.001.patch
>
>
> This is the root cause of the TestZooKeeper hang.
> Opening a new issue to introduce a UT that reproduces the problem reliably, so 
> that we can fix TestZooKeeper, since it is not designed to test this 
> problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19838) Can not shutdown backup master cleanly when it has already tried to become the active master

2018-01-22 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335274#comment-16335274
 ] 

Duo Zhang commented on HBASE-19838:
---

+1. Let me commit.

> Can not shutdown backup master cleanly when it has already tried to become 
> the active master
> 
>
> Key: HBASE-19838
> URL: https://issues.apache.org/jira/browse/HBASE-19838
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19838-UT.patch, HBASE-19838.master.001.patch
>
>
> This is the root cause of the TestZooKeeper hang.
> Opening a new issue to introduce a UT that reproduces the problem reliably, so 
> that we can fix TestZooKeeper, since it is not designed to test this 
> problem.





[jira] [Commented] (HBASE-19838) Can not shutdown backup master cleanly when it has already tried to become the active master

2018-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335261#comment-16335261
 ] 

Hadoop QA commented on HBASE-19838:
---

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 8s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 4m 42s | master passed |
| +1 | compile | 0m 42s | master passed |
| +1 | checkstyle | 1m 8s | master passed |
| +1 | shadedjars | 5m 50s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 31s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 6m 26s | the patch passed |
| +1 | compile | 1m 2s | the patch passed |
| +1 | javac | 1m 2s | the patch passed |
| +1 | checkstyle | 1m 31s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 5m 56s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 23m 4s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | javadoc | 0m 28s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 111m 32s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 156m 31s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19838 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12907194/HBASE-19838.master.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 814124d4ebf8 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / d49357f265 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11158/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11158/console |
| Powered by | Apache Yetus 0.6.0 http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HBASE-19838) Can not shutdown backup master cleanly when it has already tried to become the active master

2018-01-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335083#comment-16335083
 ] 

stack commented on HBASE-19838:
---

Closing the shared clusterConnection on Master#shutdown seems to do the trick. 
This is how it looks with the nice test added here:

 
{code:java}
2018-01-22 14:41:07,171 INFO [M:0;localhost:52959] regionserver.HRegionServer(1152): M:0;localhost:52959 exiting
Exception in thread "M:0;localhost:52959" java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be TERMINATED, but the service has FAILED
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitTerminated(AbstractService.java:318)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:576)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x7d85155 closed
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722)
    at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714)
    at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684)
    at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562)
    at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131)
    at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73)
    at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362)
    at org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:1117)
    at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:427)
    at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:93)
    at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62)
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
    at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1062)
    at org.apache.hadoop.hbase.master.TestShutdownBackupMaster$MockHMaster.initClusterSchemaService(TestShutdownBackupMaster.java:67)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:924)
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2026)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:557)
    ... 1 more
{code}
 

Having studied how clusterConnection is used in the regionserver and master, 
calling close to kill any ongoing RPCs seems to be what we want.

Shutdown in the 'normal case' does not seem to change and still runs 'normally'.
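
To illustrate why closing the connection unblocks a thread stuck in RPC retries, here is a minimal sketch in plain Java. It is not the code from the patch: FakeConnection, Outcome, callWithRetries and closeWhileRetrying are hypothetical names invented for this example, and the only idea taken from the thread is that a closed connection turns a retryable failure into a non-retryable one.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only: these classes are hypothetical stand-ins, not HBase APIs.
class CloseAbortsRetriesSketch {

  /** Minimal connection whose lookups always fail, as if every region server were gone. */
  static final class FakeConnection implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    boolean isClosed() {
      return closed.get();
    }

    @Override
    public void close() {
      closed.set(true);
    }

    String locate(String row) throws IOException {
      if (isClosed()) {
        // Plays the role of the non-retryable "closed" failure seen in the trace above.
        throw new IllegalStateException("connection closed");
      }
      throw new IOException("no server hosts row " + row);
    }
  }

  enum Outcome { SUCCEEDED, RETRIES_EXHAUSTED, ABORTED_CLOSED, INTERRUPTED }

  /** Retries a lookup, but gives up at once when the connection has been closed. */
  static Outcome callWithRetries(FakeConnection conn, int maxAttempts, long pauseMs) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        conn.locate("row");
        return Outcome.SUCCEEDED;
      } catch (IllegalStateException e) {
        return Outcome.ABORTED_CLOSED; // non-retryable: the connection is gone
      } catch (IOException e) {
        try {
          Thread.sleep(pauseMs); // back off, then retry
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          return Outcome.INTERRUPTED;
        }
      }
    }
    return Outcome.RETRIES_EXHAUSTED;
  }

  /** Starts a thread stuck in retries, closes the connection, and reports how the thread ended. */
  static Outcome closeWhileRetrying() {
    FakeConnection conn = new FakeConnection();
    Outcome[] result = new Outcome[1];
    Thread startup = new Thread(() -> result[0] = callWithRetries(conn, Integer.MAX_VALUE, 10L));
    startup.start();
    try {
      Thread.sleep(50L);   // let the thread enter its retry loop
      conn.close();        // what a shutdown path would do
      startup.join(5000L); // the thread should exit shortly after the close
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return startup.isAlive() ? null : result[0];
  }

  public static void main(String[] args) {
    System.out.println(closeWhileRetrying());
  }
}
```

Without the close, the retrying thread in this sketch would keep looping for a very long time, which mirrors the hang described in this issue; with it, the next attempt fails immediately and the thread exits.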



[jira] [Commented] (HBASE-19838) Can not shutdown backup master cleanly when it has already tried to become the active master

2018-01-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333976#comment-16333976
 ] 

stack commented on HBASE-19838:
---

{quote}As for the RPCs, I think we need to find a way to close the 
ConnectionImplementation from outside so that all RPCs fail quickly.
{quote}
Agreed. I'd opened HBASE-19834. Will give it a go...

> Can not shutdown backup master cleanly when it has already tried to become 
> the active master
> 
>
> Key: HBASE-19838
> URL: https://issues.apache.org/jira/browse/HBASE-19838
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Duo Zhang
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19838-UT.patch
>
>
> This is the root cause of the TestZooKeeper hang.
> Opening a new issue to introduce a UT that reproduces the problem reliably, so 
> that we can fix TestZooKeeper, since it is not designed to test this 
> problem.





[jira] [Commented] (HBASE-19838) Can not shutdown backup master cleanly when it has already tried to become the active master

2018-01-21 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333957#comment-16333957
 ] 

Duo Zhang commented on HBASE-19838:
---

{quote}
Nice test. What are you thinking? Add this and change TestZooKeeper?
{quote}

Yes. Let's fix TestZooKeeper first. It is not designed to test this problem, and 
it does not fail reliably, so it is also not a good UT for debugging.

As for the RPCs, I think we need to find a way to close the 
ConnectionImplementation from outside so that all RPCs fail quickly.




[jira] [Commented] (HBASE-19838) Can not shutdown backup master cleanly when it has already tried to become the active master

2018-01-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333954#comment-16333954
 ] 

stack commented on HBASE-19838:
---

Nice test. What are you thinking? Add this and change TestZooKeeper?

We need to do the same for all places where master startup does an RPC to a 
region or server.



[jira] [Commented] (HBASE-19838) Can not shutdown backup master cleanly when it has already tried to become the active master

2018-01-21 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333943#comment-16333943
 ] 

Duo Zhang commented on HBASE-19838:
---

[~stack] FYI. This UT fails reliably for me.

It uses a mocked HMaster implementation that blocks in initClusterSchemaService; 
we then kill all the RSes and let it proceed. The UT hangs in 
HBaseTestingUtility.shutdownMiniCluster.
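
The hold-then-release technique described above can be sketched with two latches in plain Java. This is not the attached UT: FakeMaster, HeldMaster, initSchemaService and runScenario are hypothetical names made up for this example, and the step where the real test kills the region servers is only marked by a comment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: hypothetical classes, not the HBase test from the patch.
class HoldInInitSketch {

  /** Stand-in for a master whose initialization has one overridable step. */
  static class FakeMaster {
    volatile boolean initialized;

    void initSchemaService() throws InterruptedException {
      // The real step would issue RPCs; there is nothing to do in this sketch.
    }

    void finishInitialization() throws InterruptedException {
      initSchemaService();
      initialized = true;
    }
  }

  /** Test double that parks inside the init step until the test releases it. */
  static final class HeldMaster extends FakeMaster {
    final CountDownLatch arrived = new CountDownLatch(1);
    final CountDownLatch resume = new CountDownLatch(1);

    @Override
    void initSchemaService() throws InterruptedException {
      arrived.countDown(); // tell the test we reached the step
      resume.await();      // hold here until the test says go
      super.initSchemaService();
    }
  }

  /** Returns true if initialization completed only after the test released the hold. */
  static boolean runScenario() {
    HeldMaster master = new HeldMaster();
    Thread init = new Thread(() -> {
      try {
        master.finishInitialization();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    init.start();
    try {
      if (!master.arrived.await(5, TimeUnit.SECONDS)) {
        return false; // the init thread never reached the held step
      }
      boolean heldBefore = !master.initialized; // still parked in the step
      // A real test would change cluster state here, e.g. stop the region servers.
      master.resume.countDown();
      init.join(5000L);
      return heldBefore && master.initialized;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(runScenario());
  }
}
```

Using one latch to signal arrival and a second to release the hold makes the interleaving deterministic, which is what lets a test of this shape reproduce a timing-dependent hang on every run.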
