[jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit

2017-05-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015799#comment-16015799
 ] 

Hadoop QA commented on HBASE-18058:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 53s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 58s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s {color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 27s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 55s {color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 18s {color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 104m 21s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 4s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 154m 3s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12868737/HBASE-18058.v2.patch |
| JIRA Issue | HBASE-18058 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  xml  |
| uname | Linux 54d3d24ce69e 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 6dc4190c |
| Default Java | 

[jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit

2017-05-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015618#comment-16015618
 ] 

Ted Yu commented on HBASE-18058:


+1 on v2.

> Zookeeper retry sleep time should have a up limit
> -
>
> Key: HBASE-18058
> URL: https://issues.apache.org/jira/browse/HBASE-18058
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Allan Yang
>Assignee: Allan Yang
> Attachments: HBASE-18058-branch-1.patch, 
> HBASE-18058-branch-1.v2.patch, HBASE-18058.patch, HBASE-18058.v2.patch
>
>
> Currently, in {{RecoverableZooKeeper}}, the retry backoff sleep time grows 
> exponentially but has no upper limit. This directly leads to a very long 
> recovery time after ZooKeeper has been down for a while and comes back.
> A case of damage done by a high sleep time:
> If the disk of the server hosting ZooKeeper is full, the ZooKeeper quorum 
> does not actually go down but rejects all write requests. On the HBase side, 
> new zk write requests then fail with exceptions and are retried. The 
> connection remains, so the session does not time out. Once the disk-full 
> situation has been resolved, the ZooKeeper quorum works normally again, but 
> the very high sleep time causes some modules of the RegionServer/HMaster 
> (for example, the balancer) to keep sleeping for a long time before 
> resuming work.
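
For readers following the thread, the idea of bounding the backoff can be sketched as below. This is a minimal illustration only, not the code from the attached patches: the class name {{CappedBackoff}}, the method {{sleepTimeMs}}, the 1000 ms base sleep and the 60000 ms cap are all assumptions made for the example.

```java
/**
 * Illustrative sketch of exponential backoff with an upper limit.
 * All names and values here are hypothetical, not taken from the HBase patch.
 */
public final class CappedBackoff {
    private CappedBackoff() {}

    /**
     * Returns the sleep time in milliseconds before retry number
     * {@code attempt} (0-based): the base sleep doubled once per attempt,
     * but never more than {@code maxSleepMs}.
     */
    public static long sleepTimeMs(long baseSleepMs, int attempt, long maxSleepMs) {
        if (baseSleepMs <= 0 || maxSleepMs <= 0 || attempt < 0) {
            throw new IllegalArgumentException("arguments must be positive (attempt >= 0)");
        }
        long sleep = Math.min(baseSleepMs, maxSleepMs);
        for (int i = 0; i < attempt; i++) {
            // Stop before doubling would exceed the cap; this also avoids long overflow.
            if (sleep > maxSleepMs / 2) {
                return maxSleepMs;
            }
            sleep *= 2;
        }
        return sleep;
    }

    public static void main(String[] args) {
        long baseSleepMs = 1000L;  // assumed base sleep, for illustration
        long maxSleepMs = 60000L;  // assumed upper limit, for illustration
        for (int attempt = 0; attempt < 10; attempt++) {
            System.out.println("attempt " + attempt + ": sleep "
                + sleepTimeMs(baseSleepMs, attempt, maxSleepMs) + " ms");
        }
    }
}
```

With these assumed values the sleep time doubles on each retry until it reaches the cap and then stays there, so a caller that has retried many times during an outage waits at most one cap interval once ZooKeeper is healthy again.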



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit

2017-05-17 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015106#comment-16015106
 ] 

Allan Yang commented on HBASE-18058:


Sure, I will add zookeeper.recovery.retry.maxsleeptime to hbase-defaults.xml 
along with a description. Thanks, [~apurtell] and [~carp84].



[jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit

2017-05-17 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014953#comment-16014953
 ] 

Andrew Purtell commented on HBASE-18058:


Mostly lgtm but please add zookeeper.recovery.retry.maxsleeptime to 
hbase-defaults.xml and/or update the book to document the setting's 
availability and default. 




[jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit

2017-05-17 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013607#comment-16013607
 ] 

Yu Li commented on HBASE-18058:
---

I see, interesting case, thanks for sharing. Maybe briefly describing the 
story in the JIRA description would be a good idea? Thanks.



[jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit

2017-05-16 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013479#comment-16013479
 ] 

Allan Yang commented on HBASE-18058:


{quote}
Normally in this case the RegionServer will crash due to a ZooKeeper session 
timeout, similar to when the RS goes through a full GC, right? Mind sharing 
the case in your scenario? How do you keep the RS alive while ZooKeeper is 
down for a while? Thanks. Allan Yang
{quote}
Yes, it is a very interesting case, and it really happened. If the disk of the 
server hosting ZooKeeper is full, the ZooKeeper quorum does not actually go 
down but rejects all connections and requests. On the HBase side, this shows 
up as connection loss followed by retries. Once the disk-full situation has 
been resolved, the ZooKeeper quorum works normally again and no session times 
out. So the HBase servers do not crash due to session timeout, but the very 
high sleep time causes some modules of the RegionServer (in our case, the 
balancer) to keep sleeping for a long time before resuming work.



[jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit

2017-05-16 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013463#comment-16013463
 ] 

Yu Li commented on HBASE-18058:
---

bq. long long recovery time after Zookeeper going down for some while and come 
back
Normally in this case the RegionServer will crash due to a ZooKeeper session 
timeout, similar to when the RS goes through a full GC, right? Mind sharing 
the case in your scenario? How do you keep the RS alive while ZooKeeper is 
down for a while? Thanks. [~allan163]



[jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit

2017-05-16 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013460#comment-16013460
 ] 

Yu Li commented on HBASE-18058:
---

+1, {{RecoverableZooKeeper}} is IA.Private, so it's OK to change the 
constructor signature.



[jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit

2017-05-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012474#comment-16012474
 ] 

Ted Yu commented on HBASE-18058:


lgtm

Mind attaching a patch for the master branch?
Please trigger tests in hbase-server.



[jira] [Commented] (HBASE-18058) Zookeeper retry sleep time should have a up limit

2017-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012423#comment-16012423
 ] 

Hadoop QA commented on HBASE-18058:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 36s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s {color} | {color:green} branch-1 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} branch-1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} branch-1 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} branch-1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 14m 36s {color} | {color:green} The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s {color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 7s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 47s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:58c504e |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12868319/HBASE-18058-branch-1.patch |
| JIRA Issue | HBASE-18058 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 295bab27e0a2 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1 / d8ef495