[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908314#comment-14908314
 ] 

Hudson commented on HBASE-14370:


SUCCESS: Integrated in HBase-1.3-IT #182 (See 
[https://builds.apache.org/job/HBase-1.3-IT/182/])
HBASE-14370 Use separate thread for calling ZKPermissionWatcher#refreshNodes() 
(tedyu: rev 52188c5c4a55483bea229c77ab37b9bcbe9b3623)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ZKPermissionWatcher.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java


> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.3.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908299#comment-14908299
 ] 

Hudson commented on HBASE-14370:


FAILURE: Integrated in HBase-1.3 #204 (See 
[https://builds.apache.org/job/HBase-1.3/204/])
HBASE-14370 Use separate thread for calling ZKPermissionWatcher#refreshNodes() 
(tedyu: rev 52188c5c4a55483bea229c77ab37b9bcbe9b3623)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ZKPermissionWatcher.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java


> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.3.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908054#comment-14908054
 ] 

Ted Yu commented on HBASE-14370:


{code}
fht https://builds.apache.org/job/PreCommit-HBASE-Build/15739/consoleFull
Fetching the console output from the URL
Printing hanging tests
Hanging test : 
org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Printing Failing tests
{code}
>From 
>https://builds.apache.org/job/HBase-1.3/jdk=latest1.7,label=Hadoop/203/consoleFull
> :
{code}
"main" prio=10 tid=0x7fa16c00a800 nid=0x4e8 waiting on condition 
[0x7fa173269000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1339)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1159)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1130)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1114)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:935)
at 
org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
at 
org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:79)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
at 
org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
at 
org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
at 
org.apache.hadoop.hbase.protobuf.generated.AccessControlProtos$AccessControlService$BlockingStub.grant(AccessControlProtos.java:10280)
at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.grant(ProtobufUtil.java:2209)
at 
org.apache.hadoop.hbase.security.access.SecureTestUtil$8.call(SecureTestUtil.java:502)
at 
org.apache.hadoop.hbase.security.access.SecureTestUtil$8.call(SecureTestUtil.java:494)
at 
org.apache.hadoop.hbase.security.access.SecureTestUtil.updateACLs(SecureTestUtil.java:324)
at 
org.apache.hadoop.hbase.security.access.SecureTestUtil.grantOnTable(SecureTestUtil.java:494)
at 
org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization.setUp(TestWithDisabledAuthorization.java:203)
{code}
So it was not regression.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907965#comment-14907965
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12762313/14370-branch-1-v10.txt
  against branch-1 branch at commit a33adf2f0b050e9cf9330fd5ab7e200a7dd27d6d.
  ATTACHMENT ID: 12762313

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3777 checkstyle errors (more than the master's current 3776 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s): 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15739//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15739//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15739//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15739//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900218#comment-14900218
 ] 

Anoop Sam John commented on HBASE-14370:


I see this is already committed and only branch-1 commit was pending.  Sorry 
for the delay in review.
These are not so serious. Feel free to ignore as of now. If we want we can do 
later.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900195#comment-14900195
 ] 

Anoop Sam John commented on HBASE-14370:


You want it to be processed? After the close()?
bq.Having refCount as static makes tracking outstanding TableAuthManager 
instance(s) easy
 But do we have a tracking mechanism?  

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900191#comment-14900191
 ] 

Ted Yu commented on HBASE-14370:


bq. Can't we have refCount info within TableAuthManager instance

Having refCount as static makes tracking outstanding TableAuthManager 
instance(s) easy.

bq. Do we need call shutdownNow() 

Calling shutdown() allows outstanding notifications to be processed.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-20 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900183#comment-14900183
 ] 

Anoop Sam John commented on HBASE-14370:


bq.private static Map refCount = new HashMap<>();
Can't we have refCount info within TableAuthManager  instance? The 'release' 
method can be on instance rather than a static?

{quote}
public void close() {
107 executor.shutdown();
108   }
{quote}
Do we need call shutdownNow() so that the running Runnable may be interrupted ?


> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876396#comment-14876396
 ] 

Ted Yu commented on HBASE-14370:


Waiting for green branch-1.x build before integrating to those branches.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741897#comment-14741897
 ] 

Hudson commented on HBASE-14370:


FAILURE: Integrated in HBase-0.98 #1120 (See 
[https://builds.apache.org/job/HBase-0.98/1120/])
HBASE-14370 Use separate thread for calling ZKPermissionWatcher#refreshNodes() 
(tedyu: rev c5f3f68339f4a87bfa7823a8b8edad33a51b205c)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ZKPermissionWatcher.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java


> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741866#comment-14741866
 ] 

Hudson commented on HBASE-14370:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1073 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1073/])
HBASE-14370 Use separate thread for calling ZKPermissionWatcher#refreshNodes() 
(tedyu: rev c5f3f68339f4a87bfa7823a8b8edad33a51b205c)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ZKPermissionWatcher.java


> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741857#comment-14741857
 ] 

Hudson commented on HBASE-14370:


FAILURE: Integrated in HBase-TRUNK #6800 (See 
[https://builds.apache.org/job/HBase-TRUNK/6800/])
HBASE-14370 Use separate thread for calling ZKPermissionWatcher#refreshNodes() 
(tedyu: rev dff5243c89544de8ed3127a7df5ec79cdab3373b)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ZKPermissionWatcher.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java


> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741503#comment-14741503
 ] 

Ted Yu commented on HBASE-14370:


For 0.98 QA run:
{code}
fht https://builds.apache.org/job/PreCommit-HBASE-Build/15567/console
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1
Hanging test : org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat
Printing Failing tests
{code}
The above hanging tests were not related to the patch.

Planning to commit in the near future.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741495#comment-14741495
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12755457/14370-0.98-v10.txt
  against 0.98 branch at commit c94d10952fe44f73096027cc9083cee993983940.
  ATTACHMENT ID: 12755457

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
22 warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3869 checkstyle errors (more than the master's current 3868 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15567//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15567//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15567//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15567//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15567//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741104#comment-14741104
 ] 

Ted Yu commented on HBASE-14370:


QA environment issue:
{code}
testRegionCrossingHFileSplitRowBloom(org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesUseSecurityEndPoint)
  Time elapsed: 1.308 sec  <<< ERROR!
org.apache.hadoop.ipc.RemoteException: unable to create new native thread
{code}
I don't think any of the above test failure is involved with enabling ACL.

I am running the tests locally to make sure they pass.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 
> 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741075#comment-14741075
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12755409/14370-branch-1-v10.txt
  against branch-1 branch at commit c438052cc2280121727d4ae0883f0b76cad5816a.
  ATTACHMENT ID: 12755409

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3815 checkstyle errors (more than the master's current 3814 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.snapshot.TestFlushSnapshotFromClient
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence
  org.apache.hadoop.hbase.mapreduce.TestImportTsv

 {color:red}-1 core zombie tests{color}.  There are 23 zombie test(s):  
at 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithACL.testVisibilityLabelsForUserWithNoAuths(TestVisibilityLabelsWithACL.java:203)
at 
org.apache.hadoop.hbase.regionserver.TestMajorCompaction.testUserMajorCompactionRequest(TestMajorCompaction.java:429)
at 
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testSplitWithRegionReplicas(TestSplitTransactionOnCluster.java:1024)
at 
org.apache.hadoop.hbase.regionserver.TestCorruptedRegionStoreFile.testLosingFileAfterScannerInit(TestCorruptedRegionStoreFile.java:172)
at 
org.apache.hadoop.hbase.mapred.TestTableSnapshotInputFormat.testInitTableSnapshotMapperJobConfig(TestTableSnapshotInputFormat.java:104)
at 
org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testIncrementMultiThreads(TestAtomicOperation.java:166)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:286)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:260)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:193)
at 
org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:116)
at 
org.apache.hadoop.hbase.mapred.TestTableInputFormat.testInputFormat(TestTableInputFormat.java:353)
at 
org.apache.hadoop.hbase.mapred.TestTableInputFormat.testDeprecatedExtensionOfTableInputFormatBase(TestTableInputFormat.java:334)
at 
org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testRefreshStoreFiles(TestRegionReplicas.java:250)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:286)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:260)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportWithTargetName(TestExportSnapshot.java:218)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15562//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15562//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15562//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15562//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE

[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740983#comment-14740983
 ] 

Andrew Purtell commented on HBASE-14370:


Review looks solid, no comments, please go ahead.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 
> 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740640#comment-14740640
 ] 

Ted Yu commented on HBASE-14370:


Turns out HBASE-14378 covers TestAccessController\* already.

Would wait for that to go in first.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740594#comment-14740594
 ] 

Ted Yu commented on HBASE-14370:


TestAccessController3 seems to hang in branch-1.
Here is part of stack trace:
{code}
"main" prio=5 tid=0x7ff52280 nid=0x1903 in Object.wait() 
[0x0001098f7000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  - waiting on <0x0007c6a63358> (a 
java.util.concurrent.atomic.AtomicBoolean)
  at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168)
  - locked <0x0007c6a63358> (a java.util.concurrent.atomic.AtomicBoolean)
  at 
org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
  at 
org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
  at 
org.apache.hadoop.hbase.protobuf.generated.AccessControlProtos$AccessControlService$BlockingStub.grant(AccessControlProtos.java:10280)
  at org.apache.hadoop.hbase.protobuf.ProtobufUtil.grant(ProtobufUtil.java:2181)
  at 
org.apache.hadoop.hbase.security.access.SecureTestUtil$2.call(SecureTestUtil.java:375)
  at 
org.apache.hadoop.hbase.security.access.SecureTestUtil$2.call(SecureTestUtil.java:367)
  at 
org.apache.hadoop.hbase.security.access.SecureTestUtil.updateACLs(SecureTestUtil.java:332)
  at 
org.apache.hadoop.hbase.security.access.SecureTestUtil.grantGlobal(SecureTestUtil.java:367)
  at 
org.apache.hadoop.hbase.security.access.TestAccessController3.setUpTableAndUserPermissions(TestAccessController3.java:243)
  at 
org.apache.hadoop.hbase.security.access.TestAccessController3.setupBeforeClass(TestAccessController3.java:187)
{code}

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740086#comment-14740086
 ] 

Ted Yu commented on HBASE-14370:


Rerun failed tests shown above and they passed locally.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 
> 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740081#comment-14740081
 ] 

Enis Soztutar commented on HBASE-14370:
---

bq. Mainly wanted to say we don't need more threads but you fellas seem to be 
trying hard to avoid long-running thread that does nothing 99.999% 
of the time so that is good.
The original motivation for this patch was due to HBASE-12635 having left a 
dynamic cluster with lots of regions with 60K acl definitions. The zk watcher 
thread will spend 3+ minutes just to do the refresh acls. Even with HBASE-12635 
fixed, I think we should follow the practice of forking a thread to process the 
zk notifications. I did not do the perf analysis, but we have a cluster with 
2000 tables which may make the refreshNodes() to be in the multi-seconds range. 

The ref counting is unfortunate, since there is no easy way to have an executor 
corresponding to a TableAuthManager since TableAuthManager itself is a static 
cache. We could have added the executor as one of the core RS threads, but that 
seems also a bit hacky. If there are suggestions there, I can try it out. 

Coming back to patch, Ted, I think I got the motivation for the preemption. v10 
patch looks fine to me. 


> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 
> 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740055#comment-14740055
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12755242/14370-v10.txt
  against master branch at commit bf26088d7be4386864148516b151dfb0a858c416.
  ATTACHMENT ID: 12755242

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):   
at 
org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testStoreFileCacheOnWriteInternals(TestCacheOnWrite.java:274)
at 
org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testStoreFileCacheOnWrite(TestCacheOnWrite.java:503)
at 
org.apache.hadoop.hbase.replication.regionserver.TestReplicationWALReaderManager.test(TestReplicationWALReaderManager.java:181)
at 
org.apache.hadoop.hbase.TestAcidGuarantees.testMobGetAtomicity(TestAcidGuarantees.java:392)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15540//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15540//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15540//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15540//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 
> 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739607#comment-14739607
 ] 

Ted Yu commented on HBASE-14370:


The scenario I described was for refcounting verification where double 
decrement leads to region server abortion.
w.r.t. missed recount decrement, I haven't found a scenario yet.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739583#comment-14739583
 ] 

stack commented on HBASE-14370:
---

Sounds fine. What about refcounting? Any way a recount decrement can be missed?

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739520#comment-14739520
 ] 

Ted Yu commented on HBASE-14370:


Here is my test plan: subclass AccessController and override its stop() method.
When the subclass stop() method calls super#stop() twice in a row, verify the 
abortion.

Please let me know if other scenario should be tested.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739478#comment-14739478
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12755178/14370-v8.txt
  against master branch at commit 0f0cdc5131913e1d82e9099c0a8e000c2ac97754.
  ATTACHMENT ID: 12755178

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testRefreshStoreFiles(TestRegionReplicas.java:237)
at 
org.apache.camel.component.jetty.HttpEndpointUriEncodingIssueTest.testEndpointUriEncodingIssue(HttpEndpointUriEncodingIssueTest.java:32)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15530//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15530//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15530//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15530//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739320#comment-14739320
 ] 

Ted Yu commented on HBASE-14370:


{code}
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/test-framework/dev-support/test-patch.sh:
 line 861: mvn: command not found
We're ok: there is no zombie test
{code}
How come the above error happened on H5 as well :-)

Let me change the log from warn to fatal and try to come up with a unit test.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739291#comment-14739291
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12755178/14370-v8.txt
  against master branch at commit 0f0cdc5131913e1d82e9099c0a8e000c2ac97754.
  ATTACHMENT ID: 12755178

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 2.0.3) to fail.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn post-site goal 
to fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15531//testReport/
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15531//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15531//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739262#comment-14739262
 ] 

stack commented on HBASE-14370:
---

A WARN then it aborts?

Tests?

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739049#comment-14739049
 ] 

Ted Yu commented on HBASE-14370:


bq. just does log warn that "Something wrong with the TableAuthManager 

The following change in the same if block would take action on top of warning:
{code}
+  instance.getZKPermissionWatcher().getWatcher().abort(msg, null);
{code}
bq. but then declares a ' private Runnable

The private runnable allows subsequent nodeChildrenChanged event to preempt 
current processing of previous nodeChildrenChanged event. The rationale is that 
there is no need to continue processing potentially stale data.
Would renaming the private runnable (e.g. nodeChildrenChangedRunnable) make the 
code more readable ?

As of patch v7, the order of handling zk notifications is strictly the same as 
current formation.

As stated earlier, the customer's use case constantly creates new tables. As of 
last Friday, there were ~2600 tables. I wouldn't be surprised if the table 
count reaches 3000.
Efficiently handling zk notifications becomes important such that the 
notifications for region assignment are not blocked by the handling for ACL.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738999#comment-14738999
 ] 

stack commented on HBASE-14370:
---

It is hard to follow what is going on and I don't see a test that exercises 
this new complexity: i.e. processing loads of znodes asserting it is Doing The 
Right Thing. This patch includes refcounting without test at extremities (just 
does log warn that "Something wrong with the TableAuthManager reference 
counting: "...)  Patch passes to executor anonymous new Runnable() throughout 
but then declares a  '  private Runnable runnable = new Runnable() {'... 

Mainly wanted to say we don't need more threads but you fellas seem to be 
trying hard to avoid long-running thread that does nothing 99.999% 
of the time so that is good.



> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738954#comment-14738954
 ] 

Ted Yu commented on HBASE-14370:


The above test failures were not related to the patch - I have seen them in 
other QA runs.
The tests pass locally.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738887#comment-14738887
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12755121/14370-v7.txt
  against master branch at commit e770cf34174c8226eaf703c303ee3d8397c38242.
  ATTACHMENT ID: 12755121

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn post-site goal 
to fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportExport
  org.apache.hadoop.hbase.util.TestProcessBasedCluster

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15525//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15525//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15525//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15525//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738188#comment-14738188
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12755052/14370-v5.txt
  against master branch at commit a11bb2a933ad799546e7179fdf6ce75e3e22d44b.
  ATTACHMENT ID: 12755052

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15517//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15517//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15517//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15517//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738079#comment-14738079
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12755024/hbase-14370_v4.patch
  against master branch at commit a11bb2a933ad799546e7179fdf6ce75e3e22d44b.
  ATTACHMENT ID: 12755024

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15516//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15516//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15516//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15516//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737817#comment-14737817
 ] 

Ted Yu commented on HBASE-14370:


{code}
778   if (ref-1 == 0) {
779 instance.close();
{code}
instance should be removed from refCount, right ?
{code}
68new DaemonThreadFactory("zkpermissionwatcher"));
{code}
Adding hyphen would make the thread name more readable.

Submitting all actions to executor maintains the semantics of original 
implementation.

If I cannot convince you that patch v1 doesn't introduce client visible 
difference in terms of ACL, we can go with patch v4. However, in a cluster with 
many tables where new tables are being created, the backing queue for the 
executor may have non-trivial length.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-09 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737789#comment-14737789
 ] 

Enis Soztutar commented on HBASE-14370:
---

Sorry, I have a hard time understanding the reasoning to go back to approach v1 
from v3 patch. The possible race conditions I mention above are not specific to 
the Thread vs Executors, it is orthogonal to that. So v1 patch or a wait-signal 
version does not buy us anything compared to v3 patch.  

In the v3 patch, you are submitting a Runnable thread to the executor which 
runs indefinitely everytime node data changes. The lifecycle of 
ZKPermissionWatcher is different than AcccessController. I think what happens 
is that the AcccessController coprocessor will be stopped everytime a region is 
closed from the region server, while the ZKPermissionWatcher is cached via 
TableAuthManager. 

Let me attach a patch, to explain it better. 


> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737507#comment-14737507
 ] 

Ted Yu commented on HBASE-14370:


Ran 14370-wait-nofity-v2.txt locally. Except for 
TestMasterFailoverWithProcedures , the other tests passed.
TestMasterFailoverWithProcedures passed when run alone.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736981#comment-14736981
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12754878/14370-wait-nofity-v2.txt
  against master branch at commit 27d3ab43efeabe2a0e1173858b6994b17b5c355b.
  ATTACHMENT ID: 12754878

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 2.0.3) to fail.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15501//testReport/
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15501//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15501//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736300#comment-14736300
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754799/14370-wait-nofity.txt
  against master branch at commit e95358a7fc3f554dcbb351c8b7295cafc01e8c23.
  ATTACHMENT ID: 12754799

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15485//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15485//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15485//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15485//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-wait-nofity.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735977#comment-14735977
 ] 

Ted Yu commented on HBASE-14370:


Clarification, based in patch v1, the only concern is busy waiting.
If that's the case, I can continue refining patch v1.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735954#comment-14735954
 ] 

Enis Soztutar commented on HBASE-14370:
---

bq. Looking at patch v1, the thread is created once when ZKPermissionWatcher is 
created - not for every refresh call.
Yes, it seems that the thread never exits. This is kind of like a busy wait, 
no? The thread looks for whether it should refresh every 2ms without a wait / 
signal. 

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735778#comment-14735778
 ] 

Ted Yu commented on HBASE-14370:


Looking at patch v1, the thread is created once when ZKPermissionWatcher is 
created - not for every refresh call.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735719#comment-14735719
 ] 

Ted Yu commented on HBASE-14370:


bq. one thread is refreshing the table auths, while the other is deleting that 
permission

If table auth is put back by the refresher, it would be overwritten next time 
the table with same name is created.
Before the table is created again, TableNotFoundException serves as the guard.

What do you think ?

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735703#comment-14735703
 ] 

Enis Soztutar commented on HBASE-14370:
---

bq. w.r.t. thread leak, have you seen the following code ?
Ok, missed that. 
bq. Do you think tighter coordination is needed between the zk thread and the 
refresher thread ?
In theory, there maybe a race where one thread is refreshing the table auths, 
while the other is deleting that permission since now, they will be executing 
in different threads. Maybe we can make every operation 
(nodeCreated,nodeDeleted,nodeDataChanged,nodeChildrenChanged) to execute from 
the executor.  

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735527#comment-14735527
 ] 

Ted Yu commented on HBASE-14370:


w.r.t. thread leak, have you seen the following code ?
{code}
+  public void close() {
+executor.shutdown();
{code}
w.r.t. AtomicReference, the goal is for refresher thread to be interruptible.

w.r.t. race condition between nodeChildrenChanged and nodeDeleted, if a table 
(namespace) is deleted, client would get TableNotFoundException 
(NamespaceNotFoundException) for future access - before ACL is checked.
Do you think tighter coordination is needed between the zk thread and the 
refresher thread ?

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735509#comment-14735509
 ] 

Enis Soztutar commented on HBASE-14370:
---

bq. See if patch v3 is better.
Thanks Ted. 

The executor thread is not shut down, and will cause a thread leak. 
I was following the AtomicReference nodes, but could not get the full 
semantics. Did you introduce that to pass the list of nodes to the thread? Can 
we simplify by just passing the list of znodes directly to the thread? There 
may still be a race condition between nodeChildrenChanged (which now happens in 
the thread) and nodeDeleted, which still executes in the zk event thread, no?  

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735423#comment-14735423
 ] 

Enis Soztutar commented on HBASE-14370:
---

Forking a thread is expensive, so we do not use that pattern but instead use 
Executors with fixed thread pools. Plus, as mentioned above, we have to make 
sure that the refresh requests from different even notifications should execute 
in the same order that the zk notification comes from. Single threaded executor 
should be able to do that. 

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735237#comment-14735237
 ] 

Ted Yu commented on HBASE-14370:


Currently no class is using SingleThreadedExecutor

What benefit does it give us ?

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735224#comment-14735224
 ] 

Enis Soztutar commented on HBASE-14370:
---

For the patch, instead of forking a thread, can we have a 
SingleThreadedExecutor?

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735217#comment-14735217
 ] 

Enis Soztutar commented on HBASE-14370:
---

It is standard practice to fork off a thread to do the work in the zk client 
since the zk notification is handled by a single thread. If a zk watcher runs 
for a very long time, other zk notifications are just blocked waiting for the 
event notification thread. We have to be careful though because the ordering of 
the execution for these should follow the zk notification order. 

In the case Ted mentioned, even with HBASE-12635 fixed, the 
{{ZKPermissionWatcher#refreshNodes()}} does a getData() for every 2000+ child 
znode which is quite costly. The other notifications (like assignment 
notifications) are waiting, and causing assignment to take a very long time. 

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733721#comment-14733721
 ] 

Ted Yu commented on HBASE-14370:


This change would benefit 0.98 and branch-1.x where we don't have proc-v2.

Moving permission cache updates away from the zookeeper would be done in 
another JIRA.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732082#comment-14732082
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754332/14370-v1.txt
  against master branch at commit bada19bb54a358233db2b3e23c86d215ac2dc29b.
  ATTACHMENT ID: 12754332

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15438//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15438//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15438//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15438//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-05 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732072#comment-14732072
 ] 

Andrew Purtell commented on HBASE-14370:


I suppose this makes sense for where we don't have proc-v2, but otherwise 
moving permission cache updates away from the zookeeper hack to proc-v2 is the 
correct approach to fixing the problem you describe. 

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732066#comment-14732066
 ] 

Ted Yu commented on HBASE-14370:


Using the main zk thread for refreshing AuthManager, in my opinion, is not a 
scalable design.

The customer cluster has well over 2000 tables. A workflow constantly creates 
new tables.
The time for running ZKPermissionWatcher#refreshNodes() is not negligible.

We should free the main zk thread for processing other zookeeper notifications.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-05 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732048#comment-14732048
 ] 

Andrew Purtell commented on HBASE-14370:


So this was an issue with 0.98.0 involving something fixed in 0.98.9? Why make 
any additional changes now? 

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732029#comment-14732029
 ] 

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754330/14370-v1.txt
  against master branch at commit bada19bb54a358233db2b3e23c86d215ac2dc29b.
  ATTACHMENT ID: 12754330

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] Error invoking method 'get(java.lang.Integer)' in 
java.util.ArrayList at META-INF/LICENSE.vm[line 1627, column 22]
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on 
project hbase-assembly: Error rendering velocity resource. Error invoking 
method 'get(java.lang.Integer)' in java.util.ArrayList at 
META-INF/LICENSE.vm[line 1627, column 22]: InvocationTargetException: Index: 0, 
Size: 0 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-assembly


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15437//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)