[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908314#comment-14908314 ]

Hudson commented on HBASE-14370:

SUCCESS: Integrated in HBase-1.3-IT #182 (See [https://builds.apache.org/job/HBase-1.3-IT/182/])
HBASE-14370 Use separate thread for calling ZKPermissionWatcher#refreshNodes() (tedyu: rev 52188c5c4a55483bea229c77ab37b9bcbe9b3623)
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ZKPermissionWatcher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.3.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt,
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt,
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch,
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the
> following:
> {code}
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635,
> leading to slowness in table creation because zk notification for region
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
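The issue description above says the patch moves the refreshNodes() call off the ZooKeeper event thread and into its own thread. A minimal sketch of that general pattern follows. It is not the committed patch: OffloadingWatcherSketch, its method names, the /acl/t1 and /acl/t2 paths, and the latch that stands in for a slow refresh are all hypothetical, and only java.util.concurrent is used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the HBase ZKPermissionWatcher class.
final class OffloadingWatcherSketch {
  // One worker thread: refreshes run in submission order, off the caller's thread.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final List<String> refreshed = Collections.synchronizedList(new ArrayList<String>());
  // Stands in for a slow refresh: the worker blocks here until the gate opens.
  private final CountDownLatch gate;

  OffloadingWatcherSketch(CountDownLatch gate) {
    this.gate = gate;
  }

  // Called on the (simulated) event thread; it only enqueues work and returns.
  void nodeChildrenChanged(String path) {
    executor.execute(() -> {
      try {
        gate.await();
        refreshed.add(path);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
  }

  int refreshedCount() {
    return refreshed.size();
  }

  // Stops accepting new work, waits for queued refreshes, returns them in order.
  List<String> closeAndDrain() {
    executor.shutdown();
    try {
      if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("refresh thread did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new ArrayList<>(refreshed);
  }

  static List<String> demo() {
    CountDownLatch gate = new CountDownLatch(1);
    OffloadingWatcherSketch watcher = new OffloadingWatcherSketch(gate);
    watcher.nodeChildrenChanged("/acl/t1");
    watcher.nodeChildrenChanged("/acl/t2");
    // Both callbacks have returned although no refresh has completed yet.
    if (watcher.refreshedCount() != 0) {
      throw new IllegalStateException("refresh ran before the gate opened");
    }
    gate.countDown();
    return watcher.closeAndDrain();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In this sketch the calling thread is never held up by the refresh, which is the property the description is after: other notifications on the event thread would no longer queue behind a long /acl scan. A single worker thread is an assumption made here to keep refreshes ordered.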
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908299#comment-14908299 ]

Hudson commented on HBASE-14370:

FAILURE: Integrated in HBase-1.3 #204 (See [https://builds.apache.org/job/HBase-1.3/204/])
HBASE-14370 Use separate thread for calling ZKPermissionWatcher#refreshNodes() (tedyu: rev 52188c5c4a55483bea229c77ab37b9bcbe9b3623)
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ZKPermissionWatcher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.3.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt,
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt,
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch,
> test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908054#comment-14908054 ]

Ted Yu commented on HBASE-14370:

{code}
fht https://builds.apache.org/job/PreCommit-HBASE-Build/15739/consoleFull
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Printing Failing tests
{code}
From https://builds.apache.org/job/HBase-1.3/jdk=latest1.7,label=Hadoop/203/consoleFull :
{code}
"main" prio=10 tid=0x7fa16c00a800 nid=0x4e8 waiting on condition [0x7fa173269000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
  at java.lang.Thread.sleep(Native Method)
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1339)
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1159)
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1130)
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1114)
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:935)
  at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
  at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:79)
  at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
  at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
  at org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
  at org.apache.hadoop.hbase.protobuf.generated.AccessControlProtos$AccessControlService$BlockingStub.grant(AccessControlProtos.java:10280)
  at org.apache.hadoop.hbase.protobuf.ProtobufUtil.grant(ProtobufUtil.java:2209)
  at org.apache.hadoop.hbase.security.access.SecureTestUtil$8.call(SecureTestUtil.java:502)
  at org.apache.hadoop.hbase.security.access.SecureTestUtil$8.call(SecureTestUtil.java:494)
  at org.apache.hadoop.hbase.security.access.SecureTestUtil.updateACLs(SecureTestUtil.java:324)
  at org.apache.hadoop.hbase.security.access.SecureTestUtil.grantOnTable(SecureTestUtil.java:494)
  at org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization.setUp(TestWithDisabledAuthorization.java:203)
{code}
So it was not a regression.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt,
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt,
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch,
> test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907965#comment-14907965 ]

Hadoop QA commented on HBASE-14370:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12762313/14370-branch-1-v10.txt
against branch-1 branch at commit a33adf2f0b050e9cf9330fd5ab7e200a7dd27d6d.
ATTACHMENT ID: 12762313

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 14 new or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:red}-1 checkstyle{color}. The applied patch generated 3777 checkstyle errors (more than the master's current 3776 errors).

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:

{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15739//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15739//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15739//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15739//console

This message is automatically generated.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt,
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt,
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch,
> test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900218#comment-14900218 ]

Anoop Sam John commented on HBASE-14370:

I see this is already committed and only the branch-1 commit was pending. Sorry for the delay in review.
These comments are not so serious. Feel free to ignore them for now. If we want, we can address them later.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt,
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt,
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900195#comment-14900195 ]

Anoop Sam John commented on HBASE-14370:

You want it to be processed? After the close()?
bq. Having refCount as static makes tracking outstanding TableAuthManager instance(s) easy
But do we have a tracking mechanism?

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt,
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt,
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900191#comment-14900191 ]

Ted Yu commented on HBASE-14370:

bq. Can't we have refCount info within TableAuthManager instance
Having refCount as static makes tracking outstanding TableAuthManager instance(s) easy.
bq. Do we need call shutdownNow()
Calling shutdown() allows outstanding notifications to be processed.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt,
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt,
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
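The shutdown() versus shutdownNow() point raised in this exchange can be illustrated with a small, self-contained sketch. It is not taken from the patch: ShutdownSketch and run are hypothetical names, and the two latches exist only to make the outcome deterministic. Under the ExecutorService contract, shutdown() lets already-submitted tasks run, while shutdownNow() attempts to stop the running task (in the standard thread pool implementation, by interrupting it) and returns the tasks that never started.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the two shutdown modes on a single-thread executor.
final class ShutdownSketch {
  // Returns {tasks completed, tasks handed back unstarted}.
  static int[] run(boolean graceful) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    AtomicInteger completed = new AtomicInteger();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch gate = new CountDownLatch(1);

    // First task: signals that it is running, then blocks like a slow refresh.
    executor.execute(() -> {
      started.countDown();
      try {
        gate.await();
        completed.incrementAndGet();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // interrupted by shutdownNow()
      }
    });
    // Two more tasks wait in the queue behind the first one.
    executor.execute(completed::incrementAndGet);
    executor.execute(completed::incrementAndGet);

    int unstarted = 0;
    try {
      started.await();
      if (graceful) {
        executor.shutdown();   // no new tasks, queued ones still run
        gate.countDown();      // let the blocked task finish
      } else {
        unstarted = executor.shutdownNow().size(); // interrupt and drain the queue
      }
      if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("executor did not terminate");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new int[] {completed.get(), unstarted};
  }

  public static void main(String[] args) {
    int[] graceful = run(true);
    int[] abrupt = run(false);
    System.out.println("shutdown():    completed=" + graceful[0] + " unstarted=" + graceful[1]);
    System.out.println("shutdownNow(): completed=" + abrupt[0] + " unstarted=" + abrupt[1]);
  }
}
```

With shutdown() all three tasks complete; with shutdownNow() the running task is interrupted and the two queued tasks are returned without running. Which behavior is preferable on close() is the open question in the thread; the sketch only shows what each call does.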
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900183#comment-14900183 ]

Anoop Sam John commented on HBASE-14370:

bq. private static Map refCount = new HashMap<>();
Can't we have refCount info within the TableAuthManager instance? The 'release' method could then be an instance method rather than a static one.
{quote}
public void close() {
  executor.shutdown();
}
{quote}
Do we need to call shutdownNow() so that the running Runnable may be interrupted?

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt,
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt,
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
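The refCount suggestion in this comment (keep the count on the instance and make release an instance method) might look roughly like the sketch below. This is an assumption-laden illustration, not the TableAuthManager code: RefCountedManagerSketch, getOrCreate, release, outstanding, and the String key standing in for a watcher are all hypothetical. A static registry is kept so that outstanding instances can still be counted, which is the benefit claimed for the static approach.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the HBase TableAuthManager class.
final class RefCountedManagerSketch {
  // Static registry: one shared manager per key (the key stands in for a watcher).
  private static final Map<String, RefCountedManagerSketch> INSTANCES = new HashMap<>();

  private final String key;
  private int refCount;   // guarded by the class lock
  private boolean closed; // guarded by the class lock

  private RefCountedManagerSketch(String key) {
    this.key = key;
  }

  // Returns the shared manager for the key, creating it on first use.
  static synchronized RefCountedManagerSketch getOrCreate(String key) {
    RefCountedManagerSketch manager = INSTANCES.get(key);
    if (manager == null) {
      manager = new RefCountedManagerSketch(key);
      INSTANCES.put(key, manager);
    }
    manager.refCount++;
    return manager;
  }

  // Instance-level release: closes the manager when the last holder lets go.
  void release() {
    synchronized (RefCountedManagerSketch.class) {
      if (refCount == 0) {
        throw new IllegalStateException("released more often than acquired");
      }
      if (--refCount == 0) {
        INSTANCES.remove(key);
        closed = true; // a real manager would shut down its executor here
      }
    }
  }

  boolean isClosed() {
    synchronized (RefCountedManagerSketch.class) {
      return closed;
    }
  }

  // Number of managers that still have at least one holder.
  static synchronized int outstanding() {
    return INSTANCES.size();
  }

  public static void main(String[] args) {
    RefCountedManagerSketch a = getOrCreate("zk-demo");
    RefCountedManagerSketch b = getOrCreate("zk-demo");
    a.release();
    System.out.println("closed after first release: " + a.isClosed());
    b.release();
    System.out.println("closed after second release: " + b.isClosed());
  }
}
```

Here the count lives on the instance while the registry stays static, so the two positions in the thread are not mutually exclusive. Whether that trade-off suits the real class is a design decision this sketch does not settle.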
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876396#comment-14876396 ]

Ted Yu commented on HBASE-14370:

Waiting for green branch-1.x build before integrating to those branches.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt,
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt,
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741897#comment-14741897 ]

Hudson commented on HBASE-14370:

FAILURE: Integrated in HBase-0.98 #1120 (See [https://builds.apache.org/job/HBase-0.98/1120/])
HBASE-14370 Use separate thread for calling ZKPermissionWatcher#refreshNodes() (tedyu: rev c5f3f68339f4a87bfa7823a8b8edad33a51b205c)
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ZKPermissionWatcher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt,
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt,
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741866#comment-14741866 ]

Hudson commented on HBASE-14370:

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1073 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1073/])
HBASE-14370 Use separate thread for calling ZKPermissionWatcher#refreshNodes() (tedyu: rev c5f3f68339f4a87bfa7823a8b8edad33a51b205c)
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ZKPermissionWatcher.java

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt,
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt,
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt,
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741857#comment-14741857 ]
Hudson commented on HBASE-14370:
FAILURE: Integrated in HBase-TRUNK #6800 (See [https://builds.apache.org/job/HBase-TRUNK/6800/])
HBASE-14370 Use separate thread for calling ZKPermissionWatcher#refreshNodes() (tedyu: rev dff5243c89544de8ed3127a7df5ec79cdab3373b)
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ZKPermissionWatcher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
> I came off a support case (0.98.0) where main zk thread was seen doing the following:
> {code}
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, leading to slowness in table creation because zk notification for region offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
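The idea in the issue description, moving the slow refresh off the single ZooKeeper event thread so other notifications are not blocked behind it, might look roughly like the sketch below. This is not the committed patch: `AclRefreshOffloader`, `onChildrenChanged()`, and the thread name `acl-refresher` are hypothetical names chosen for illustration, and only `java.util.concurrent` is used.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a watcher whose callback is invoked on the
// single ZooKeeper event thread.
class AclRefreshOffloader {
  // One daemon worker, so a slow refresh never runs on the caller's thread.
  private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "acl-refresher");
    t.setDaemon(true);
    return t;
  });
  private final Runnable refresh;

  AclRefreshOffloader(Runnable refresh) {
    this.refresh = refresh;
  }

  // Called from the event thread: enqueue the refresh and return immediately.
  Future<?> onChildrenChanged() {
    return executor.submit(refresh);
  }

  // Stop accepting work and wait briefly for queued refreshes to finish.
  void close() throws InterruptedException {
    executor.shutdown();
    executor.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

With a single worker, refreshes still run one at a time and in submission order; only the caller is freed. Whether a queued refresh should be skipped when a newer one arrives is a separate design choice that this sketch does not address.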
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741503#comment-14741503 ]
Ted Yu commented on HBASE-14370:
For 0.98 QA run:
{code}
fht https://builds.apache.org/job/PreCommit-HBASE-Build/15567/console
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1
Hanging test : org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat
Printing Failing tests
{code}
The above hanging tests were not related to the patch.
Planning to commit in the near future.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741495#comment-14741495 ]
Hadoop QA commented on HBASE-14370:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755457/14370-0.98-v10.txt against 0.98 branch at commit c94d10952fe44f73096027cc9083cee993983940.
ATTACHMENT ID: 12755457
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 14 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 22 warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 3869 checkstyle errors (more than the master's current 3868 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15567//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15567//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15567//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15567//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15567//console
This message is automatically generated.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741104#comment-14741104 ]
Ted Yu commented on HBASE-14370:
QA environment issue:
{code}
testRegionCrossingHFileSplitRowBloom(org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesUseSecurityEndPoint) Time elapsed: 1.308 sec <<< ERROR!
org.apache.hadoop.ipc.RemoteException: unable to create new native thread
{code}
I don't think any of the above test failures is related to enabling ACL.
I am running the tests locally to make sure they pass.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
> Attachments: 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741075#comment-14741075 ]
Hadoop QA commented on HBASE-14370:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755409/14370-branch-1-v10.txt against branch-1 branch at commit c438052cc2280121727d4ae0883f0b76cad5816a.
ATTACHMENT ID: 12755409
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 14 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 3815 checkstyle errors (more than the master's current 3814 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.snapshot.TestFlushSnapshotFromClient
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence
org.apache.hadoop.hbase.mapreduce.TestImportTsv
{color:red}-1 core zombie tests{color}. There are 23 zombie test(s):
at org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithACL.testVisibilityLabelsForUserWithNoAuths(TestVisibilityLabelsWithACL.java:203)
at org.apache.hadoop.hbase.regionserver.TestMajorCompaction.testUserMajorCompactionRequest(TestMajorCompaction.java:429)
at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testSplitWithRegionReplicas(TestSplitTransactionOnCluster.java:1024)
at org.apache.hadoop.hbase.regionserver.TestCorruptedRegionStoreFile.testLosingFileAfterScannerInit(TestCorruptedRegionStoreFile.java:172)
at org.apache.hadoop.hbase.mapred.TestTableSnapshotInputFormat.testInitTableSnapshotMapperJobConfig(TestTableSnapshotInputFormat.java:104)
at org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testIncrementMultiThreads(TestAtomicOperation.java:166)
at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:286)
at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:260)
at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:193)
at org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:116)
at org.apache.hadoop.hbase.mapred.TestTableInputFormat.testInputFormat(TestTableInputFormat.java:353)
at org.apache.hadoop.hbase.mapred.TestTableInputFormat.testDeprecatedExtensionOfTableInputFormatBase(TestTableInputFormat.java:334)
at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testRefreshStoreFiles(TestRegionReplicas.java:250)
at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:286)
at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:260)
at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportWithTargetName(TestExportSnapshot.java:218)
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15562//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15562//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15562//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15562//console
This message is automatically generated.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740983#comment-14740983 ]
Andrew Purtell commented on HBASE-14370:
Review looks solid, no comments, please go ahead.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
> Attachments: 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740640#comment-14740640 ]
Ted Yu commented on HBASE-14370:
Turns out HBASE-14378 covers TestAccessController\* already.
Would wait for that to go in first.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740594#comment-14740594 ]
Ted Yu commented on HBASE-14370:
TestAccessController3 seems to hang in branch-1.
Here is part of stack trace:
{code}
"main" prio=5 tid=0x7ff52280 nid=0x1903 in Object.wait() [0x0001098f7000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  - waiting on <0x0007c6a63358> (a java.util.concurrent.atomic.AtomicBoolean)
  at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168)
  - locked <0x0007c6a63358> (a java.util.concurrent.atomic.AtomicBoolean)
  at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
  at org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
  at org.apache.hadoop.hbase.protobuf.generated.AccessControlProtos$AccessControlService$BlockingStub.grant(AccessControlProtos.java:10280)
  at org.apache.hadoop.hbase.protobuf.ProtobufUtil.grant(ProtobufUtil.java:2181)
  at org.apache.hadoop.hbase.security.access.SecureTestUtil$2.call(SecureTestUtil.java:375)
  at org.apache.hadoop.hbase.security.access.SecureTestUtil$2.call(SecureTestUtil.java:367)
  at org.apache.hadoop.hbase.security.access.SecureTestUtil.updateACLs(SecureTestUtil.java:332)
  at org.apache.hadoop.hbase.security.access.SecureTestUtil.grantGlobal(SecureTestUtil.java:367)
  at org.apache.hadoop.hbase.security.access.TestAccessController3.setUpTableAndUserPermissions(TestAccessController3.java:243)
  at org.apache.hadoop.hbase.security.access.TestAccessController3.setupBeforeClass(TestAccessController3.java:187)
{code}
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740086#comment-14740086 ]
Ted Yu commented on HBASE-14370:
Reran the failed tests shown above and they passed locally.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740081#comment-14740081 ]
Enis Soztutar commented on HBASE-14370:
---
bq. Mainly wanted to say we don't need more threads but you fellas seem to be trying hard to avoid long-running thread that does nothing 99.999% of the time so that is good.
The original motivation for this patch was HBASE-12635 having left a dynamic cluster with lots of regions with 60K ACL definitions. The zk watcher thread would spend 3+ minutes just refreshing the ACLs. Even with HBASE-12635 fixed, I think we should follow the practice of forking a thread to process the zk notifications. I did not do the perf analysis, but we have a cluster with 2000 tables, which may put refreshNodes() in the multi-second range.
The ref counting is unfortunate but needed, since there is no easy way to tie an executor to a TableAuthManager: TableAuthManager itself is a static cache. We could have added the executor as one of the core RS threads, but that also seems a bit hacky. If there are suggestions there, I can try them out.
Coming back to the patch: Ted, I think I got the motivation for the preemption. The v10 patch looks fine to me.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
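The reference-counting concern raised in the comment above can be illustrated with a small sketch: a worker shared through a static cache is only shut down when its last user releases it. The names here (`RefCountedWorker`, `acquire()`, `release()`, `isStopped()`) are assumptions made for the example and do not reflect how TableAuthManager is actually written in the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical ref-counted holder for an executor shared via a static cache.
class RefCountedWorker {
  // The "static cache", reduced to a single entry for illustration.
  private static RefCountedWorker instance;

  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private int refCount;

  private RefCountedWorker() {}

  // Each user acquires one reference; the first call creates the shared worker.
  static synchronized RefCountedWorker acquire() {
    if (instance == null) {
      instance = new RefCountedWorker();
    }
    instance.refCount++;
    return instance;
  }

  // Each user releases exactly once; the last release stops the executor.
  static synchronized void release(RefCountedWorker worker) {
    if (worker.refCount <= 0) {
      throw new IllegalStateException("release() called more often than acquire()");
    }
    if (--worker.refCount == 0) {
      worker.executor.shutdown();
      if (instance == worker) {
        instance = null;
      }
    }
  }

  boolean isStopped() {
    return executor.isShutdown();
  }
}
```

The sketch rejects an extra release with an exception; how the real code reacts to a double decrement is described elsewhere in the thread, not here.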
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740055#comment-14740055 ]
Hadoop QA commented on HBASE-14370:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755242/14370-v10.txt against master branch at commit bf26088d7be4386864148516b151dfb0a858c416.
ATTACHMENT ID: 12755242
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 14 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There are 5 zombie test(s):
at org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testStoreFileCacheOnWriteInternals(TestCacheOnWrite.java:274)
at org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testStoreFileCacheOnWrite(TestCacheOnWrite.java:503)
at org.apache.hadoop.hbase.replication.regionserver.TestReplicationWALReaderManager.test(TestReplicationWALReaderManager.java:181)
at org.apache.hadoop.hbase.TestAcidGuarantees.testMobGetAtomicity(TestAcidGuarantees.java:392)
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15540//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15540//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15540//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15540//console
This message is automatically generated.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739607#comment-14739607 ]
Ted Yu commented on HBASE-14370:
The scenario I described was for refcounting verification, where a double decrement leads to a region server abort.
w.r.t. a missed refcount decrement, I haven't found a scenario yet.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739583#comment-14739583 ] stack commented on HBASE-14370: Sounds fine. What about refcounting? Is there any way a refcount decrement can be missed?
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739520#comment-14739520 ] Ted Yu commented on HBASE-14370: Here is my test plan: subclass AccessController and override its stop() method. When the subclass's stop() method calls super#stop() twice in a row, verify that the region server aborts. Please let me know if other scenarios should be tested.
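The double-decrement case in the test plan above can be illustrated with a small reference-counting sketch. It is an assumption-laden example rather than the patch's code: `RefCountedRegistry` is a hypothetical class, and it throws `IllegalStateException` where the thread describes aborting the region server.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reference-counted registry; not HBase's TableAuthManager API. */
final class RefCountedRegistry<K> {
    private final Map<K, Integer> refCount = new HashMap<>();

    /** Adds one reference for the key and returns the new count. */
    synchronized int acquire(K key) {
        return refCount.merge(key, 1, Integer::sum);
    }

    /**
     * Drops one reference for the key.
     *
     * @return true if this call released the last reference
     * @throws IllegalStateException if the key has no outstanding reference,
     *         which is what a double release looks like (stand-in for an abort)
     */
    synchronized boolean release(K key) {
        Integer current = refCount.get(key);
        if (current == null) {
            throw new IllegalStateException("Reference count underflow for " + key);
        }
        if (current == 1) {
            // Remove the entry so a later release is detected as an underflow.
            refCount.remove(key);
            return true;
        }
        refCount.put(key, current - 1);
        return false;
    }

    /** Returns the number of outstanding references for the key. */
    synchronized int count(K key) {
        return refCount.getOrDefault(key, 0);
    }
}
```

A test along the lines of the plan would acquire once, release twice, and assert that the second release is reported instead of being silently ignored.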
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739478#comment-14739478 ] Hadoop QA commented on HBASE-14370:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755178/14370-v8.txt against master branch at commit 0f0cdc5131913e1d82e9099c0a8e000c2ac97754.
ATTACHMENT ID: 12755178
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testRefreshStoreFiles(TestRegionReplicas.java:237)
at org.apache.camel.component.jetty.HttpEndpointUriEncodingIssueTest.testEndpointUriEncodingIssue(HttpEndpointUriEncodingIssueTest.java:32)
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15530//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15530//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15530//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15530//console
This message is automatically generated.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739320#comment-14739320 ] Ted Yu commented on HBASE-14370:
{code}
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/test-framework/dev-support/test-patch.sh: line 861: mvn: command not found
We're ok: there is no zombie test
{code}
How come the above error happened on H5 as well? :-)
Let me change the log level from warn to fatal and try to come up with a unit test.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739291#comment-14739291 ] Hadoop QA commented on HBASE-14370:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755178/14370-v8.txt against master branch at commit 0f0cdc5131913e1d82e9099c0a8e000c2ac97754.
ATTACHMENT ID: 12755178
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 2.0.3) to fail.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn post-site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15531//testReport/
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15531//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15531//console
This message is automatically generated.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739262#comment-14739262 ] stack commented on HBASE-14370: A WARN then it aborts? Tests?
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739049#comment-14739049 ] Ted Yu commented on HBASE-14370:
bq. just does log warn that "Something wrong with the TableAuthManager
The following change in the same if block takes action in addition to the warning:
{code}
+ instance.getZKPermissionWatcher().getWatcher().abort(msg, null);
{code}
bq. but then declares a ' private Runnable
The private runnable allows a subsequent nodeChildrenChanged event to preempt the processing of the previous nodeChildrenChanged event. The rationale is that there is no need to continue processing potentially stale data. Would renaming the private runnable (e.g. nodeChildrenChangedRunnable) make the code more readable?
As of patch v7, the order of handling zk notifications is strictly the same as in the current implementation.
As stated earlier, the customer's use case constantly creates new tables. As of last Friday, there were ~2600 tables. I wouldn't be surprised if the table count reaches 3000. Handling zk notifications efficiently becomes important so that the notifications for region assignment are not blocked by the handling for ACL.
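One way to picture the preemption described in the comment above is a generation counter that a running refresh checks between nodes. The sketch below is only an illustration of that general technique under stated assumptions: the thread does not say how the patch implements preemption, and `PreemptibleRefresher`, its callback, and the thread name are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical refresher that abandons a refresh once a newer event arrives. */
final class PreemptibleRefresher {
    // Incremented for every children-changed event; the highest value wins.
    private final AtomicLong latestGeneration = new AtomicLong();
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "zk-permission-watcher");
        t.setDaemon(true);
        return t;
    });
    private final Consumer<String> perNodeRefresh;

    PreemptibleRefresher(Consumer<String> perNodeRefresh) {
        this.perNodeRefresh = perNodeRefresh;
    }

    /**
     * Enqueues a refresh of the given children. The returned future yields the
     * number of nodes actually processed before the task finished or was preempted.
     */
    Future<Integer> nodeChildrenChanged(List<String> children) {
        final long myGeneration = latestGeneration.incrementAndGet();
        return executor.submit(() -> {
            int processed = 0;
            for (String child : children) {
                if (latestGeneration.get() != myGeneration) {
                    break; // a newer event arrived; the rest of this list may be stale
                }
                perNodeRefresh.accept(child);
                processed++;
            }
            return processed;
        });
    }

    void close() {
        executor.shutdown();
    }
}
```

The newer event carries its own full child list, so skipping the remainder of the older one loses nothing as long as each refresh is a complete reload rather than an incremental update.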
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738999#comment-14738999 ] stack commented on HBASE-14370:
It is hard to follow what is going on, and I don't see a test that exercises this new complexity, i.e. one that processes loads of znodes and asserts it is Doing The Right Thing.
This patch includes refcounting without a test at the extremities (it just logs a warning: "Something wrong with the TableAuthManager reference counting: "...).
The patch passes an anonymous new Runnable() to the executor throughout, but then declares a ' private Runnable runnable = new Runnable() {'...
Mainly wanted to say we don't need more threads, but you fellas seem to be trying hard to avoid a long-running thread that does nothing 99.999% of the time, so that is good.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738954#comment-14738954 ] Ted Yu commented on HBASE-14370: The above test failures were not related to the patch - I have seen them in other QA runs. The tests pass locally.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738887#comment-14738887 ] Hadoop QA commented on HBASE-14370:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755121/14370-v7.txt against master branch at commit e770cf34174c8226eaf703c303ee3d8397c38242.
ATTACHMENT ID: 12755121
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn post-site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestImportExport
org.apache.hadoop.hbase.util.TestProcessBasedCluster
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15525//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15525//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15525//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15525//console
This message is automatically generated.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738188#comment-14738188 ] Hadoop QA commented on HBASE-14370:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755052/14370-v5.txt against master branch at commit a11bb2a933ad799546e7179fdf6ce75e3e22d44b.
ATTACHMENT ID: 12755052
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15517//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15517//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15517//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15517//console
This message is automatically generated.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738079#comment-14738079 ] Hadoop QA commented on HBASE-14370:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755024/hbase-14370_v4.patch against master branch at commit a11bb2a933ad799546e7179fdf6ce75e3e22d44b.
ATTACHMENT ID: 12755024
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15516//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15516//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15516//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15516//console
This message is automatically generated.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737817#comment-14737817 ]

Ted Yu commented on HBASE-14370:

{code}
if (ref-1 == 0) {
  instance.close();
{code}
instance should be removed from refCount, right?

{code}
new DaemonThreadFactory("zkpermissionwatcher"));
{code}
Adding a hyphen would make the thread name more readable.

Submitting all actions to the executor maintains the semantics of the original implementation.
If I cannot convince you that patch v1 doesn't introduce a client-visible difference in terms of ACL, we can go with patch v4.
However, in a cluster with many tables where new tables are being created, the backing queue for the executor may have non-trivial length.
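The refCount question raised in the comment above can be illustrated with a minimal Java sketch. This is not the TableAuthManager code from any attached patch; the names `RefCountSketch`, `Resource`, `acquire`, `release`, and `trackedCount` are hypothetical. It only shows the point being made: when the last reference is released, the entry is dropped from the count map as well as closed, so the map does not keep handing out a closed instance.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not code from any HBASE-14370 patch. */
final class RefCountSketch {
  /** Stand-in for a shared, closeable instance. */
  static final class Resource {
    private boolean closed;
    void close() { closed = true; }
    boolean isClosed() { return closed; }
  }

  private final Map<Resource, Integer> refCount = new HashMap<>();

  /** Registers one more user of the given instance. */
  synchronized void acquire(Resource r) {
    Integer current = refCount.get(r);
    refCount.put(r, current == null ? 1 : current + 1);
  }

  /** Drops one reference; the last release removes the entry and closes the instance. */
  synchronized void release(Resource r) {
    Integer current = refCount.get(r);
    if (current == null) {
      return; // unknown or already fully released
    }
    if (current - 1 == 0) {
      refCount.remove(r); // forget the instance so it is not tracked after close
      r.close();
    } else {
      refCount.put(r, current - 1);
    }
  }

  /** Number of instances that still have at least one reference. */
  synchronized int trackedCount() {
    return refCount.size();
  }
}
```

With two `acquire` calls, the first `release` leaves the instance open and the second both closes it and removes it from the map.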
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737789#comment-14737789 ]

Enis Soztutar commented on HBASE-14370:

Sorry, I have a hard time understanding the reasoning to go back to approach v1 from the v3 patch. The possible race conditions I mentioned above are not specific to Thread vs Executors; they are orthogonal to that choice. So the v1 patch or a wait-signal version does not buy us anything compared to the v3 patch.

In the v3 patch, you are submitting a Runnable to the executor which runs indefinitely every time node data changes.

The lifecycle of ZKPermissionWatcher is different from that of AccessController. I think what happens is that the AccessController coprocessor will be stopped every time a region is closed from the region server, while the ZKPermissionWatcher is cached via TableAuthManager.

Let me attach a patch to explain it better.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737507#comment-14737507 ]

Ted Yu commented on HBASE-14370:

Ran 14370-wait-nofity-v2.txt locally. Except for TestMasterFailoverWithProcedures, the other tests passed.
TestMasterFailoverWithProcedures passed when run alone.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736981#comment-14736981 ]

Hadoop QA commented on HBASE-14370:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12754878/14370-wait-nofity-v2.txt
  against master branch at commit 27d3ab43efeabe2a0e1173858b6994b17b5c355b.
  ATTACHMENT ID: 12754878

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 2.0.3) to fail.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15501//testReport/
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15501//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15501//console

This message is automatically generated.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736300#comment-14736300 ]

Hadoop QA commented on HBASE-14370:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12754799/14370-wait-nofity.txt
  against master branch at commit e95358a7fc3f554dcbb351c8b7295cafc01e8c23.
  ATTACHMENT ID: 12754799

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15485//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15485//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15485//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15485//console

This message is automatically generated.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735977#comment-14735977 ]

Ted Yu commented on HBASE-14370:

Clarification: based on patch v1, the only concern is busy waiting.
If that's the case, I can continue refining patch v1.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735954#comment-14735954 ]

Enis Soztutar commented on HBASE-14370:

bq. Looking at patch v1, the thread is created once when ZKPermissionWatcher is created - not for every refresh call.
Yes, it seems that the thread never exits. This is kind of like a busy wait, no? The thread checks every 2 ms whether it should refresh, without a wait / signal.
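The contrast between polling every 2 ms and a wait / signal handoff, as discussed in the comment above, can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code of patch v1 or of the wait-notify patches; `SignalledRefresher`, `requestRefresh`, `stop`, and `awaitRefreshCount` are hypothetical names. The worker blocks in `Object.wait()` until a refresh is requested, so it consumes no CPU while idle.

```java
/** Hypothetical illustration only; not code from any HBASE-14370 patch. */
final class SignalledRefresher {
  private final Object lock = new Object();
  private boolean refreshRequested; // guarded by lock
  private boolean stopped;          // guarded by lock
  private int refreshCount;         // guarded by lock

  SignalledRefresher() {
    Thread worker = new Thread(this::runLoop, "refresher-sketch");
    worker.setDaemon(true);
    worker.start();
  }

  private void runLoop() {
    while (true) {
      synchronized (lock) {
        // Block (no polling) until there is work or a stop request.
        while (!refreshRequested && !stopped) {
          try {
            lock.wait();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
        }
        if (!refreshRequested) {
          return; // stopped with nothing pending
        }
        refreshRequested = false;
      }
      // The costly refresh would run here, outside the lock.
      synchronized (lock) {
        refreshCount++;
        lock.notifyAll(); // wake callers blocked in awaitRefreshCount
      }
    }
  }

  /** Called by the notification thread; returns immediately. */
  void requestRefresh() {
    synchronized (lock) {
      refreshRequested = true;
      lock.notifyAll();
    }
  }

  /** Asks the worker to exit once any pending refresh is done. */
  void stop() {
    synchronized (lock) {
      stopped = true;
      lock.notifyAll();
    }
  }

  /** Waits until at least {@code expected} refreshes have completed, or times out. */
  boolean awaitRefreshCount(int expected, long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    synchronized (lock) {
      while (refreshCount < expected) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false;
        }
        try {
          lock.wait(remaining);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
      return true;
    }
  }
}
```

Note that requests arriving while a refresh is in progress coalesce into a single follow-up refresh, which may or may not be acceptable depending on whether each notification carries distinct data.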
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735778#comment-14735778 ]

Ted Yu commented on HBASE-14370:

Looking at patch v1, the thread is created once when ZKPermissionWatcher is created - not for every refresh call.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735719#comment-14735719 ]

Ted Yu commented on HBASE-14370:

bq. one thread is refreshing the table auths, while the other is deleting that permission
If the table auth is put back by the refresher, it would be overwritten the next time a table with the same name is created.
Before the table is created again, TableNotFoundException serves as the guard.

What do you think?
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735703#comment-14735703 ]

Enis Soztutar commented on HBASE-14370:

bq. w.r.t. thread leak, have you seen the following code ?
Ok, missed that.

bq. Do you think tighter coordination is needed between the zk thread and the refresher thread ?
In theory, there may be a race where one thread is refreshing the table auths while the other is deleting that permission, since they will now be executing in different threads. Maybe we can make every operation (nodeCreated, nodeDeleted, nodeDataChanged, nodeChildrenChanged) execute from the executor.
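The suggestion above, routing every handler through the same executor so that a refresh and a delete can no longer interleave, can be sketched as follows. This is a minimal illustration, not the ZKPermissionWatcher implementation; the class `SerializedAclCache`, its string-valued cache, and `closeAndSnapshot` are hypothetical. Only the handler names `nodeDataChanged` and `nodeDeleted` come from the discussion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not code from any HBASE-14370 patch. */
final class SerializedAclCache {
  // All mutations run on this one thread, in the order they were submitted.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  // Touched only by the executor thread until the executor has terminated.
  private final Map<String, String> cache = new HashMap<>();

  /** Handler for a data change: queue the update instead of applying it inline. */
  void nodeDataChanged(String table, String perms) {
    executor.execute(() -> cache.put(table, perms));
  }

  /** Handler for a delete: queued behind any earlier update for the same table. */
  void nodeDeleted(String table) {
    executor.execute(() -> cache.remove(table));
  }

  /** Shuts the executor down, waits for queued work, and returns a copy of the cache. */
  Map<String, String> closeAndSnapshot() {
    executor.shutdown();
    try {
      if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("queued ACL updates did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new HashMap<>(cache);
  }
}
```

Because the update and the delete for a table pass through one queue, a delete that arrives after an update always wins, which is the ordering the event thread alone used to guarantee.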
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735527#comment-14735527 ]

Ted Yu commented on HBASE-14370:

w.r.t. thread leak, have you seen the following code?
{code}
+  public void close() {
+    executor.shutdown();
{code}
w.r.t. AtomicReference, the goal is for the refresher thread to be interruptible.

w.r.t. the race condition between nodeChildrenChanged and nodeDeleted, if a table (namespace) is deleted, the client would get TableNotFoundException (NamespaceNotFoundException) for future access - before the ACL is checked.
Do you think tighter coordination is needed between the zk thread and the refresher thread?
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735509#comment-14735509 ]

Enis Soztutar commented on HBASE-14370:

bq. See if patch v3 is better.
Thanks Ted.
The executor thread is not shut down, and will cause a thread leak.
I was following the AtomicReference nodes, but could not get the full semantics. Did you introduce that to pass the list of nodes to the thread? Can we simplify by just passing the list of znodes directly to the thread?
There may still be a race condition between nodeChildrenChanged (which now happens in the thread) and nodeDeleted, which still executes in the zk event thread, no?
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735423#comment-14735423 ]

Enis Soztutar commented on HBASE-14370:

Forking a thread is expensive, so we do not use that pattern but instead use Executors with fixed thread pools. Plus, as mentioned above, we have to make sure that the refresh requests from different event notifications execute in the same order in which the zk notifications arrive. A single-threaded executor should be able to do that.
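The ordering property mentioned in the comment above is what `Executors.newSingleThreadExecutor()` documents: tasks execute sequentially, one at a time, in submission order. A minimal sketch follows, assuming a hypothetical `OrderedRefreshQueue` class in which recording the znode path stands in for the costly refresh; it is not taken from any attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not code from any HBASE-14370 patch. */
final class OrderedRefreshQueue {
  // A single worker thread runs queued tasks one at a time, in submission order.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  // Written only by the worker thread; read by the caller after termination.
  private final List<String> processed = new ArrayList<>();

  /** Called from the notification thread; queues the work and returns immediately. */
  void onNotification(String znode) {
    executor.execute(() -> processed.add(znode)); // stand-in for the costly refresh
  }

  /** Stops accepting work, waits for queued tasks, and returns what was processed. */
  List<String> shutdownAndGetProcessed() {
    executor.shutdown();
    try {
      if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("refresh tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new ArrayList<>(processed);
  }
}
```

The notification thread only pays the cost of enqueueing, while the single worker keeps the refreshes in notification order. The trade-off, as noted elsewhere in this thread, is that the backing queue can grow when notifications arrive faster than refreshes complete.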
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735237#comment-14735237 ]

Ted Yu commented on HBASE-14370:

Currently no class is using SingleThreadedExecutor.

What benefit does it give us?
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735224#comment-14735224 ]

Enis Soztutar commented on HBASE-14370:

For the patch, instead of forking a thread, can we have a SingleThreadedExecutor?
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735217#comment-14735217 ] Enis Soztutar commented on HBASE-14370: --- It is standard practice to fork off a thread to do the work in the zk client, since zk notifications are handled by a single thread. If a zk watcher runs for a very long time, other zk notifications are blocked waiting for the event notification thread. We have to be careful, though, because the order of execution for these should follow the zk notification order. In the case Ted mentioned, even with HBASE-12635 fixed, {{ZKPermissionWatcher#refreshNodes()}} does a getData() for each of the 2000+ child znodes, which is quite costly. The other notifications (like assignment notifications) are left waiting, causing assignment to take a very long time.
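The ordering concern above can be illustrated with a small sketch. It assumes the slow handler is handed to a single worker thread whose queue is FIFO, as with the JDK's `Executors.newSingleThreadExecutor`; the class `OrderedOffloadSketch`, the method `dispatch`, and the sleep that stands in for the costly getData() calls are hypothetical and do not reproduce the actual patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OrderedOffloadSketch {

    // Simulates a notifying thread handing `events` slow handlers to one
    // worker. Returns the event ids in the order the handlers completed.
    static List<Integer> dispatch(int events, long handlerMillis)
            throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<Integer> handled = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < events; i++) {
            final int eventId = i;
            // execute() only enqueues the task, so the notifying thread
            // returns immediately and can deliver the next notification.
            worker.execute(() -> {
                try {
                    Thread.sleep(handlerMillis); // stand-in for a costly refresh
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                handled.add(eventId);
            });
        }
        worker.shutdown();
        worker.awaitTermination(30, TimeUnit.SECONDS);
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(dispatch(5, 10));
    }
}
```

Because a single worker drains the queue in submission order, the handlers run in the order the notifications arrived, while the thread that delivered them is never held up by the slow work. Spawning an independent thread per notification would not give that ordering guarantee.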
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733721#comment-14733721 ] Ted Yu commented on HBASE-14370: This change would benefit 0.98 and branch-1.x where we don't have proc-v2. Moving permission cache updates away from the zookeeper would be done in another JIRA.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732082#comment-14732082 ] Hadoop QA commented on HBASE-14370: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12754332/14370-v1.txt against master branch at commit bada19bb54a358233db2b3e23c86d215ac2dc29b. ATTACHMENT ID: 12754332 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15438//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15438//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15438//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15438//console This message is automatically generated.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732072#comment-14732072 ] Andrew Purtell commented on HBASE-14370: I suppose this makes sense for where we don't have proc-v2, but otherwise moving permission cache updates away from the zookeeper hack to proc-v2 is the correct approach to fixing the problem you describe.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732066#comment-14732066 ] Ted Yu commented on HBASE-14370: Using the main zk thread for refreshing AuthManager, in my opinion, is not a scalable design. The customer cluster has well over 2000 tables. A workflow constantly creates new tables. The time for running ZKPermissionWatcher#refreshNodes() is not negligible. We should free the main zk thread for processing other zookeeper notifications.
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732048#comment-14732048 ] Andrew Purtell commented on HBASE-14370: So this was an issue with 0.98.0 involving something fixed in 0.98.9? Why make any additional changes now?
[jira] [Commented] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()
[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732029#comment-14732029 ] Hadoop QA commented on HBASE-14370: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12754330/14370-v1.txt against master branch at commit bada19bb54a358233db2b3e23c86d215ac2dc29b. ATTACHMENT ID: 12754330 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause mvn compile goal to fail with Hadoop version 2.4.0. Compilation errors resume: [ERROR] Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1627, column 22] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project hbase-assembly: Error rendering velocity resource. Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1627, column 22]: InvocationTargetException: Index: 0, Size: 0 -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hbase-assembly Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15437//console This message is automatically generated. 