[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-25 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14370:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Why is this open again? Commit, then resolve. Don't commit and not resolve. 
Thanks for helping RMs keep JIRA clean!


> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.3.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Status: Patch Available  (was: Reopened)

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Fix Version/s: 1.3.0

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.3.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-24 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-branch-1-v10.txt

Re-attaching for test run.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-21 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14370:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

This has been committed, the issue should be resolved as Fixed.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-21 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14370:
---
Fix Version/s: (was: 1.1.3)
   (was: 1.0.3)
   (was: 1.3.0)
   (was: 1.2.0)

Update fix versions when committed to other branches

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.15
>
> Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 
> 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Status: Patch Available  (was: Open)

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 
> 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-branch-1-v10.txt

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 
> 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: test-acl3-branch-1.stack

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
 Hadoop Flags: Reviewed
Fix Version/s: 1.1.3
   1.0.3
   0.98.15
   1.3.0
   1.2.0
   2.0.0

[~apurtell]:
I am preparing backport to 0.98 branch.

Let me know if you have comments.

Thanks

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 
> 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch, test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-v7.txt

Patch v7 adds preemption for nodeChildrenChanged events on top of patch v5.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-v8.txt

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: (was: 14370-v8.txt)

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-v8.txt

Patch v8 moves the private runnable inside nodeChildrenChanged() with some 
comment explaining what it does.
Hopefully this makes code more readble.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-v10.txt

In patch v10, added test where double ref decrement leads to region server 
abort.
Also verified that after tables are deleted, total reference count comes down 
to 0 in normal operation.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 
> 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Status: Open  (was: Patch Available)

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v10.txt, 14370-v3.txt, 14370-v5.txt, 
> 14370-v7.txt, 14370-v8.txt, 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, 
> hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-branch-1-v10.txt

Patch for branch-1.


> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-09 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-wait-nofity-v2.txt

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-09 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-14370:
--
Attachment: hbase-14370_v4.patch

Ted, can you check the v4 patch to see my points above. I did not run any tests 
on it though. 

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-wait-nofity-v2.txt, 
> 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-09 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-v5.txt

Patch v5 adds call to watcher.abort() in TableAuthManager#release() when ref 
counting assertion fails.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: (was: 14370-v3.txt)

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-v3.txt

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-v3.txt

See if patch v3 is better.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-wait-nofity.txt

Please take a look at 14370-wait-nofity.txt

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-wait-nofity.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Issue Type: Bug  (was: Improvement)

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-v1.txt

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Status: Patch Available  (was: Open)

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: 14370-v1.txt

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14370) Use separate thread for calling ZKPermissionWatcher#refreshNodes()

2015-09-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14370:
---
Attachment: (was: 14370-v1.txt)

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> --
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.98.0
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)