[ https://issues.apache.org/jira/browse/HBASE-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101639#comment-17101639 ]
Hudson commented on HBASE-24211:
--------------------------------

Results for branch master
	[build #1719 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1719/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/1719/General_20Nightly_20Build_20Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1475//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1719/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://builds.apache.org/job/HBase%20Nightly/job/master/1719/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}


> Create table is slow in large cluster when AccessController is enabled.
> -----------------------------------------------------------------------
>
>                 Key: HBASE-24211
>                 URL: https://issues.apache.org/jira/browse/HBASE-24211
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.6, master, 2.2.4
>            Reporter: Mohammad Arshad
>            Assignee: Mohammad Arshad
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0
>
>
> *Problem:*
> In a large HBase 1.3.x performance-test cluster (100 RS, 60k tables, 600k regions), creating a simple table takes around 150 seconds. The exact time varies, but it is consistently long.
> *Analysis:*
> 1. When HBase creates a table, it calls AssignmentManager#assign(final ServerName destination, final List<HRegionInfo> regions).
> AssignmentManager#assign calls asyncSetOfflineInZooKeeper(state, cb, destination) and then waits in the loop below, in this case for about 2 minutes.
> {code:java}
> if (useZKForAssignment) {
>   // Wait until all unassigned nodes have been put up and watchers set.
>   int total = states.size();
>   for (int oldCounter = 0; !server.isStopped();) {
>     int count = counter.get();
>     if (oldCounter != count) {
>       LOG.debug(destination.toString() + " unassigned znodes=" + count +
>         " of total=" + total + "; oldCounter=" + oldCounter);
>       oldCounter = count;
>     }
>     if (count >= total) break;
>     Thread.sleep(5);
>   }
> }
> {code}
> 2. asyncSetOfflineInZooKeeper creates a znode under /hbase/region-in-transition/ and calls exists to ensure that the znode was created.
> This is a simple operation and should not take much time. So where is the time being spent?
> 3. The ZooKeeper client API processes watcher notifications and async API responses one by one through a single queue.
> If the client (in this case HBase) is slow to process any watcher notification or response, processing of every response queued behind it is delayed, so it appears as if the API call itself took longer.
> The same thing happens in this issue.
> Watcher processing for znode creation under /hbase/acl took most of the time and delayed the processing of the /hbase/region-in-transition/region znode creation.
> This is why the wait in the loop was so long.
> 4. Watcher processing for znode creation under /hbase/acl/ calls ZKPermissionWatcher#nodeChildrenChanged, which internally calls ZKUtil.getChildDataAndWatchForNewChildren,
> *which in this use case calls ZooKeeper's getData API 60k times (once per child znode); these calls take most of the time.*
> *Solutions:*
> Move the getChildDataAndWatchForNewChildren call into the async code block in ZKPermissionWatcher#nodeChildrenChanged, so that it no longer holds up ZooKeeper's event processing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
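The head-of-line blocking described in points 3 and 4, and the proposed solution, can be illustrated with a small standalone Java simulation. This is a sketch under stated assumptions, not the HBase patch. A single-threaded ExecutorService stands in for the ZooKeeper client's event thread. The hypothetical refreshAllAcls method stands in for ZKUtil.getChildDataAndWatchForNewChildren, simulating each getData round trip as a 4 ms sleep over 50 child znodes. The class and method names are illustrative only. When the refresh runs inline, the region-in-transition callback queued behind it waits at least 200 ms. When the refresh is handed to a second executor, the callback runs almost immediately.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Toy model of head-of-line blocking on a ZooKeeper-style event thread.
 * Not HBase code: every name here is hypothetical.
 */
public class AclWatcherDemo {

    // Hypothetical stand-in for getChildDataAndWatchForNewChildren: one
    // simulated getData round trip (a sleep) per child znode under /hbase/acl.
    static void refreshAllAcls(int childZnodes, long millisPerGetData) {
        for (int i = 0; i < childZnodes; i++) {
            try {
                Thread.sleep(millisPerGetData);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /**
     * Queues an /hbase/acl "children changed" event followed by a
     * region-in-transition callback on one event thread, and returns how many
     * milliseconds passed before the second callback ran.
     */
    static long measureRitDelay(boolean offloadAclRefresh) {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService aclExecutor = Executors.newSingleThreadExecutor();
        try {
            AtomicLong ritCallbackAt = new AtomicLong();
            long start = System.nanoTime();
            // Event 1: watcher notification for a new child under /hbase/acl.
            eventThread.submit(() -> {
                if (offloadAclRefresh) {
                    // Offloaded pattern: hand the slow refresh to another thread.
                    aclExecutor.submit(() -> refreshAllAcls(50, 4));
                } else {
                    // Inline pattern: refresh here, blocking the event thread.
                    refreshAllAcls(50, 4);
                }
            });
            // Event 2: region-in-transition callback, queued behind event 1.
            eventThread.submit(() -> ritCallbackAt.set(System.nanoTime())).get();
            return TimeUnit.NANOSECONDS.toMillis(ritCallbackAt.get() - start);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventThread.shutdown();
            // shutdown() lets an already-submitted refresh finish.
            aclExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("inline refresh:    RIT callback after "
            + measureRitDelay(false) + " ms");
        System.out.println("offloaded refresh: RIT callback after "
            + measureRitDelay(true) + " ms");
    }
}
```

A single-threaded executor is used for the offloaded work in this sketch so that successive ACL refreshes still run in submission order while the event thread stays free for other callbacks.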