[ https://issues.apache.org/jira/browse/HBASE-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563353#comment-13563353 ]
Hadoop QA commented on HBASE-7670:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12566484/HBASE-7670.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

    {color:red}-1 core tests{color}.
The patch failed these unit tests:
    org.apache.hadoop.hbase.TestZooKeeper
    org.apache.hadoop.hbase.regionserver.TestPriorityRpc

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4194//console

This message is automatically generated.

> Synchronized operation in CatalogTracker would block handling ZK Event for long time
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-7670
>                 URL: https://issues.apache.org/jira/browse/HBASE-7670
>             Project: HBase
>          Issue Type: Bug
>   Affects Versions: 0.94.4
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: HBASE-7670.patch
>
>
> In our testing, we found that the master did not handle ZK events for a long time.
> It seems that one ZK event-handling thread was blocking them.
> Some logs from the master are attached below:
> {code}
> 2013-01-16 22:18:55,667 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,
> 2013-01-16 22:18:56,270 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,
> ...
> 2013-01-16 23:55:33,259 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Retrying
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=100, exceptions:
>     at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:183)
>     at org.apache.hadoop.hbase.client.HTable.get(HTable.java:676)
>     at org.apache.hadoop.hbase.catalog.MetaReader.get(MetaReader.java:247)
>     at org.apache.hadoop.hbase.catalog.MetaReader.getRegion(MetaReader.java:349)
>     at org.apache.hadoop.hbase.catalog.MetaReader.readRegionLocation(MetaReader.java:289)
>     at org.apache.hadoop.hbase.catalog.MetaReader.getMetaRegionLocation(MetaReader.java:276)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:424)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:489)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:451)
>     at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:289)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> 2013-01-16 23:55:33,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Attempted to handle region transition for server but server is not online
> {code}
> Between 2013-01-16 22:18:56 and 2013-01-16 23:55:33, there are no log entries about handling ZK events.
> {code}
> this.metaNodeTracker = new MetaNodeTracker(zookeeper, throwableAborter) {
>   public void nodeDeleted(String path) {
>     if (!path.equals(node)) return;
>     ct.resetMetaLocation();
>   }
> };
>
> public void resetMetaLocation() {
>   LOG.debug("Current cached META location, " + metaLocation +
>     ", is not valid, resetting");
>   synchronized(this.metaAvailable) {
>     this.metaAvailable.set(false);
>     this.metaAvailable.notifyAll();
>   }
> }
>
> private AdminProtocol getMetaServerConnection(){
>   synchronized (metaAvailable){
>     ...
>     // Called while holding the metaAvailable lock
>     ServerName newLocation = MetaReader.getMetaRegionLocation(this);
>     ...
>   }
> }
> {code}
> From the code above, we can see that nodeDeleted() has to wait for the
> synchronized (metaAvailable) lock until MetaReader.getMetaRegionLocation(this)
> finishes. However, getMetaRegionLocation() may keep retrying for a long time,
> so the ZK event thread that calls nodeDeleted() stays blocked for that whole period.
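
One way to avoid the blocking described in the quoted issue is to make the slow lookup outside the monitor and take the lock only to read or publish the cached value. The sketch below illustrates that idea in plain Java. It is not HBase code and not necessarily what the attached HBASE-7670.patch does. The class `MetaLocationCache`, the `slowLookup` parameter (a stand-in for `MetaReader.getMetaRegionLocation()`), the `resets` counter, and the location strings are all hypothetical names introduced for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical stand-in for the CatalogTracker locking pattern; not HBase code. */
class MetaLocationCache {
  private final AtomicBoolean metaAvailable = new AtomicBoolean(false);
  private String metaLocation; // guarded by metaAvailable
  private long resets;         // guarded by metaAvailable; counts invalidations

  /** Mirrors resetMetaLocation(): a short critical section only. */
  void resetMetaLocation() {
    synchronized (metaAvailable) {
      resets++;
      metaLocation = null;
      metaAvailable.set(false);
      metaAvailable.notifyAll();
    }
  }

  /**
   * Returns the cached location, or runs slowLookup WITHOUT holding the lock.
   * Returns null if a reset happened during the lookup (caller should retry).
   */
  String getMetaLocation(Supplier<String> slowLookup) {
    long seenResets;
    synchronized (metaAvailable) {
      if (metaAvailable.get()) return metaLocation;
      seenResets = resets;
    }
    // The potentially long-running call happens outside the monitor,
    // so resetMetaLocation() is never stuck behind it.
    String newLocation = slowLookup.get();
    synchronized (metaAvailable) {
      if (resets != seenResets) return null; // result is stale, discard it
      metaLocation = newLocation;
      metaAvailable.set(newLocation != null);
      metaAvailable.notifyAll();
      return newLocation;
    }
  }

  /** Demo: a reset issued while a lookup is still in progress should not block. */
  static boolean resetCompletesDuringSlowLookup() {
    MetaLocationCache cache = new MetaLocationCache();
    CountDownLatch lookupStarted = new CountDownLatch(1);
    CountDownLatch releaseLookup = new CountDownLatch(1);
    CountDownLatch resetDone = new CountDownLatch(1);

    Thread reader = new Thread(() -> cache.getMetaLocation(() -> {
      lookupStarted.countDown();
      try {
        releaseLookup.await(); // simulates a lookup that keeps retrying
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "hypothetical-server-1";
    }));
    Thread zkEventThread = new Thread(() -> {
      cache.resetMetaLocation();
      resetDone.countDown();
    });

    try {
      reader.start();
      lookupStarted.await();
      zkEventThread.start();
      // True if the reset finished while the lookup was still pending.
      boolean notBlocked = resetDone.await(2, TimeUnit.SECONDS);
      releaseLookup.countDown();
      reader.join();
      zkEventThread.join();
      return notBlocked;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("reset completed during slow lookup: "
        + resetCompletesDuringSlowLookup());
  }
}
```

The trade-off in this sketch is that two callers may run the lookup concurrently, and a lookup that overlaps a reset is thrown away. Whether that is acceptable for CatalogTracker depends on how its callers use the connection, which this example does not model.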