[jira] [Commented] (HBASE-7670) Synchronized operation in CatalogTracker would block handling ZK Event for long time

Hadoop QA (JIRA) Fri, 25 Jan 2013 20:47:16 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563353#comment-13563353
 ]


Hadoop QA commented on HBASE-7670:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566484/HBASE-7670.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

    {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.TestZooKeeper
                       org.apache.hadoop.hbase.regionserver.TestPriorityRpc

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4194//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4194//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4194//console

This message is automatically generated.
                
> Synchronized operation in CatalogTracker would block handling ZK Event for 
> long time
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-7670
>                 URL: https://issues.apache.org/jira/browse/HBASE-7670
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.4
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: HBASE-7670.patch
>
>
> During testing, we found that the master did not handle ZK events for a long 
> time.
> It seems one ZK event-handling thread was blocking the others.
> Some logs from the master are attached:
> {code}
> 2013-01-16 22:18:55,667 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, 
> 2013-01-16 22:18:56,270 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, 
> ...
> 2013-01-16 23:55:33,259 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Retrying
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=100, exceptions:
>         at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:183)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:676)
>         at org.apache.hadoop.hbase.catalog.MetaReader.get(MetaReader.java:247)
>         at org.apache.hadoop.hbase.catalog.MetaReader.getRegion(MetaReader.java:349)
>         at org.apache.hadoop.hbase.catalog.MetaReader.readRegionLocation(MetaReader.java:289)
>         at org.apache.hadoop.hbase.catalog.MetaReader.getMetaRegionLocation(MetaReader.java:276)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:424)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:489)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:451)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:289)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-01-16 23:55:33,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Attempted to handle region transition for server but server is not online
> {code}
> Between 2013-01-16 22:18:56 and 2013-01-16 23:55:33, there are no log 
> entries about handling ZK events.
> {code}
> this.metaNodeTracker = new MetaNodeTracker(zookeeper, throwableAborter) {
>   public void nodeDeleted(String path) {
>     if (!path.equals(node)) return;
>     ct.resetMetaLocation();
>   }
> };
>
> public void resetMetaLocation() {
>   LOG.debug("Current cached META location, " + metaLocation +
>     ", is not valid, resetting");
>   synchronized (this.metaAvailable) {
>     this.metaAvailable.set(false);
>     this.metaAvailable.notifyAll();
>   }
> }
>
> private AdminProtocol getMetaServerConnection() {
>   synchronized (metaAvailable) {
>     ...
>     ServerName newLocation = MetaReader.getMetaRegionLocation(this);
>     ...
>   }
> }
> {code}
> From the code above, we can see that nodeDeleted() has to wait for the 
> synchronized (metaAvailable) monitor until 
> MetaReader.getMetaRegionLocation(this) returns.
> However, getMetaRegionLocation() could be retrying for a long time, so the 
> thread handling the ZK event stays blocked for that whole period.
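
The blocking pattern described above can be reproduced in isolation. The following is a minimal sketch, not the attached patch and not HBase code: NarrowLockSketch, slowLookup(), lookupHoldingLock(), lookupOutsideLock() and resetDelayMillis() are hypothetical names, and Thread.sleep() stands in for the retrying MetaReader.getMetaRegionLocation() call. It measures how long resetMetaLocation() waits when the slow lookup runs inside the monitor, compared with running it outside and taking the monitor only to publish the result.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for CatalogTracker; none of these names are HBase APIs.
final class NarrowLockSketch {
    private final AtomicBoolean metaAvailable = new AtomicBoolean(true);
    private final CountDownLatch lookupStarted = new CountDownLatch(1);
    private final long lookupMillis;
    private String cachedLocation;

    NarrowLockSketch(long lookupMillis) {
        this.lookupMillis = lookupMillis;
    }

    // Assumption: simulates a remote lookup that keeps retrying for a while.
    private String slowLookup() {
        lookupStarted.countDown();
        try {
            Thread.sleep(lookupMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "example-server";
    }

    // Pattern from the issue: the slow call runs while the monitor is held.
    String lookupHoldingLock() {
        synchronized (metaAvailable) {
            cachedLocation = slowLookup();
            return cachedLocation;
        }
    }

    // Alternative: slow call first, monitor held only to publish the result.
    // A real fix would also have to handle a reset that arrives during the
    // lookup; this sketch does not.
    String lookupOutsideLock() {
        String location = slowLookup();
        synchronized (metaAvailable) {
            cachedLocation = location;
        }
        return location;
    }

    // Same shape as resetMetaLocation() in the issue description.
    void resetMetaLocation() {
        synchronized (metaAvailable) {
            metaAvailable.set(false);
            metaAvailable.notifyAll();
        }
    }

    // Milliseconds resetMetaLocation() takes while a lookup is in flight.
    static long resetDelayMillis(boolean holdLock, long lookupMillis) {
        NarrowLockSketch s = new NarrowLockSketch(lookupMillis);
        Thread worker = new Thread(() -> {
            if (holdLock) {
                s.lookupHoldingLock();
            } else {
                s.lookupOutsideLock();
            }
        });
        worker.start();
        try {
            s.lookupStarted.await(); // the lookup is now running
            long start = System.nanoTime();
            s.resetMetaLocation();   // plays the role of the ZK event thread
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            worker.join();
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lookup inside lock, reset waited ms:  "
            + resetDelayMillis(true, 600));
        System.out.println("lookup outside lock, reset waited ms: "
            + resetDelayMillis(false, 600));
    }
}
```

With the lookup inside the monitor, the reset should wait for roughly the full lookup time; with the lookup outside, it should return almost immediately. Exact timings depend on the machine and scheduler.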

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
