[jira] [Updated] (HBASE-7670) Synchronized operation in CatalogTracker would block handling ZK Event for long time

Ted Yu (JIRA) Fri, 25 Jan 2013 19:45:37 -0800

     [ 
https://issues.apache.org/jira/browse/HBASE-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ted Yu updated HBASE-7670:
--------------------------

    Status: Patch Available  (was: Open)
    
> Synchronized operation in CatalogTracker would block handling ZK Event for 
> long time
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-7670
>                 URL: https://issues.apache.org/jira/browse/HBASE-7670
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.4
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: HBASE-7670.patch
>
>
> We found ZK event not be watched by master for a  long time in our testing.
> It seems one ZK-Event-Handle thread block it.
> Attaching some logs on master
> {code}
> 2013-01-16 22:18:55,667 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, 
> 2013-01-16 22:18:56,270 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, 
> ...
> 2013-01-16 23:55:33,259 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: 
> Retrying
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=100, exceptions:
>         at 
> org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:183)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:676)
>         at org.apache.hadoop.hbase.catalog.MetaReader.get(MetaReader.java:247)
>         at 
> org.apache.hadoop.hbase.catalog.MetaReader.getRegion(MetaReader.java:349)
>         at 
> org.apache.hadoop.hbase.catalog.MetaReader.readRegionLocation(MetaReader.java:289)
>         at 
> org.apache.hadoop.hbase.catalog.MetaReader.getMetaRegionLocation(MetaReader.java:276)
>         at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:424)
>         at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:489)
>         at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:451)
>         at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:289)
>         at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-01-16 23:55:33,261 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Attempted to handle region 
> transition for server but server is not online
> {code}
> Between 2013-01-16 22:18:56 and 2013-01-16 23:55:33, there is no any logs 
> about handling ZK Event.
> {code}
> this.metaNodeTracker = new MetaNodeTracker(zookeeper, throwableAborter) {
>       public void nodeDeleted(String path) {
>         if (!path.equals(node)) return;
>         ct.resetMetaLocation();
>       }
>     }
> public void resetMetaLocation() {
>     LOG.debug("Current cached META location, " + metaLocation +
>       ", is not valid, resetting");
>     synchronized(this.metaAvailable) {
>       this.metaAvailable.set(false);
>       this.metaAvailable.notifyAll();
>     }
>   }
> private AdminProtocol getMetaServerConnection(){
> synchronized (metaAvailable){
> ...
> ServerName newLocation = MetaReader.getMetaRegionLocation(this);
> ...
> }
> }
> {code}
> From the above code, we would found that nodeDeleted() would wait 
> synchronized (metaAvailable) until MetaReader.getMetaRegionLocation(this) 
> done,
> however, getMetaRegionLocation() could be retrying for a long time

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-7670) Synchronized operation in CatalogTracker would block handling ZK Event for long time

Reply via email to