[ 
https://issues.apache.org/jira/browse/HBASE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594938#comment-13594938
 ] 

Jeffrey Zhong commented on HBASE-6772:
--------------------------------------

[~te...@apache.org] Thanks for commenting on this.

The sleepTime is only for retries: it is used only when 
listChildrenAndWatchForNewChildren hits an error or splitLogZNode doesn't exist. 

In the normal case, the following call sets a watch on splitLogZNode and 
returns. ZooKeeper will notify region servers to grab a task as soon as a new 
split task is saved into ZK.

{code}
        childrenPaths = ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
            this.watcher.splitLogZNode);
        if (childrenPaths != null) {
          return childrenPaths;
        }
{code} 
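
To make the role of sleepTime concrete, here is a minimal, self-contained sketch of the retry pattern described above. It is not the actual HBase code: ChildLister is a hypothetical stand-in for ZKUtil.listChildrenAndWatchForNewChildren, and the class name, maxAttempts, and the IOException are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.List;

public class SplitTaskPoller {
    /** Hypothetical stand-in for ZKUtil.listChildrenAndWatchForNewChildren. */
    public interface ChildLister {
        // Assumed contract: returns null when the znode does not exist,
        // throws when the ZooKeeper call fails.
        List<String> listChildrenAndWatch() throws IOException;
    }

    /**
     * Returns the children as soon as the listing succeeds. The sleep is
     * reached only after an error or a missing znode, never in the normal case.
     */
    public static List<String> getTaskList(ChildLister lister, long sleepTimeMs,
                                           int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                List<String> childrenPaths = lister.listChildrenAndWatch();
                if (childrenPaths != null) {
                    return childrenPaths; // normal case: watch is set, no sleep
                }
            } catch (IOException e) {
                // Listing failed; fall through to the sleep and retry.
            }
            try {
                Thread.sleep(sleepTimeMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return null;
            }
        }
        return null; // gave up after maxAttempts (an assumption of this sketch)
    }
}
```

With this shape, sleepTime adds no latency to task pickup when the listing succeeds on the first attempt.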
                
> Make the Distributed Split HDFS Location aware
> ----------------------------------------------
>
>                 Key: HBASE-6772
>                 URL: https://issues.apache.org/jira/browse/HBASE-6772
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: Jeffrey Zhong
>
> During a hlog split, each log file (a single hdfs block) is allocated to a 
> different region server. This region server reads the file and creates the 
> recovery edit files.
> The allocation to the region server is random. We could take into account the 
> locations of the log file to split:
> - the reads would be local, hence faster. This allows short circuit as well.
> - less network i/o used during a failure (and this is important)
> - we would be sure to read from a working datanode, hence we're sure we won't 
> have read errors. Read errors slow the split process a lot, as we often enter 
> the "timeouted world". 
> We need to limit the calls to the namenode however.
> Typical algo could be:
> - the master gets the locations of the hlog files
> - it writes them into ZK, if possible in one transaction (this way all the 
> tasks are visible together, allowing some arbitration by the region server).
> - when the regionserver receives the event, it checks for all logs and all 
> locations.
> - if there is a match, it takes it
> - if not, it waits something like 0.2s (to give other regionservers time 
> to take it if the location matches), and then takes any remaining task.
> Drawbacks are:
> - a 0.2s delay added if there is no regionserver available on one of the 
> locations. It's likely possible to remove it with some extra synchronization.
> - Small increase in complexity and dependency on HDFS
> Considering the advantages, it's worth it imho.
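
The region server side of the algorithm quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, tasks are modelled as a map from log name to the set of hosts holding the block, and the remainingTasks supplier stands in for re-reading the task list from ZK after the grace delay.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

public class LocalityAwareTaskPicker {
    /** Returns a task whose block locations include localHost, if any. */
    public static Optional<String> pickLocal(Map<String, Set<String>> taskLocations,
                                             String localHost) {
        for (Map.Entry<String, Set<String>> e : taskLocations.entrySet()) {
            if (e.getValue().contains(localHost)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    /**
     * Prefers a local task. Otherwise waits graceMs (about 200 ms in the
     * issue description) so that servers with a matching location can claim
     * their tasks first, then takes any task that is still unclaimed.
     */
    public static Optional<String> pick(Map<String, Set<String>> taskLocations,
                                        String localHost, long graceMs,
                                        Supplier<Map<String, Set<String>>> remainingTasks) {
        Optional<String> local = pickLocal(taskLocations, localHost);
        if (local.isPresent()) {
            return local;
        }
        try {
            Thread.sleep(graceMs);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return Optional.empty();
        }
        // Hypothetical refresh: in a real system this would re-list the ZK tasks.
        return remainingTasks.get().keySet().stream().findFirst();
    }
}
```

The sketch leaves out how a task is actually claimed in ZK; it only shows the ordering of "local match first, then anything left after the delay".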
