chooseTargets method in FSNamesystem is very inefficient
--------------------------------------------------------

                 Key: HADOOP-725
                 URL: http://issues.apache.org/jira/browse/HADOOP-725
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.8.0
         Environment: All
            Reporter: Milind Bhandarkar
         Assigned To: Milind Bhandarkar
             Fix For: 0.9.0


Currently the chooseTargets method (which selects datanodes for block placement) 
takes in excess of 20% of the CPU on a namenode. According to the profiler, it is 
the most time-consuming namenode method. This inefficiency has already 
contributed to an earlier cascading crash in DFS. As datanodes went down, new 
locations had to be found for the blocks on the dead datanodes. Because this was 
done inside a synchronized method, it locked the whole namesystem for several 
minutes. No heartbeats could be processed during that interval, so the namenode 
marked still more datanodes dead, causing further failures. This is detailed in 
HADOOP-572.

The patch I am about to upload makes the time taken in the chooseTarget method 
proportional to nReplicas per block, instead of to (nDataNodes * nReplicas) as 
in the current implementation. Also, when a number of datanodes crash, their 
blocks will be put on the pendingReplications list one datanode at a time, each 
in its own synchronized section. (Currently, a single synchronized section 
processes ALL the dead datanodes, locking the namesystem for a considerable 
amount of time.) The patch also adds a unit test that checks replication.
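The patch itself is not reproduced here. The following is a minimal, hypothetical Java sketch of the two ideas described above, assuming that the O(nReplicas) behaviour is obtained by sampling random datanodes instead of scanning all of them (the issue does not say which technique the patch uses). The class ChooseTargetsSketch, the methods chooseTargets, removeDeadNodes and pendingCount, and the field namesystemLock are illustrative names, not the actual FSNamesystem API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch; not the real FSNamesystem code. */
public class ChooseTargetsSketch {

    /**
     * Returns up to nReplicas distinct nodes that are not in `excluded`.
     * Each attempt is O(1) and the attempt budget is 10 * nReplicas, so the
     * work is proportional to nReplicas rather than nodes.size() * nReplicas.
     * Trade-off: it may return fewer targets if most nodes are excluded.
     */
    static List<String> chooseTargets(List<String> nodes, Set<String> excluded,
                                      int nReplicas, Random rnd) {
        List<String> targets = new ArrayList<>();
        if (nodes.isEmpty()) {
            return targets;
        }
        Set<String> chosen = new HashSet<>();
        int maxAttempts = 10 * nReplicas; // bounded retries, never spins forever
        for (int attempt = 0; attempt < maxAttempts && targets.size() < nReplicas; attempt++) {
            String candidate = nodes.get(rnd.nextInt(nodes.size()));
            // Accept only nodes that are not excluded and not already chosen.
            if (!excluded.contains(candidate) && chosen.add(candidate)) {
                targets.add(candidate);
            }
        }
        return targets;
    }

    // Hypothetical stand-in for the namesystem-wide lock.
    private final Object namesystemLock = new Object();
    private final List<String> pendingReplications = new ArrayList<>();

    /**
     * Queues the blocks of each dead datanode, taking the lock once per node
     * instead of once around the whole batch, so other work (e.g. heartbeat
     * processing) can acquire the lock between iterations.
     */
    void removeDeadNodes(List<String> deadNodes, Map<String, List<String>> blocksByNode) {
        for (String node : deadNodes) {
            synchronized (namesystemLock) {
                pendingReplications.addAll(
                        blocksByNode.getOrDefault(node, Collections.emptyList()));
            }
        }
    }

    int pendingCount() {
        synchronized (namesystemLock) {
            return pendingReplications.size();
        }
    }
}
```

With 1000 datanodes and 3 replicas, chooseTargets typically needs only a few random draws, whereas a full scan touches every node for every replica. The per-node synchronized block bounds how long any single lock hold lasts, at the cost of the batch as a whole no longer being atomic.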

