[jira] [Commented] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.

jirapos...@reviews.apache.org (Commented) (JIRA) Thu, 20 Oct 2011 15:58:35 -0700

    [ https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132164#comment-13132164 ]


jirapos...@reviews.apache.org commented on HDFS-2477:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2516/
-----------------------------------------------------------

Review request for Hairong Kuang.


Summary
-------

When a block report is processed at the NN, BlockManager.reportDiff 
traverses all blocks contained in the report, and each block that is also 
present in the corresponding datanode descriptor is moved to the head of 
that datanode descriptor's block list.
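A minimal Java sketch of the traversal described above, under stated assumptions. ReportDiffSketch, its Node class and the HashMap standing in for "blocks the NN already knows on this datanode" are hypothetical and are not the real BlockManager or DatanodeDescriptor code. The delimiter node is also an assumption about why blocks are moved to the head: once every reported block has been moved in front of it, whatever is left behind it was not reported.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of one datanode's block list (not the HDFS classes).
class ReportDiffSketch {
    static final class Node {
        final long blockId;
        Node prev, next;
        Node(long blockId) { this.blockId = blockId; }
    }

    Node head;                                      // head of this datanode's block list
    final Map<Long, Node> stored = new HashMap<>(); // stand-in for the NN's per-node block lookup

    void add(long blockId) {
        Node n = new Node(blockId);
        stored.put(blockId, n);
        insertAtHead(n);
    }

    private void insertAtHead(Node n) {
        n.prev = null;
        n.next = head;
        if (head != null) head.prev = n;
        head = n;
    }

    private void unlink(Node n) {
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev;
        n.prev = null;
        n.next = null;
    }

    private void moveToHead(Node n) {
        if (n == head) return;
        unlink(n);
        insertAtHead(n);
    }

    /**
     * Moves every reported block that is already in the list to the head.
     * An (assumed) delimiter separates reported blocks from the rest, so
     * whatever remains after it was not reported.
     */
    List<Long> reportDiff(long[] report) {
        Node delimiter = new Node(-1);
        insertAtHead(delimiter);
        for (long id : report) {
            Node n = stored.get(id);
            if (n != null) moveToHead(n);   // blocks unknown to this node are skipped here
        }
        List<Long> notReported = new ArrayList<>();
        for (Node n = delimiter.next; n != null; n = n.next) notReported.add(n.blockId);
        unlink(delimiter);
        return notReported;
    }
}
```

For a node holding blocks 1 to 4 and a report of {1, 3, 99}, the list ends up as 3, 1, 4, 2, and blocks 4 and 2 are returned as not reported. Note that in this model every hit costs one unlink plus one insert, which is why the per-move cost matters when almost every reported block is a hit.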

With HDFS-395, the vast majority of the blocks in a report are also present 
in the datanode descriptor, which means that almost every block in the report 
has to be moved to the head of the list.

Currently this operation is performed by DatanodeDescriptor.moveBlockToHead, 
which removes a block from the list and then re-inserts it at the head. In this 
process, findDatanode is called several times (as far as I recall, 6 times per 
moveBlockToHead call). findDatanode is relatively expensive, since it scans the 
triplets linearly to locate the given datanode.
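To illustrate why the repeated findDatanode calls add up, here is a simplified, hypothetical Java model of the triplets layout. TripletBlock, listRemove, listInsert and the findCalls counter are illustrative names, not the real BlockInfo API. For each replica the array stores the datanode and the previous and next block in that datanode's list, so every neighbour update first has to scan for the datanode's slot.

```java
// Hypothetical, simplified model of a block's "triplets": for the i-th replica,
// triplets[3*i] is the datanode, triplets[3*i+1] the previous block and
// triplets[3*i+2] the next block in that datanode's block list.
final class TripletBlock {
    static int findCalls = 0;   // counts findDatanode invocations
    final Object[] triplets;

    TripletBlock(int replication) { triplets = new Object[3 * replication]; }

    /** Linear scan over the replicas to find dn's slot, or -1 if absent. */
    int findDatanode(Object dn) {
        findCalls++;
        for (int i = 0; i < triplets.length / 3; i++) {
            if (triplets[3 * i] == dn) return i;
        }
        return -1;
    }

    TripletBlock prev(int i) { return (TripletBlock) triplets[3 * i + 1]; }
    TripletBlock next(int i) { return (TripletBlock) triplets[3 * i + 2]; }
    void setPrev(int i, TripletBlock b) { triplets[3 * i + 1] = b; }
    void setNext(int i, TripletBlock b) { triplets[3 * i + 2] = b; }

    /** Unlinks this block from dn's list; returns the (possibly new) head. */
    TripletBlock listRemove(TripletBlock head, Object dn) {
        int i = findDatanode(dn);
        if (i < 0) return head;                           // block is not on this datanode
        TripletBlock p = prev(i), n = next(i);
        setPrev(i, null);
        setNext(i, null);
        if (p != null) p.setNext(p.findDatanode(dn), n);  // re-locate dn in each neighbour
        if (n != null) n.setPrev(n.findDatanode(dn), p);
        return this == head ? n : head;
    }

    /** Links this block in at the head of dn's list (dn assumed present). */
    TripletBlock listInsert(TripletBlock head, Object dn) {
        int i = findDatanode(dn);
        setPrev(i, null);
        setNext(i, head);
        if (head != null) head.setPrev(head.findDatanode(dn), this);
        return this;
    }

    /** Naive move-to-head: remove, then insert, locating dn from scratch each time. */
    TripletBlock moveBlockToHead(TripletBlock head, Object dn) {
        return listInsert(listRemove(head, dn), dn);
    }
}
```

In this simplified version, moving a block out of the middle of the list triggers five findDatanode scans (three during removal, two during insertion), each of which is linear in the replication factor. The exact count in the real DatanodeDescriptor.moveBlockToHead may differ.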

With this patch, we memoize the results of findDatanode, which eliminates 2 of 
the findDatanode calls. Our experiments show that this improves the performance 
of reportDiff (which is executed under the write lock) by around 15%. Currently, 
with HDFS-395, reportDiff accounts for almost 100% of the block report 
processing time.
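One way such memoization could look, as a hedged Java sketch rather than the actual patch: the caller passes in the slot of the block being moved (curIndex) and the slot of the current head (headIndex), and after each move the moved block's slot is reused as the next headIndex. MemoBlock, insertAtHead, moveAllToHead, curIndex and headIndex are all hypothetical names. Inside the move, only the two neighbours still need a findDatanode scan.

```java
// Hypothetical sketch: triplets layout as in BlockInfo, but move-to-head takes memoized slot indexes.
final class MemoBlock {
    static int findCalls = 0;   // counts findDatanode invocations
    final Object[] triplets;    // [3*i] datanode, [3*i+1] previous block, [3*i+2] next block

    MemoBlock(int replication) { triplets = new Object[3 * replication]; }

    int findDatanode(Object dn) {
        findCalls++;
        for (int i = 0; i < triplets.length / 3; i++) {
            if (triplets[3 * i] == dn) return i;
        }
        return -1;
    }

    MemoBlock prev(int i) { return (MemoBlock) triplets[3 * i + 1]; }
    MemoBlock next(int i) { return (MemoBlock) triplets[3 * i + 2]; }
    void setPrev(int i, MemoBlock b) { triplets[3 * i + 1] = b; }
    void setNext(int i, MemoBlock b) { triplets[3 * i + 2] = b; }

    /** Plain insertion at the head, used only to build the list. */
    MemoBlock insertAtHead(MemoBlock head, Object dn) {
        int i = findDatanode(dn);
        setPrev(i, null);
        setNext(i, head);
        if (head != null) head.setPrev(head.findDatanode(dn), this);
        return this;
    }

    /**
     * Moves this block to the head of dn's list. curIndex is this block's slot
     * for dn and headIndex is the head's slot, both supplied by the caller, so
     * only the two neighbours are located with findDatanode.
     */
    MemoBlock moveBlockToHead(MemoBlock head, Object dn, int curIndex, int headIndex) {
        if (head == this) return this;
        MemoBlock p = prev(curIndex);   // non-null, because this block is not the head
        MemoBlock n = next(curIndex);
        p.setNext(p.findDatanode(dn), n);
        if (n != null) n.setPrev(n.findDatanode(dn), p);
        setPrev(curIndex, null);
        setNext(curIndex, head);
        head.setPrev(headIndex, this);
        return this;
    }

    /** Caller side: the moved block's slot becomes the headIndex for the next move. */
    static MemoBlock moveAllToHead(MemoBlock head, Object dn, MemoBlock[] reported) {
        int headIndex = head.findDatanode(dn);
        for (MemoBlock b : reported) {
            int curIndex = b.findDatanode(dn);   // looked up once per block
            head = b.moveBlockToHead(head, dn, curIndex, headIndex);
            headIndex = curIndex;                // b is now the head
        }
        return head;
    }
}
```

The design choice is simply to stop re-locating the moved block and the old head: the caller already has both slot indexes at hand, so passing them in removes those scans without changing the list structure.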


This addresses bug HDFS-2477.
    https://issues.apache.org/jira/browse/HDFS-2477


Diffs
-----

  trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java 1187125 
  trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java 1187125 
  trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java 1187125 
  trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2516/diff


Testing
-------

Additional JUnit tests.


Thanks,

Tomasz


                
> Optimize computing the diff between a block report and the namenode state.
> --------------------------------------------------------------------------
>
>                 Key: HDFS-2477
>                 URL: https://issues.apache.org/jira/browse/HDFS-2477
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>            Reporter: Tomasz Nykiel
>            Assignee: Tomasz Nykiel
>         Attachments: hashStructures.patch-2, reportDiff.patch
>

--
This message is automatically generated by JIRA.