[ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132364#comment-13132364
 ] 

jirapos...@reviews.apache.org commented on HDFS-2476:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2515/#review2738
-----------------------------------------------------------

Ship it!


Looks very good. See the small nits below. Are there any non-trivial changes in 
this patch compared to what you're running in production?


trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java
<https://reviews.apache.org/r/2515/#comment6169>

    This javadoc describes a linked hashset, but the implementation is just a 
normal hashset. Copy-paste error?



trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java
<https://reviews.apache.org/r/2515/#comment6167>

    can hashCode and element be marked final?



trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java
<https://reviews.apache.org/r/2515/#comment6168>

    formatting is a little off - should go on same line



trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java
<https://reviews.apache.org/r/2515/#comment6170>

    I find this javadoc a little misleading - it implies that there's a linked 
list of elements separate from the hashtable chains. The removal order is in 
hash-bucket order rather than insertion order. "Maybe something like: removes N 
entries from the hashtable. The order in which entries are removed is 
unspecified, and may not correspond to the order in which they were inserted."


- Todd


On 2011-10-20 22:58:45, Tomasz Nykiel wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2515/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-20 22:58:45)
bq.  
bq.  
bq.  Review request for Hairong Kuang.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This patch introduces two hash data structures for storing 
under-replicated, over-replicated and invalidated blocks.
bq.  
bq.  1. LightWeightHashSet
bq.  2. LightWeightLinkedSet
bq.  
bq.  Currently in all these cases we are using java.util.TreeSet which adds 
unnecessary overhead.
bq.  
bq.  The main bottlenecks addressed by this patch are:
bq.  -cluster instability times, when these queues (especially 
under-replicated) tend to grow quite drastically,
bq.  -initial cluster startup, when the queues are initialized, after leaving 
safemode,
bq.  -block reports,
bq.  -explicit acks for block addition and deletion
bq.  
bq.  1. The introduced structures are CPU-optimized.
bq.  2. They shrink and expand according to current capacity.
bq.  3. Add/contains/delete ops are performed in O(1) time (unlike current log 
n for TreeSet).
bq.  4. The sets are equipped with fast access methods for polling a number of 
elements (get+remove), which are used for handling the queues.
bq.  
bq.  
bq.  This addresses bug HDFS-2476.
bq.      https://issues.apache.org/jira/browse/HDFS-2476
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
 1187124 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
 1187124 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java
 1187124 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/UnderReplicatedBlocks.java
 1187124 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
 1187124 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
 1187124 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
 1187124 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java
 1187124 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java
 PRE-CREATION 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightLinkedSet.java
 PRE-CREATION 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1187124 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestLightWeightHashSet.java
 PRE-CREATION 
bq.    
trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestLightWeightLinkedSet.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2515/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Provided JUnit tests.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Tomasz
bq.  
bq.


                
> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-2476
>                 URL: https://issues.apache.org/jira/browse/HDFS-2476
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>            Reporter: Tomasz Nykiel
>            Assignee: Tomasz Nykiel
>         Attachments: hashStructures.patch, hashStructures.patch-2
>
>
> This patch introduces two hash data structures for storing under-replicated, 
> over-replicated and invalidated blocks. 
> 1. LightWeightHashSet
> 2. LightWeightLinkedSet
> Currently in all these cases we are using java.util.TreeSet which adds 
> unnecessary overhead.
> The main bottlenecks addressed by this patch are:
> -cluster instability times, when these queues (especially under-replicated) 
> tend to grow quite drastically,
> -initial cluster startup, when the queues are initialized, after leaving 
> safemode,
> -block reports,
> -explicit acks for block addition and deletion
> 1. The introduced structures are CPU-optimized.
> 2. They shrink and expand according to current capacity.
> 3. Add/contains/delete ops are performed in O(1) time (unlike current log n 
> for TreeSet).
> 4. The sets are equipped with fast access methods for polling a number of 
> elements (get+remove), which are used for handling the queues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to