[ https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132364#comment-13132364 ]
jirapos...@reviews.apache.org commented on HDFS-2476: ----------------------------------------------------- ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2515/#review2738 ----------------------------------------------------------- Ship it! Looks very good. See the small nits below. Are there any non-trivial changes in this patch compared to what you're running in production? trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java <https://reviews.apache.org/r/2515/#comment6169> This javadoc describes a linked hashset, but the implementation is just a normal hashset. Copy-paste error? trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java <https://reviews.apache.org/r/2515/#comment6167> can hashCode and element be marked final? trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java <https://reviews.apache.org/r/2515/#comment6168> formatting is a little off - should go on same line trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java <https://reviews.apache.org/r/2515/#comment6170> I find this javadoc a little misleading - it implies that there's a linked list of elements separate from the hashtable chains. The removal order is in hash-bucket order rather than insertion order. "Maybe something like: removes N entries from the hashtable. The order in which entries are removed is unspecified, and may not correspond to the order in which they were inserted." - Todd On 2011-10-20 22:58:45, Tomasz Nykiel wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2515/ bq. ----------------------------------------------------------- bq. bq. (Updated 2011-10-20 22:58:45) bq. bq. bq. Review request for Hairong Kuang. bq. bq. bq. Summary bq. ------- bq. bq. This patch introduces two hash data structures for storing under-replicated, over-replicated and invalidated blocks. bq. bq. 1. LightWeightHashSet bq. 2. LightWeightLinkedSet bq. bq. Currently in all these cases we are using java.util.TreeSet which adds unnecessary overhead. bq. bq. The main bottlenecks addressed by this patch are: bq. -cluster instability times, when these queues (especially under-replicated) tend to grow quite drastically, bq. -initial cluster startup, when the queues are initialized, after leaving safemode, bq. -block reports, bq. -explicit acks for block addition and deletion bq. bq. 1. The introduced structures are CPU-optimized. bq. 2. They shrink and expand according to current capacity. bq. 3. Add/contains/delete ops are performed in O(1) time (unlike current log n for TreeSet). bq. 4. The sets are equipped with fast access methods for polling a number of elements (get+remove), which are used for handling the queues. bq. bq. bq. This addresses bug HDFS-2476. bq. https://issues.apache.org/jira/browse/HDFS-2476 bq. bq. bq. Diffs bq. ----- bq. bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/UnderReplicatedBlocks.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java PRE-CREATION bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightLinkedSet.java PRE-CREATION bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1187124 bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestLightWeightHashSet.java PRE-CREATION bq. trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestLightWeightLinkedSet.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/2515/diff bq. bq. bq. Testing bq. ------- bq. bq. Provided JUnit tests. bq. bq. bq. Thanks, bq. bq. Tomasz bq. bq. > More CPU efficient data structure for > under-replicated/over-replicated/invalidate blocks > ---------------------------------------------------------------------------------------- > > Key: HDFS-2476 > URL: https://issues.apache.org/jira/browse/HDFS-2476 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node > Reporter: Tomasz Nykiel > Assignee: Tomasz Nykiel > Attachments: hashStructures.patch, hashStructures.patch-2 > > > This patch introduces two hash data structures for storing under-replicated, > over-replicated and invalidated blocks. > 1. LightWeightHashSet > 2. LightWeightLinkedSet > Currently in all these cases we are using java.util.TreeSet which adds > unnecessary overhead. > The main bottlenecks addressed by this patch are: > -cluster instability times, when these queues (especially under-replicated) > tend to grow quite drastically, > -initial cluster startup, when the queues are initialized, after leaving > safemode, > -block reports, > -explicit acks for block addition and deletion > 1. The introduced structures are CPU-optimized. > 2. They shrink and expand according to current capacity. > 3. Add/contains/delete ops are performed in O(1) time (unlike current log n > for TreeSet). > 4. The sets are equipped with fast access methods for polling a number of > elements (get+remove), which are used for handling the queues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira