[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345360#comment-14345360 ]
Daryn Sharp commented on HDFS-6658:
-----------------------------------

Great questions, Todd. Both are initial goals that proved too complex and inefficient for a low-risk, first-round implementation.

Regarding not including datanodeId in replicaId: adding a block requires checking whether the block is already in a different storage on the given datanode. Having the datanode as part of the replicaId allows a simple comparison of datanodeIds, with no lookups. Plus there is the difficulty of ensuring that no storage uuid collisions, however unlikely, ever occur across all nodes.

I really wanted storages to not maintain block refs, or even blockIds. However, these actions scan a storage's blocks and would suffer immense performance penalties if that list were removed:
# the balancer
# datanode manager start/stop of decommission
# decommission manager scans
# removal of failed storages
# removal of a dead node (i.e. all of its storages)

Balancing and decommissioning already have abysmal effects on performance. We cannot afford for either to be any worse.

> Namenode memory optimization - Block replicas list
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Daryn Sharp
>         Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called "triplets").
> We propose to change the way we store the list in memory.
> Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled), and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica.
> See the attached design doc. for details and evaluation results.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
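For readers who want to see the replicaId comparison from the comment above in concrete form, here is a minimal, purely illustrative Java sketch. It is not the HDFS-6658 patch: the class name, the 24/8 bit split, and the packing scheme are assumptions made only for this example. It shows why, once the datanode index is part of the packed id, the "is this block already on the target datanode in another storage?" check is a shift-and-compare on primitives, with no lookups into storage or datanode objects.

{code:java}
// Illustrative sketch only -- not the actual HDFS-6658 patch.
// Assumes a hypothetical encoding that packs a 24-bit datanode index and an
// 8-bit per-node storage index into a single primitive int replicaId.
public final class ReplicaIdSketch {

    private static final int STORAGE_BITS = 8;
    private static final int STORAGE_MASK = (1 << STORAGE_BITS) - 1;

    /** Pack a datanode index and a storage index into one primitive id. */
    static int pack(int datanodeIndex, int storageIndex) {
        return (datanodeIndex << STORAGE_BITS) | (storageIndex & STORAGE_MASK);
    }

    /** Datanode portion of a packed replicaId. */
    static int datanodeOf(int replicaId) {
        return replicaId >>> STORAGE_BITS;
    }

    /**
     * True if the block, whose current replicas are given as packed ids,
     * already has a replica on the given datanode (possibly in a different
     * storage) -- one integer comparison per replica, no object lookups.
     */
    static boolean onDatanode(int[] replicaIds, int datanodeIndex) {
        for (int id : replicaIds) {
            if (datanodeOf(id) == datanodeIndex) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] replicas = { pack(17, 0), pack(42, 3) };
        System.out.println(onDatanode(replicas, 42)); // true: already on node 42
        System.out.println(onDatanode(replicas, 7));  // false: safe to add
    }
}
{code}

A real encoding would have to choose field widths that match the expected cluster size and per-node storage count; the sketch only demonstrates that the "already on this datanode" check needs no storage uuid lookups when the datanode is part of the replicaId.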