[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343530#comment-14343530 ]
Todd Lipcon commented on HDFS-6658:
-----------------------------------

Hey Daryn... jumping in unsolicited with a random thought:

Currently our data structures are organized such that we can efficiently iterate over all blocks corresponding to a particular Storage (via the linked list encoded in triplets). The new design changes the layout of this structure, but still provides the same O(blocks in storage) iteration.

I'm wondering if we could relax this requirement somewhat, and if so, whether we could get some significant gains. For example, if each block just listed its set of replicas as storageIDs (dictionary-coded, so probably 16 or 24 bits is fine), but we didn't have the "back-references" from storages back to blocks.

The downside of course is that it would be inefficient to iterate over all of the blocks in a storage - we'd have to iterate over the whole block map. But, I'm wondering if that could actually be beneficial in some ways:
- when processing block reports, we could actually process multiple block reports "in parallel". If multiple reports arrive within some short window (like at startup) we could share a single iteration to process both.
- processing dead datanodes is already an asynchronous process, so it's probably OK if it takes a bit longer

Did you guys consider something like this?

> Namenode memory optimization - Block replicas list
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Daryn Sharp
>         Attachments: BlockListOptimizationComparison.xlsx, BlocksMap
> redesign.pdf, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas
> list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a
> linked list of block references for every DatanodeStorageInfo (called
> "triplets").
> We propose to change the way we store the list in memory.
> Using primitive integer indexes instead of object references will reduce the
> memory needed for every block replica (when compressed oops is disabled) and
> in our new design the list overhead will be per DatanodeStorageInfo and not
> per block replica.
> see attached design doc. for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
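
For illustration only, the layout floated in the comment (each block lists its replicas as dictionary-coded storage IDs, with no back-references from storages to blocks) might look roughly like the sketch below. This is not HDFS code: the class ReplicaMapSketch, its methods, and the use of HashMap and 16-bit codes are all assumptions made up for this example. It shows one scan of the whole block map serving several block reports at once, which is the trade-off the comment describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from HDFS. */
class ReplicaMapSketch {
    // Dictionary: storage ID -> small integer code (16 bits in this sketch).
    private final Map<String, Short> storageCodes = new HashMap<>();
    // Block ID -> codes of the storages holding a replica.
    // There is deliberately no storage -> blocks back-reference.
    private final Map<Long, short[]> replicas = new HashMap<>();

    private short codeFor(String storageId) {
        Short code = storageCodes.get(storageId);
        if (code == null) {
            if (storageCodes.size() > Short.MAX_VALUE) {
                throw new IllegalStateException("storage dictionary is full");
            }
            code = (short) storageCodes.size();
            storageCodes.put(storageId, code);
        }
        return code;
    }

    /** Records that the given storage holds a replica of the given block. */
    public void addReplica(long blockId, String storageId) {
        short code = codeFor(storageId);
        short[] old = replicas.get(blockId);
        if (old == null) {
            replicas.put(blockId, new short[] {code});
            return;
        }
        for (short c : old) {
            if (c == code) {
                return; // replica already recorded
            }
        }
        short[] grown = Arrays.copyOf(old, old.length + 1);
        grown[old.length] = code;
        replicas.put(blockId, grown);
    }

    /**
     * One pass over the whole block map that serves several block reports.
     * Input: storage ID -> block IDs that storage reported.
     * Output: storage ID -> blocks the map attributes to that storage
     * but that are absent from its report.
     */
    public Map<String, List<Long>> missingFromReports(Map<String, Set<Long>> reports) {
        Map<Short, String> wanted = new HashMap<>();
        Map<String, List<Long>> missing = new HashMap<>();
        for (String storageId : reports.keySet()) {
            Short code = storageCodes.get(storageId);
            if (code != null) {
                wanted.put(code, storageId);
            }
            missing.put(storageId, new ArrayList<>());
        }
        // The cost is O(all replicas) regardless of how many reports share the scan.
        for (Map.Entry<Long, short[]> e : replicas.entrySet()) {
            for (short code : e.getValue()) {
                String storageId = wanted.get(code);
                if (storageId != null && !reports.get(storageId).contains(e.getKey())) {
                    missing.get(storageId).add(e.getKey());
                }
            }
        }
        return missing;
    }
}
```

In this sketch, finding the blocks of a single storage costs a full scan of the map, but reports that arrive within the same window can be batched into one call so they share that scan.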