[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343530#comment-14343530
 ] 

Todd Lipcon commented on HDFS-6658:
-----------------------------------

Hey Daryn... jumping in unsolicited with a random thought:

Currently our data structures are organized such that we can efficiently 
iterate over all blocks corresponding to a particular Storage (via the linked 
list encoded in triplets). The new design changes the layout of this structure, 
but still provides the same O(blocks in storage) iteration.

I'm wondering if we could relax this requirement somewhat, and if so, whether 
we could get some significant gains. For example, each block could just list its 
set of replicas as storageIDs (dictionary-coded, so probably 16 or 24 bits is 
fine), without the "back-references" from storages back to blocks. 
The downside of course is that it would be inefficient to iterate over all of 
the blocks in a storage - we'd have to iterate over the whole block map. But, 
I'm wondering if that could actually be beneficial in some ways:
- when processing block reports, we could actually process multiple block 
reports "in parallel". If multiple reports arrive within some short window 
(like at startup) we could share a single iteration to process all of them.
- processing dead datanodes is already an asynchronous process, so it's 
probably OK if it takes a bit longer.
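
To make the idea concrete, here is a rough Java sketch of what I mean. It is not taken from the patch or the design doc, and every name in it (ReplicaScanSketch, addBlock, blocksOn, missingFromReports) is made up for illustration; none of them are existing HDFS classes or methods. It assumes 16-bit dictionary-coded storage IDs and only shows the "find blocks the report did not mention" half of report processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical block map with no storage-to-block back-references. */
final class ReplicaScanSketch {
    // blockId -> dictionary-coded IDs (16 bits each) of the storages holding a replica.
    // A TreeMap is used only so that the scan order is deterministic in this sketch.
    private final Map<Long, short[]> blockMap = new TreeMap<>();

    void addBlock(long blockId, short... storageIds) {
        blockMap.put(blockId, storageIds.clone());
    }

    /** Listing one storage's blocks now costs a scan of the whole block map. */
    List<Long> blocksOn(short storageId) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, short[]> e : blockMap.entrySet()) {
            for (short sid : e.getValue()) {
                if (sid == storageId) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }

    /**
     * Processes several pending block reports in a single pass over the block map.
     *
     * @param reports storage ID -> set of block IDs that storage reported
     * @return storage ID -> blocks the map attributes to that storage
     *         but which its report did not mention
     */
    Map<Short, List<Long>> missingFromReports(Map<Short, Set<Long>> reports) {
        Map<Short, List<Long>> missing = new HashMap<>();
        for (Short storageId : reports.keySet()) {
            missing.put(storageId, new ArrayList<>());
        }
        for (Map.Entry<Long, short[]> e : blockMap.entrySet()) {
            for (short sid : e.getValue()) {
                Set<Long> reported = reports.get(sid); // null if no report is pending
                if (reported != null && !reported.contains(e.getKey())) {
                    missing.get(sid).add(e.getKey());
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        ReplicaScanSketch map = new ReplicaScanSketch();
        map.addBlock(1L, (short) 0, (short) 1);
        map.addBlock(2L, (short) 1);
        map.addBlock(3L, (short) 0, (short) 2);

        // Two reports arrive in the same window and share one iteration.
        Map<Short, Set<Long>> reports = new HashMap<>();
        reports.put((short) 0, new HashSet<>(Arrays.asList(1L)));
        reports.put((short) 1, new HashSet<>(Arrays.asList(1L, 2L)));
        System.out.println(map.missingFromReports(reports));
    }
}
```

The cost of one scan is O(total replicas) regardless of how many reports are batched into it, which is where the startup case could win; a single isolated report pays the same full scan, which is the downside mentioned above.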

Did you guys consider something like this?


> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Daryn Sharp
>         Attachments: BlockListOptimizationComparison.xlsx, BlocksMap 
> redesign.pdf, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas 
> list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a 
> linked list of block references for every DatanodeStorageInfo (called 
> "triplets"). 
> We propose to change the way we store the list in memory. 
> Using primitive integer indexes instead of object references will reduce the 
> memory needed for every block replica (when compressed oops is disabled) and 
> in our new design the list overhead will be per DatanodeStorageInfo and not 
> per block replica.
> See the attached design doc for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
