[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350936#comment-14350936
 ] 

Colin Patrick McCabe commented on HDFS-6658:
--------------------------------------------

[~clamb] and I have been discussing how to do block reports without 
backreferences.  If you keep a 64-bit epoch number per datanode, you can bump 
it on each full block report (FBR).  Then, when you access a block entry, you 
can simply ignore it if its epoch is older than the datanode's current one.  
In that case, you don't need to remove all stale blocks during an FBR.
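
To make the epoch idea concrete, here is a minimal Java sketch.  It is an 
illustration only, not code from the patch: the class and method names are 
hypothetical, a plain HashMap stands in for the real block map, and it assumes 
an FBR is applied atomically with respect to readers (for example, under the 
namesystem lock).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-datanode replica table. Names are illustrative and are
 * not taken from the HDFS code base.
 */
class EpochReplicaTable {
    /** Block ID -> epoch of the last report that mentioned the replica. */
    private final Map<Long, Long> replicaEpochByBlockId = new HashMap<>();

    /** 64-bit epoch for this datanode, bumped once per full block report. */
    private long currentEpoch = 0L;

    /** Starting an FBR bumps the epoch; no existing entries are touched. */
    void beginFullBlockReport() {
        currentEpoch++;
    }

    /** Called for each block listed in an FBR or added by an IBR. */
    void reportBlock(long blockId) {
        replicaEpochByBlockId.put(blockId, currentEpoch);
    }

    /** Called for an incremental removal: the entry is deleted right away. */
    void removeBlock(long blockId) {
        replicaEpochByBlockId.remove(blockId);
    }

    /** Entries stamped with an older epoch are ignored on access. */
    boolean hasReplica(long blockId) {
        Long epoch = replicaEpochByBlockId.get(blockId);
        return epoch != null && epoch == currentEpoch;
    }

    /** Entries still held in memory, including stale ones. */
    int storedEntries() {
        return replicaEpochByBlockId.size();
    }
}
```

After a second FBR that omits a block, hasReplica() returns false for it even 
though its entry is still counted by storedEntries().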

The downside of this approach is that the memory for the old entries will 
linger for a while longer than it would have otherwise.  But if the memory 
consumption per entry is lower, it's probably still a win.  It's pretty rare 
for a large number of blocks to go away without being mentioned in incremental 
block reports (IBRs).  In the case where all IBRs are being received normally, 
of course, you have no additional memory overhead at all since you delete 
entries as soon as you get the incremental block removal notification.  And of 
course with the epoch-based approach, you avoid updating the three linked list 
entries each time you touch a block in the FBR.  This should give much better 
cache locality (the linked list has basically no cache locality at all... we're 
hammering main memory pretty much all the time right now).

This would probably be coupled with some kind of background scanner thread that 
removes stale blockinfo instances from the hash table.
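
A rough Java sketch of what such a scanner could look like follows.  Again, 
this is only an assumption about the shape of the solution: the names are 
hypothetical, a ConcurrentHashMap of block ID to epoch stands in for the real 
hash table of blockinfo instances, and the sweep period is arbitrary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; none of these names come from the HDFS code base. */
class StaleReplicaScanner {
    /** Block ID -> epoch of the last report that mentioned the replica. */
    private final ConcurrentHashMap<Long, Long> replicaEpochByBlockId =
            new ConcurrentHashMap<>();

    /** Per-datanode epoch, bumped once per full block report. */
    private final AtomicLong currentEpoch = new AtomicLong();

    void beginFullBlockReport() {
        currentEpoch.incrementAndGet();
    }

    void reportBlock(long blockId) {
        replicaEpochByBlockId.put(blockId, currentEpoch.get());
    }

    int storedEntries() {
        return replicaEpochByBlockId.size();
    }

    /** One pass over the table; returns how many stale entries were dropped. */
    int sweepOnce() {
        long epoch = currentEpoch.get();
        int removed = 0;
        for (Map.Entry<Long, Long> e : replicaEpochByBlockId.entrySet()) {
            Long seen = e.getValue();
            // remove(key, value) is a no-op if the entry was refreshed
            // by a concurrent report after we read its epoch.
            if (seen < epoch && replicaEpochByBlockId.remove(e.getKey(), seen)) {
                removed++;
            }
        }
        return removed;
    }

    /** Runs sweepOnce() periodically on a daemon thread. */
    ScheduledExecutorService startScanner(long periodMillis) {
        ScheduledExecutorService exec =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "stale-replica-scanner");
                    t.setDaemon(true);
                    return t;
                });
        exec.scheduleWithFixedDelay(this::sweepOnce, periodMillis,
                periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

Because readers already skip stale entries, the sweep only reclaims memory and 
can run at low priority; correctness does not depend on how often it runs.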

> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Daryn Sharp
>         Attachments: BlockListOptimizationComparison.xlsx, BlocksMap 
> redesign.pdf, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas 
> list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a 
> linked list of block references for every DatanodeStorageInfo (called 
> "triplets"). 
> We propose to change the way we store the list in memory. 
> Using primitive integer indexes instead of object references will reduce the 
> memory needed for every block replica (when compressed oops is disabled) and 
> in our new design the list overhead will be per DatanodeStorageInfo and not 
> per block replica.
> See the attached design doc for details and evaluation results.



