[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350936#comment-14350936 ]
Colin Patrick McCabe commented on HDFS-6658:
--------------------------------------------

[~clamb] and I have been discussing how to do block reports without backreferences. If you keep a 64-bit epoch number per datanode, you can bump it on each full block report (FBR). Then you can simply ignore block entries that are too old when you access them. In that case, you don't need to remove all stale blocks during an FBR.

The downside of this approach is that the memory for the old entries will linger longer than it otherwise would. But if the memory consumption per entry is lower, it's probably still a win. It's pretty rare for a large number of blocks to go away without being mentioned in incremental block reports (IBRs). And when all IBRs are being received normally, there is no additional memory overhead at all, since you delete entries as soon as you get the incremental block removal notification.

With the epoch-based approach, you also avoid updating the three linked list entries each time you touch a block in the FBR. This should give much better cache locality (the linked list has basically no cache locality at all; right now we're hammering main memory pretty much all the time).

This would probably be coupled with some kind of background scanner thread that removes stale BlockInfo instances from the hash table.
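A minimal sketch of the epoch idea, assuming a toy in-memory map keyed by block ID and datanode ID. The class and method names (`EpochBlockMap`, `processFullBlockReport`, `removeStaleEntries`, and so on) are hypothetical and not HDFS APIs. The nested `HashMap`s only show the control flow; they would not deliver the memory savings discussed above. An FBR bumps the datanode's epoch and re-stamps each reported replica without a removal pass. Reads treat older-epoch entries as absent, IBR deletions free entries immediately, and a scanner pass reclaims whatever lingers.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of epoch-based full block report (FBR) handling.
// Names are illustrative only; locking and persistence are ignored.
final class EpochBlockMap {
  // Current 64-bit epoch per datanode; bumped once per FBR.
  private final Map<String, Long> datanodeEpoch = new HashMap<>();
  // blockId -> (datanodeId -> epoch of the report that last mentioned the replica).
  private final Map<Long, Map<String, Long>> replicas = new HashMap<>();

  private long epochOf(String datanodeId) {
    return datanodeEpoch.getOrDefault(datanodeId, 0L);
  }

  private void stamp(long blockId, String datanodeId, long epoch) {
    replicas.computeIfAbsent(blockId, k -> new HashMap<>()).put(datanodeId, epoch);
  }

  // FBR: bump the epoch and re-stamp every reported replica.
  // Replicas the datanode no longer reports are NOT removed here.
  void processFullBlockReport(String datanodeId, long... blockIds) {
    long epoch = epochOf(datanodeId) + 1;
    datanodeEpoch.put(datanodeId, epoch);
    for (long blockId : blockIds) {
      stamp(blockId, datanodeId, epoch);
    }
  }

  // IBR "received": the new replica is live in the current epoch.
  void incrementalAdd(long blockId, String datanodeId) {
    stamp(blockId, datanodeId, epochOf(datanodeId));
  }

  // IBR "deleted": the entry is freed immediately, so nothing lingers.
  void incrementalRemove(long blockId, String datanodeId) {
    Map<String, Long> locs = replicas.get(blockId);
    if (locs != null) {
      locs.remove(datanodeId);
      if (locs.isEmpty()) {
        replicas.remove(blockId);
      }
    }
  }

  // On access, entries stamped with an older epoch are simply ignored.
  boolean isLive(long blockId, String datanodeId) {
    Map<String, Long> locs = replicas.get(blockId);
    Long epoch = (locs == null) ? null : locs.get(datanodeId);
    return epoch != null && epoch == epochOf(datanodeId);
  }

  // One pass of a background scanner: reclaim memory held by stale entries.
  int removeStaleEntries() {
    int removed = 0;
    Iterator<Map.Entry<Long, Map<String, Long>>> blocks = replicas.entrySet().iterator();
    while (blocks.hasNext()) {
      Map<String, Long> locs = blocks.next().getValue();
      Iterator<Map.Entry<String, Long>> it = locs.entrySet().iterator();
      while (it.hasNext()) {
        Map.Entry<String, Long> e = it.next();
        if (e.getValue() < epochOf(e.getKey())) {
          it.remove();
          removed++;
        }
      }
      if (locs.isEmpty()) {
        blocks.remove();
      }
    }
    return removed;
  }

  // Replica entries currently held in memory, live or stale.
  int entryCount() {
    int n = 0;
    for (Map<String, Long> locs : replicas.values()) {
      n += locs.size();
    }
    return n;
  }
}
```

If a second FBR from `dn1` omits a block, that block's entry stays in memory but `isLive` reports it as gone until the scanner pass drops it. A real NameNode would keep the epoch inline in a compact per-replica record rather than in boxed map values.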
> Namenode memory optimization - Block replicas list
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Daryn Sharp
>         Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called "triplets").
> We propose to change the way we store the list in memory.
> Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled), and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica.
> See the attached design doc for details and evaluation results.
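One way to picture "primitive integer indexes instead of object references" is a per-storage doubly linked list kept in parallel `int[]` arrays, with `-1` standing in for `null`. This is a hypothetical sketch, not the layout in the attached design doc or patch: `IntIndexedReplicaList`, the slot scheme, and the idea of an index into a global block table are assumptions for illustration. Here each replica costs three 4-byte ints (block index, prev, next) rather than three object references, which are 8 bytes each on a 64-bit JVM when compressed oops is disabled. The array headers and free list are paid once per storage, not per replica.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a per-storage replica list linked by int slot indexes
// in parallel primitive arrays instead of object references.
final class IntIndexedReplicaList {
  private static final int NIL = -1; // plays the role of a null reference

  // One slot per replica on this storage.
  private int[] blockIndex = new int[4]; // index into a (hypothetical) global block table
  private int[] prev = new int[4];
  private int[] next = new int[4];
  private int head = NIL;
  private int freeHead = NIL; // recycled slots, chained through next[]
  private int used = 0;       // slots handed out so far
  private int size = 0;

  // Inserts a replica at the head of the list and returns its slot.
  int add(int block) {
    int slot;
    if (freeHead != NIL) {
      slot = freeHead;
      freeHead = next[slot];
    } else {
      if (used == blockIndex.length) {
        grow();
      }
      slot = used++;
    }
    blockIndex[slot] = block;
    prev[slot] = NIL;
    next[slot] = head;
    if (head != NIL) {
      prev[head] = slot;
    }
    head = slot;
    size++;
    return slot;
  }

  // Unlinks a slot in O(1) and recycles it.
  void remove(int slot) {
    unlink(slot);
    next[slot] = freeHead;
    freeHead = slot;
    size--;
  }

  // Moves a slot to the head, the per-replica update an FBR does with triplets.
  void moveToHead(int slot) {
    if (slot == head) {
      return;
    }
    unlink(slot);
    prev[slot] = NIL;
    next[slot] = head; // head is non-NIL: the list held at least slot and head
    prev[head] = slot;
    head = slot;
  }

  // Block indexes from head to tail.
  List<Integer> blocksInOrder() {
    List<Integer> out = new ArrayList<>(size);
    for (int s = head; s != NIL; s = next[s]) {
      out.add(blockIndex[s]);
    }
    return out;
  }

  int size() {
    return size;
  }

  private void unlink(int slot) {
    int p = prev[slot];
    int n = next[slot];
    if (p != NIL) {
      next[p] = n;
    } else {
      head = n;
    }
    if (n != NIL) {
      prev[n] = p;
    }
  }

  private void grow() {
    int cap = blockIndex.length * 2;
    blockIndex = Arrays.copyOf(blockIndex, cap);
    prev = Arrays.copyOf(prev, cap);
    next = Arrays.copyOf(next, cap);
  }
}
```

The caller is assumed to remember the slot returned by `add()` (for example, in the block's own record) so that `remove()` and `moveToHead()` stay O(1) without any per-replica object.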