[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061381#comment-14061381 ]
Colin Patrick McCabe commented on HDFS-6658:
--------------------------------------------

I think that we need to take a more long-term view here. Clearly, the number of replicas in the cluster is going to double a few times in the next few years. The big problem is that after a certain size, JVM heaps just become unmanageable. Full GC times grow too long. We lose the ability to use optimizations like compressed oops (which is a speed win as well as a memory win). Compressed oops are not available once the JVM heap grows beyond 32 GB.

Optimizations that just give a constant-factor reduction in the NameNode's JVM heap size, like this one (HDFS-6658), aren't a very long-term solution. Another doubling in the number of block replicas would more than wipe out the gain here.

Splitting the block manager out into a separate daemon (HDFS-5477) isn't a long-term solution either. Sure, it cuts the NameNode heap in half (or by whatever fraction of the NN heap is taken up by the BlockManager). But another doubling wipes that out too.

Putting the {{BlockManager}} memory off-heap is a more long-term solution to the problem. With the block replicas and the inodes off-heap, we could have a small NameNode heap, and long GCs would never be a problem again. Of course, eventually there may be other bottlenecks in the system, like the size of full block reports. Nobody said off-heap was a silver bullet. But it seems like a useful starting point for many other optimizations.

This optimization doesn't seem like a useful starting point for anything. If I understand correctly, its best-case claimed benefit (20%) only applies to the case where users have compressed oops disabled. If compressed oops are on, Java references are already only 4 bytes, and performance will actually be worse, right?

I think we should consider alternatives before we go down this path.
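
To make the arithmetic in the comment above concrete, here is a rough Java sketch. It is not taken from the patch or from the NameNode code. The 30 GB starting heap is a hypothetical figure, and the count of three links per replica is an assumption suggested by the "triplets" name. The 20% saving is the best case quoted above and is applied to the whole heap only for simplicity. The 8-byte figure is the usual size of an uncompressed reference on a 64-bit JVM.

```java
public class HeapGrowthSketch {

    // Assumption: three links per replica, suggested by the "triplets" name.
    static final int LINKS_PER_REPLICA = 3;

    /** Heap size after a one-time fractional saving followed by n doublings of the replica count. */
    static double heapAfter(double baseGb, double saving, int doublings) {
        return baseGb * (1.0 - saving) * Math.pow(2.0, doublings);
    }

    /** Bytes of link storage per replica for a given link width in bytes. */
    static long linkBytesPerReplica(int bytesPerLink) {
        return (long) LINKS_PER_REPLICA * bytesPerLink;
    }

    public static void main(String[] args) {
        double baseGb = 30.0;  // hypothetical NameNode heap today
        double saving = 0.20;  // best-case benefit quoted in the comment

        // A constant-factor saving helps once; each doubling multiplies what is left.
        System.out.printf("after saving, no growth:  %.1f GB%n", heapAfter(baseGb, saving, 0));
        System.out.printf("after saving, 1 doubling: %.1f GB%n", heapAfter(baseGb, saving, 1));

        // Link storage per replica under each representation.
        System.out.println("uncompressed references: " + linkBytesPerReplica(8) + " bytes");
        System.out.println("compressed references:   " + linkBytesPerReplica(4) + " bytes");
        System.out.println("int indexes:             " + linkBytesPerReplica(Integer.BYTES) + " bytes");
    }
}
```

Under these assumptions, a single doubling takes the heap from 30 GB to roughly 48 GB even after the saving, and int indexes occupy the same space as compressed references, which is why the benefit would only show up with compressed oops disabled.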

> Namenode memory optimization - Block replicas list
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Amir Langer
>         Attachments: Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a
> linked list of block references for every DatanodeStorageInfo (called
> "triplets").
> We propose to change the way we store the list in memory.
> Using primitive integer indexes instead of object references will reduce the
> memory needed for every block replica (when compressed oops is disabled) and
> in our new design the list overhead will be per DatanodeStorageInfo and not
> per block replica.
> see attached design doc. for details and evaluation results.

--
This message was sent by Atlassian JIRA (v6.2#6252)