[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list

Colin Patrick McCabe (JIRA) Mon, 14 Jul 2014 15:50:26 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061381#comment-14061381
 ]


Colin Patrick McCabe commented on HDFS-6658:
--------------------------------------------

I think that we need to take a more long-term view here.  Clearly, the number 
of replicas in the cluster is going to double a few times in the next few 
years.  The big problem is that beyond a certain size, JVM heaps just become 
unmanageable.  Full GC times grow too long.  We lose the ability to use 
optimizations like compressed oops (which are a speed win as well as a memory 
win).  Compressed oops are not available once the JVM heap grows beyond 32 GB.

Optimizations that just give a constant-factor reduction in the NameNode's JVM 
heap size, like this one (HDFS-6658), aren't a very long-term solution.  Another 
doubling in the number of block replicas would more than wipe out the gain 
here.  Splitting the block manager out into a separate daemon (HDFS-5477) isn't 
a long-term solution either.  Sure, it cuts the NameNode heap in half (or by 
whatever fraction of the NN heap is taken up by the BlockManager).  But another 
doubling wipes that out too.
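
As a rough illustration of that arithmetic (a sketch only, not taken from the 
patch or the design doc): assume heap use scales linearly with the number of 
block replicas.  The 20% and 50% figures are the ones mentioned in this 
comment; the class and method names are hypothetical.

```java
// Illustrative arithmetic only. Assumes heap use is proportional to the
// number of block replicas; names here are hypothetical.
final class HeapGrowthSketch {
    /**
     * Heap size relative to today's heap after a one-time fractional saving,
     * followed by the replica count doubling `doublings` times.
     */
    static double relativeHeap(double fractionSaved, int doublings) {
        return (1.0 - fractionSaved) * Math.pow(2.0, doublings);
    }

    public static void main(String[] args) {
        System.out.println("20% saving, one doubling:  " + relativeHeap(0.20, 1));
        System.out.println("50% saving, one doubling:  " + relativeHeap(0.50, 1));
        System.out.println("50% saving, two doublings: " + relativeHeap(0.50, 2));
    }
}
```

Under that assumption, a 20% saving followed by one doubling leaves the heap 
at roughly 1.6 times today's size, and a 50% saving followed by one doubling 
puts it right back where it started.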

Putting the {{BlockManager}} memory off-heap is a more long-term solution to 
the problem.  With the block replicas and the inodes off-heap, we could have a 
small NameNode heap, and long GCs would never be a problem again.  Of course, 
eventually there may be other bottlenecks in the system, like the size of full 
block reports.  Nobody said off-heap was a silver bullet.  But it seems like a 
useful starting point for many other optimizations.
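
To make "off-heap" concrete, here is a minimal sketch of the general idea, 
assuming fixed-size records stored in a direct ByteBuffer.  It is not the 
BlockManager's actual layout or any existing HDFS API: the class name 
{{OffHeapBlockTable}}, the three-long record, and the slot numbering are all 
made up for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: block records live in native memory, so the garbage
// collector only ever sees one small ByteBuffer object, not one object per block.
final class OffHeapBlockTable {
    // Assumed record layout: blockId, generation stamp, length in bytes.
    private static final int RECORD_BYTES = 3 * Long.BYTES;

    private final ByteBuffer buf;
    private final int capacity;

    OffHeapBlockTable(int capacity) {
        this.capacity = capacity;
        // allocateDirect reserves memory outside the Java heap, zero-filled.
        this.buf = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, RECORD_BYTES));
    }

    void put(int slot, long blockId, long genStamp, long numBytes) {
        int base = offset(slot);
        buf.putLong(base, blockId);
        buf.putLong(base + Long.BYTES, genStamp);
        buf.putLong(base + 2 * Long.BYTES, numBytes);
    }

    long blockId(int slot)  { return buf.getLong(offset(slot)); }
    long genStamp(int slot) { return buf.getLong(offset(slot) + Long.BYTES); }
    long numBytes(int slot) { return buf.getLong(offset(slot) + 2 * Long.BYTES); }

    // Bounds-check the slot and convert it to a byte offset.
    private int offset(int slot) {
        if (slot < 0 || slot >= capacity) {
            throw new IndexOutOfBoundsException("slot " + slot);
        }
        return slot * RECORD_BYTES;
    }

    public static void main(String[] args) {
        OffHeapBlockTable table = new OffHeapBlockTable(1024);
        table.put(7, 1073741825L, 1001L, 134217728L);
        System.out.println(table.blockId(7) + " " + table.genStamp(7) + " " + table.numBytes(7));
    }
}
```

A single ByteBuffer is capped at an int-sized capacity, so a real 
implementation would need several buffers or a different allocation 
mechanism; the sketch only shows that the per-block data need not be Java 
objects at all.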

This optimization doesn't seem like a useful starting point for anything.  If I 
understand correctly, its best-case claimed benefit (20%) only applies to the 
case where users have compressed oops disabled.  If compressed oops are on, 
Java references are already only 4 bytes, and performance will actually be 
worse, right?  I think we should consider alternatives before we go down this 
path.
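
A back-of-the-envelope sketch of the reference-width point, under stated 
assumptions: a 64-bit HotSpot JVM where a reference is 8 bytes without 
compressed oops and 4 bytes with them, and an int index of 4 bytes.  The 
number of link slots per replica is a parameter here (3 is used below only 
because the list is called "triplets"), and the sketch ignores the proposed 
change in per-list overhead, so it is not a model of the actual patch.

```java
// Hypothetical sketch: compares only the width of the link slots, nothing else.
final class ReplicaLinkWidthSketch {
    static final int UNCOMPRESSED_REF_BYTES = 8;      // 64-bit JVM, compressed oops off
    static final int COMPRESSED_REF_BYTES = 4;        // compressed oops on
    static final int INT_INDEX_BYTES = Integer.BYTES; // primitive int index

    /** Total bytes spent on link slots for the given number of replicas. */
    static long linkBytes(long replicas, int slotsPerReplica, int bytesPerSlot) {
        return Math.multiplyExact(
            Math.multiplyExact(replicas, (long) slotsPerReplica), (long) bytesPerSlot);
    }

    /** Bytes saved by storing int indexes instead of object references. */
    static long bytesSavedByIntIndexes(long replicas, int slotsPerReplica,
                                       boolean compressedOops) {
        int refBytes = compressedOops ? COMPRESSED_REF_BYTES : UNCOMPRESSED_REF_BYTES;
        return linkBytes(replicas, slotsPerReplica, refBytes)
             - linkBytes(replicas, slotsPerReplica, INT_INDEX_BYTES);
    }

    public static void main(String[] args) {
        long replicas = 1_000_000L; // arbitrary example count
        System.out.println("saved, compressed oops off: "
            + bytesSavedByIntIndexes(replicas, 3, false));
        System.out.println("saved, compressed oops on:  "
            + bytesSavedByIntIndexes(replicas, 3, true));
    }
}
```

With compressed oops on, the slot-width saving from switching to int indexes 
is zero in this model, which is the concern raised above.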

> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Amir Langer
>         Attachments: Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a 
> linked list of block references for every DatanodeStorageInfo (called 
> "triplets"). 
> We propose to change the way we store the list in memory. 
> Using primitive integer indexes instead of object references will reduce the 
> memory needed for every block replica (when compressed oops is disabled) and 
> in our new design the list overhead will be per DatanodeStorageInfo and not 
> per block replica.
> see attached design doc. for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
