[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061069#comment-14061069
 ] 

Colin Patrick McCabe commented on HDFS-6658:
--------------------------------------------

2**31 (2 billion) possible replicas seems like a lot, but if you have a lot of 
replicas being created and deleted, you could burn through it fairly quickly.

So we would need some kind of realistic "garbage collection" strategy to 
reclaim unused numbers.  I don't know what that would be.  A background thread 
sounds reasonable, but then you need to have some way of blocking until that 
thread is done, if you run out of numbers.
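To make the reclamation idea concrete: a minimal sketch of an ID allocator that hands out int IDs and recycles released ones via a free list. The class and method names here are hypothetical illustrations, not anything in the HDFS codebase, and a real version would need the blocking/GC coordination described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: hand out int replica IDs and recycle released ones. */
class ReplicaIdAllocator {
    private int next = 0;                                    // next never-used ID
    private final Deque<Integer> freed = new ArrayDeque<>(); // reclaimed IDs

    /** Prefer a recycled ID; fall back to the next fresh one. */
    int allocate() {
        if (!freed.isEmpty()) {
            return freed.pop();
        }
        if (next == Integer.MAX_VALUE) {
            // A real implementation would block here until a GC pass frees IDs.
            throw new IllegalStateException("replica ID space exhausted");
        }
        return next++;
    }

    /** Return an ID to the pool once its replica is deleted. */
    void release(int id) {
        freed.push(id);
    }
}
```

Even this toy version shows the cost: the free list itself consumes memory proportional to churn, which is part of the complexity being objected to.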

In the best case, this only reduces memory consumption by 2x (unless I'm 
misunderstanding something)...  and the cost is a huge amount of complexity.

Rather than doing all this, why not just move the block replica list off-heap?  
Ideally we'd have the replica list and inode map off-heap, and extremely long 
full GC times on NameNodes would become a thing of the past.  To put it another 
way: it's not so much memory consumption that the NameNode is having trouble 
with, but *JVM* memory consumption.

You don't need JNI code to deal with off-heap memory, either.  You can simply 
use {{ByteBuffer.allocateDirect}} and {{Unsafe.getInt}}, {{Unsafe.getLong}}, etc.
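As a rough illustration of that approach (the class name and record layout below are invented for the example, not an actual HDFS design): a flat off-heap array of fixed-size (blockId, datanodeIndex) records backed by a direct ByteBuffer, so the records never appear as objects on the Java heap and the GC never scans them.

```java
import java.nio.ByteBuffer;

/**
 * Sketch: a flat off-heap array of (blockId, datanodeIndex) records,
 * stored in a direct ByteBuffer instead of on-heap objects.
 */
class OffHeapReplicaList {
    private static final int RECORD_BYTES = Long.BYTES + Integer.BYTES; // 12 per record
    private final ByteBuffer buf;

    OffHeapReplicaList(int capacity) {
        // Allocated outside the Java heap; invisible to GC scans.
        buf = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
    }

    void put(int slot, long blockId, int datanodeIndex) {
        int off = slot * RECORD_BYTES;
        buf.putLong(off, blockId);                // absolute positional writes
        buf.putInt(off + Long.BYTES, datanodeIndex);
    }

    long blockId(int slot) {
        return buf.getLong(slot * RECORD_BYTES);
    }

    int datanodeIndex(int slot) {
        return buf.getInt(slot * RECORD_BYTES + Long.BYTES);
    }
}
```

The same layout could be driven through {{Unsafe}} for slightly lower overhead, at the cost of losing the bounds checking that ByteBuffer provides.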

> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Amir Langer
>         Attachments: Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a 
> linked list of block references for every DatanodeStorageInfo (called 
> "triplets"). 
> We propose to change the way we store the list in memory. 
> Using primitive integer indexes instead of object references will reduce the 
> memory needed for every block replica (when compressed oops is disabled) and 
> in our new design the list overhead will be per DatanodeStorageInfo and not 
> per block replica.
> See the attached design doc for details and evaluation results.
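For readers without the design doc, a loose sketch of the kind of layout the description suggests (names and structure here are my own guess at the idea, not taken from the attachment): replica lists threaded through parallel primitive arrays, so the per-replica link is a 4-byte int rather than an object reference.

```java
/**
 * Hypothetical sketch of an int-indexed replica list: per-storage parallel
 * arrays instead of per-replica object references. next[i] holds the array
 * index of the next replica on the same storage, or -1 at the end.
 */
class IntReplicaList {
    private final long[] blockIds;
    private final int[] next;
    private int head = -1;   // index of the first replica on this storage
    private int size = 0;

    IntReplicaList(int capacity) {
        blockIds = new long[capacity];
        next = new int[capacity];
    }

    void add(long blockId) {
        blockIds[size] = blockId;
        next[size] = head;   // prepend: list overhead is one int per slot
        head = size++;
    }

    boolean contains(long blockId) {
        for (int i = head; i != -1; i = next[i]) {
            if (blockIds[i] == blockId) {
                return true;
            }
        }
        return false;
    }
}
```

In this shape the array bookkeeping lives once per DatanodeStorageInfo, and each replica costs a long plus an int regardless of whether compressed oops is enabled.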



--
This message was sent by Atlassian JIRA
(v6.2#6252)
