[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180682#comment-14180682
 ] 

Konstantin Shvachko commented on HDFS-6658:
-------------------------------------------

I agree: usually people remove data in order to make space for more, and the 
freed space usually fills up again within a couple of weeks or months.
I don't know if this answer is good enough. It is for me, but in the end you 
end up with a bigger cluster.
It would be nice to find a way to detect fully empty arrays in the BlockList 
and release them once the last reference is removed. That should be good enough 
to avoid a stand-alone thread for garbage collection, or compaction in your 
terms.
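A minimal sketch of the idea above, with hypothetical names that are not part of the actual HDFS code or the attached patch: the per-storage block list is partitioned into fixed-size chunks, and a chunk's backing array is released eagerly the moment its last entry is removed, so no stand-alone compaction thread is needed.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not HDFS code: a chunked block list that nulls out
 * a chunk's backing int[] as soon as its last live entry is removed, so
 * memory is reclaimed eagerly without a background GC/compaction thread.
 * Assumes block ids are nonnegative, so -1 can mark an empty slot.
 */
class ChunkedBlockList {
    private static final int CHUNK_SIZE = 4; // small for illustration
    private static final int EMPTY = -1;

    private int[][] chunks = new int[4][]; // lazily allocated chunks
    private int[] live = new int[4];       // live entries per chunk

    void add(int blockId) {
        for (int c = 0; c < chunks.length; c++) {
            if (chunks[c] == null) {
                chunks[c] = new int[CHUNK_SIZE];
                Arrays.fill(chunks[c], EMPTY);
            }
            if (live[c] < CHUNK_SIZE) {
                for (int i = 0; i < CHUNK_SIZE; i++) {
                    if (chunks[c][i] == EMPTY) {
                        chunks[c][i] = blockId;
                        live[c]++;
                        return;
                    }
                }
            }
        }
        // All chunks full: grow the chunk table and place the id there.
        int old = chunks.length;
        chunks = Arrays.copyOf(chunks, old * 2);
        live = Arrays.copyOf(live, old * 2);
        chunks[old] = new int[CHUNK_SIZE];
        Arrays.fill(chunks[old], EMPTY);
        chunks[old][0] = blockId;
        live[old] = 1;
    }

    boolean remove(int blockId) {
        for (int c = 0; c < chunks.length; c++) {
            int[] chunk = chunks[c];
            if (chunk == null) continue;
            for (int i = 0; i < CHUNK_SIZE; i++) {
                if (chunk[i] == blockId) {
                    chunk[i] = EMPTY;
                    if (--live[c] == 0) {
                        chunks[c] = null; // release the fully empty array now
                    }
                    return true;
                }
            }
        }
        return false;
    }

    int allocatedChunks() {
        int n = 0;
        for (int[] c : chunks) if (c != null) n++;
        return n;
    }
}
```

The trade-off is a linear scan per remove within the storage's list; the win is that a decommissioned or emptied storage gives its arrays back immediately, with no separate collector.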

> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Amir Langer
>         Attachments: BlockListOptimizationComparison.xlsx, HDFS-6658.patch, 
> Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a 
> linked list of block references for every DatanodeStorageInfo (called 
> "triplets"). 
> We propose to change the way we store the list in memory. 
> Using primitive integer indexes instead of object references will reduce the 
> memory needed for every block replica (when compressed oops is disabled), and 
> in our new design the list overhead is per DatanodeStorageInfo rather than 
> per block replica.
> See the attached design doc for details and evaluation results.
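A rough sketch of the proposal's core idea, with made-up names that are not taken from the attached patch or design doc: each storage keeps its replicas as a growable primitive int array of block indexes, so the list overhead (array header plus capacity slack) is paid once per DatanodeStorageInfo instead of once per block replica, as with the object-reference "triplets".

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not the attached patch: a per-storage replica list
 * backed by a primitive int[] of indexes into a global block table. No
 * per-replica node objects or reference fields are needed, and the array
 * costs the same whether or not compressed oops is enabled.
 */
class StorageBlockList {
    private int[] blockIds = new int[4]; // indexes into a global block table
    private int size = 0;

    void add(int blockId) {
        if (size == blockIds.length) {
            blockIds = Arrays.copyOf(blockIds, blockIds.length * 2);
        }
        blockIds[size++] = blockId;
    }

    boolean remove(int blockId) {
        for (int i = 0; i < size; i++) {
            if (blockIds[i] == blockId) {
                blockIds[i] = blockIds[--size]; // swap with last: O(1) removal
                return true;
            }
        }
        return false;
    }

    int size() {
        return size;
    }
}
```

Removal here swaps the matched entry with the last one, so the array stays dense without shifting; ordering of replicas within a storage is not preserved, which is presumably acceptable for a replica set.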



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
