[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060023#comment-14060023 ]
Amir Langer commented on HDFS-6658:
-----------------------------------

Hi [~kihwal] - In response to the scenario of massive block deletes, with no block adds following them, which leaves a lot of empty array references: yes, you're right, and there is currently nothing in the code that takes care of it.

We could introduce a check that removes a whole chunk once it is empty, or copies some references around in order to use less memory (an algorithm similar to defragmentation). However, this will either add a lot of latency (if done as part of a client call), or it will require a monitor thread, which would force us to make everything thread-safe and again add latency to all calls. In short, any solution here is costly.

The reason I was reluctant to pay that cost is the scenario in which it occurs: once we have deleted a lot of blocks, we shouldn't really have a big memory shortage (even if the arrays are still allocated, we have cleared all those block instances, which is a lot more memory). We're actually OK as long as we don't need to add blocks (i.e. there isn't much of a benefit), and once we do need to add blocks, the problem of sparse arrays goes away anyway. In short, yes, the issue is there, but I believe the cost does not justify the benefits.

> Namenode memory optimization - Block replicas list
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Amir Langer
>         Attachments: Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a
> linked list of block references for every DatanodeStorageInfo (called
> "triplets").
> We propose to change the way we store the list in memory.
> Using primitive integer indexes instead of object references will reduce the
> memory needed for every block replica (when compressed oops is disabled), and
> in our new design the list overhead will be per DatanodeStorageInfo rather
> than per block replica.
> See the attached design doc for details and evaluation results.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
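To make the trade-off in the comment concrete, here is a minimal, self-contained Java sketch of a per-DatanodeStorageInfo replica list backed by chunked primitive int arrays. All names (ChunkedIntList, CHUNK_SIZE, trim) are illustrative, not from the attached design doc or the HDFS patch, and removal here is a simplified linear scan with swap-to-hole rather than the triplet-style linked structure; the point is only to show how chunk-granular allocation avoids per-replica object overhead, and how a trim()-style "defragmentation" pass is needed to reclaim memory after a wave of deletes:

```java
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical sketch: one replica list per storage, stored as chunks of
// primitive ints (block indexes) instead of per-replica object references.
public class ChunkedIntList {
    static final int CHUNK_SIZE = 1024;
    static final int EMPTY = -1;

    private final ArrayList<int[]> chunks = new ArrayList<>();
    private int size = 0;      // number of live entries
    private int capacity = 0;  // total slots allocated

    // Append a block index; memory grows by whole chunks, never per element.
    public void add(int blockIndex) {
        if (size == capacity) {
            int[] chunk = new int[CHUNK_SIZE];
            Arrays.fill(chunk, EMPTY);
            chunks.add(chunk);
            capacity += CHUNK_SIZE;
        }
        chunks.get(size / CHUNK_SIZE)[size % CHUNK_SIZE] = blockIndex;
        size++;
    }

    // Remove by swapping the last live entry into the hole, so the live
    // region stays dense. Note: chunks are NOT freed here, which is exactly
    // the "massive deletes leave empty arrays" scenario from the comment.
    public boolean remove(int blockIndex) {
        for (int i = 0; i < size; i++) {
            int[] chunk = chunks.get(i / CHUNK_SIZE);
            if (chunk[i % CHUNK_SIZE] == blockIndex) {
                int[] last = chunks.get((size - 1) / CHUNK_SIZE);
                chunk[i % CHUNK_SIZE] = last[(size - 1) % CHUNK_SIZE];
                last[(size - 1) % CHUNK_SIZE] = EMPTY;
                size--;
                return true;
            }
        }
        return false;
    }

    // The defragmentation-like pass discussed above: drop trailing all-empty
    // chunks after a large wave of deletes. Running this safely in a live
    // Namenode is where the latency / thread-safety cost comes in.
    public void trim() {
        int needed = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
        while (chunks.size() > needed) {
            chunks.remove(chunks.size() - 1);
            capacity -= CHUNK_SIZE;
        }
    }

    public int size() { return size; }
    public int chunkCount() { return chunks.size(); }
}
```

After 3000 adds the list holds three chunks; deleting 2500 entries leaves it at three chunks until trim() is called, which shrinks it back to one — illustrating why, without an explicit reclamation pass, the memory stays allocated until new adds reuse the slots.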