[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355932#comment-14355932
 ] 

Daryn Sharp commented on HDFS-6658:
-----------------------------------

Sorry, didn't see your reply, but great minds think alike.  I was thinking of 
something similar to a mark and sweep of the blocks map based on a DN BR serial 
number.  I've long wanted the DN to send its serial during registration so a 
full BR can be avoided if the NN is already up to date with that serial.  I 
have other ideas for it, but I digress.  No reverse mapping at all makes things 
harder though.
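
To make the mark-and-sweep idea concrete, here is a rough sketch of how a per-storage tracker keyed on a block report serial might look. The class and method names (StorageReplicaTracker, processFullReport, isUpToDate) are hypothetical, are not taken from HDFS or from the patch, and ignore locking and incremental reports entirely.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical per-storage tracker; illustrative only, not HDFS code. */
class StorageReplicaTracker {
    // blockId -> serial of the last full block report that listed the block
    private final Map<Long, Long> lastSeenSerial = new HashMap<>();
    // serial of the last full report processed; -1 means none yet
    private long currentSerial = -1;

    /** True if this serial was already processed, so a full BR could be skipped. */
    boolean isUpToDate(long dnSerial) {
        return dnSerial == currentSerial;
    }

    /**
     * Mark: stamp every reported block with the new serial.
     * Sweep: drop every block still carrying an older serial.
     * Returns the number of blocks swept.
     */
    int processFullReport(long serial, long[] reportedBlockIds) {
        for (long id : reportedBlockIds) {
            lastSeenSerial.put(id, serial);
        }
        int removed = 0;
        Iterator<Map.Entry<Long, Long>> it = lastSeenSerial.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() != serial) {
                it.remove();
                removed++;
            }
        }
        currentSerial = serial;
        return removed;
    }

    boolean contains(long blockId) {
        return lastSeenSerial.containsKey(blockId);
    }
}
```

Note that the sweep here walks only one storage's own entries, which is exactly the part that gets harder if there is no reverse mapping from a storage to its blocks.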

Blocks disappearing w/o an IBR is more common than you think.  At least one 
storage fails per day in our env, and the NN needs a way to quickly reverse 
map that storage's blocks for removal.  Another scenario is a dead node, where 
all of its blocks need to be removed.  Any added latency in the NN issuing 
replication requests can be dangerous.  If an entire rack fails, then losing 
any other node virtually guarantees data loss.

I've shifted my thoughts and experiments to a long-to-long hash from blockId 
to the stored block index I've introduced in the patch to be posted shortly.  
A fast hash would allow inodes and storages alike to know only a blockId.  The 
blockId/size/genstamp would move into my block replicas map as the head entry 
of the chain, ahead of the subsequent replicas.  The intermediate blocks map 
can then probably be eliminated for all but UC blocks.
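
As an illustration of what a long-to-long hash from blockId to a block index could look like, below is a minimal open-addressing sketch using primitive arrays. It is not the code from the patch: the class name BlockIndexMap, the multiplicative mixing constant, the 0.5 load factor, and the use of -1 as the "absent" marker are all assumptions made for the example, and removal is left out.

```java
import java.util.Arrays;

/** Hypothetical primitive map: blockId -> index into a replicas structure. */
class BlockIndexMap {
    private static final long EMPTY = -1L; // marks an unused slot; real indexes are >= 0
    private long[] keys;
    private long[] values;
    private int size;

    BlockIndexMap(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new long[capacity];
        values = new long[capacity];
        Arrays.fill(values, EMPTY);
    }

    /** Multiplicative mix of the key, reduced to a slot by masking. */
    private static int slot(long key, int mask) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h >>> 32) & mask;
    }

    void put(long blockId, long index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative");
        }
        if ((size + 1) * 2 > keys.length) {
            resize(); // keep the load factor at or below 0.5
        }
        int mask = keys.length - 1;
        int i = slot(blockId, mask);
        // linear probing: stop at the key itself or at the first empty slot
        while (values[i] != EMPTY && keys[i] != blockId) {
            i = (i + 1) & mask;
        }
        if (values[i] == EMPTY) {
            size++;
        }
        keys[i] = blockId;
        values[i] = index;
    }

    /** Returns the stored index, or -1 if the blockId is unknown. */
    long get(long blockId) {
        int mask = keys.length - 1;
        int i = slot(blockId, mask);
        while (values[i] != EMPTY) {
            if (keys[i] == blockId) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return EMPTY;
    }

    int size() {
        return size;
    }

    private void resize() {
        long[] oldKeys = keys;
        long[] oldValues = values;
        keys = new long[oldKeys.length * 2];
        values = new long[oldKeys.length * 2];
        Arrays.fill(values, EMPTY);
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldValues[j] != EMPTY) {
                put(oldKeys[j], oldValues[j]);
            }
        }
    }
}
```

With something of this shape, an inode or a storage would hold only the blockId and call get(blockId) to reach the head entry of the replica chain, with no object reference per block.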

> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Daryn Sharp
>         Attachments: BlockListOptimizationComparison.xlsx, BlocksMap 
> redesign.pdf, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas 
> list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a 
> linked list of block references for every DatanodeStorageInfo (called 
> "triplets"). 
> We propose to change the way we store the list in memory. 
> Using primitive integer indexes instead of object references will reduce the 
> memory needed for every block replica (when compressed oops is disabled) and 
> in our new design the list overhead will be per DatanodeStorageInfo and not 
> per block replica.
> see attached design doc. for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)