[ 
https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-4465:
---------------------------------

    Attachment: dn-memory-improvements.patch

Hey Suresh, thanks a lot for filing this issue. A little while back I threw 
together a few changes to see how much memory overhead improvement we could get 
in the DN with minimal effort. Here's a little patch (not necessarily ready for 
commit) which shows the changes I made. This patch does three things:

# Reduce the number of repeated String/char[] objects by storing a single 
reference to a base path, plus a per-replica int[] of integers denoting the 
subdirs from the base dir to the replica file, e.g. "1, 34, 2".
# Switch to using the LightWeightGSet instead of standard java.util structures 
where possible in the DN. We already did this in the NN, and with a little 
adaptation we can do it for some of the DN's data structures as well.
# Intern File objects where possible. Even though interning the repeated 
Strings/char[] underlying File objects is a step in the right direction, we can 
do a little better by also interning the File objects themselves, further 
reducing the overhead from repeated objects.
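To make the first and third items concrete, here is a minimal sketch (not from the attached patch; all class and method names are hypothetical) of storing a replica's location as a shared base dir plus a small int[] of subdir indices, and of a simple File interner that collapses repeated parent-dir File objects to one canonical instance:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a replica's path stored as a shared base dir
// plus an int[] of subdir indices, e.g. {1, 34, 2} meaning
// <base>/subdir1/subdir34/subdir2.
class CompactReplicaPath {
    private final File baseDir;   // one shared reference per volume
    private final int[] subdirs;  // small per-replica footprint

    CompactReplicaPath(File baseDir, int[] subdirs) {
        this.baseDir = baseDir;
        this.subdirs = subdirs;
    }

    // Rebuild the full directory on demand instead of caching a
    // String/char[] per replica.
    File getDir() {
        File dir = baseDir;
        for (int i : subdirs) {
            dir = new File(dir, "subdir" + i);
        }
        return dir;
    }
}

// Hypothetical File interner: equal File objects map to a single
// canonical instance, so many replicas in the same subdir share one
// parent File rather than each holding their own copy.
class FileInterner {
    private final Map<File, File> cache = new HashMap<>();

    synchronized File intern(File f) {
        File existing = cache.get(f);
        if (existing == null) {
            cache.put(f, f);
            return f;
        }
        return existing;
    }
}
```

The trade-off is the usual one: a little CPU to reconstruct the path on each access in exchange for a much smaller long-lived heap footprint per replica.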

Using this patch, per-replica heap usage in my test setup went from ~650 bytes 
to ~250 bytes.

Feel free to take this patch and run with it, use it for ideas, or ignore it 
entirely.
                
> Optimize datanode ReplicasMap and ReplicaInfo
> ---------------------------------------------
>
>                 Key: HDFS-4465
>                 URL: https://issues.apache.org/jira/browse/HDFS-4465
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: dn-memory-improvements.patch
>
>
> In Hadoop, a lot of optimization has been done in the namenode data 
> structures to make them memory efficient. Similar optimizations are necessary 
> for the Datanode process. With the growth in storage per datanode and the 
> number of blocks hosted per datanode, this jira intends to optimize the 
> long-lived ReplicasMap and ReplicaInfo objects.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
