[ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron T. Myers updated HDFS-4465:
---------------------------------

    Attachment: dn-memory-improvements.patch

Hey Suresh, thanks a lot for filing this issue.

A little while back I threw together a few changes to see how much memory overhead improvement we could get in the DN with minimal effort. Here's a little patch (not necessarily ready for commit) which shows the changes I made. This patch does three things:

# Reduce the number of repeated String/char[] objects by storing a single reference to a base path. Each replica then stores an int[] containing integers denoting the subdirs from the base dir to the replica file, e.g. "1, 34, 2".
# Switch to using the LightWeightGSet instead of standard java.util structures where possible in the DN. We already did this in the NN, but with a little adaptation we can do it for some of the DN's data structures as well.
# Intern File objects where possible. Even though interning the repeated Strings/char[]s underlying File objects is a step in the right direction, we can do a little bit better by doing our own interning of File objects to further reduce overhead from repeated objects.

Using this patch I was able to see per-replica heap usage go from ~650 bytes per replica to ~250 bytes per replica in my test setup.

Feel free to take this patch and run with it, use it for ideas, or ignore it entirely.

> Optimize datanode ReplicasMap and ReplicaInfo
> ---------------------------------------------
>
>                 Key: HDFS-4465
>                 URL: https://issues.apache.org/jira/browse/HDFS-4465
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: dn-memory-improvements.patch
>
>
> In Hadoop a lot of optimization has been done in namenode data structures to
> be memory efficient. Similar optimizations are necessary for Datanode
> process.
> With the growth in storage per datanode and number of blocks hosted
> on datanode, this jira intends to optimize long lived ReplicasMap and
> ReplicaInfo objects.
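
As a rough illustration of the first and third changes listed in the comment (a shared base path plus an int[] of subdir numbers, and hand-rolled interning of File objects), here is a minimal Java sketch. It is not taken from the attached patch: the class name CompactReplicaPath, the internBaseDir helper, and the "subdir" and "blk_" name prefixes are all assumptions made for the example.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not the classes used in dn-memory-improvements.patch. */
class CompactReplicaPath {
    // One shared File instance per distinct base directory (simple manual interning).
    private static final Map<String, File> INTERNED_BASE_DIRS = new HashMap<>();

    private final File baseDir;   // shared reference, not a per-replica copy
    private final int[] subDirs;  // e.g. {1, 34, 2} for subdir1/subdir34/subdir2 (assumed naming)
    private final long blockId;

    CompactReplicaPath(String baseDir, int[] subDirs, long blockId) {
        this.baseDir = internBaseDir(baseDir);
        this.subDirs = subDirs.clone();
        this.blockId = blockId;
    }

    /** Returns the single shared File for this path, creating it on first use. */
    static synchronized File internBaseDir(String path) {
        return INTERNED_BASE_DIRS.computeIfAbsent(path, File::new);
    }

    File getBaseDir() {
        return baseDir;
    }

    /** Rebuilds the full replica File on demand instead of keeping it on the heap. */
    File toFile() {
        File dir = baseDir;
        for (int n : subDirs) {
            dir = new File(dir, "subdir" + n);
        }
        return new File(dir, "blk_" + blockId);
    }
}
```

The trade-off in a layout like this is that each replica keeps only a small int[] and a shared reference, at the cost of reallocating the full File whenever a caller actually needs the path.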