[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------
        Resolution: Fixed
     Fix Version/s: 2.1.0-beta
      Hadoop Flags: Reviewed
            Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2. Suresh, I see all of branch-2.1-beta, branch-2.1.0-alpha, and branch-2.1.0-beta. Would you mind merging this commit to whichever of those branches you think is appropriate? Thanks a lot.

> Optimize datanode ReplicasMap and ReplicaInfo
> ---------------------------------------------
>
>                 Key: HDFS-4465
>                 URL: https://issues.apache.org/jira/browse/HDFS-4465
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.0.5-alpha
>            Reporter: Suresh Srinivas
>            Assignee: Aaron T. Myers
>             Fix For: 2.1.0-beta
>         Attachments: dn-memory-improvements.patch, HDFS-4465.patch, HDFS-4465.patch, HDFS-4465.patch
>
> In Hadoop, a lot of optimization has been done in the namenode data structures to make them memory efficient. Similar optimizations are necessary for the datanode process. With the growth in storage per datanode and in the number of blocks hosted on a datanode, this jira intends to optimize the long-lived ReplicasMap and ReplicaInfo objects.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------
    Attachment: HDFS-4465.patch

Thanks for the review, Suresh. Here's a patch which removes that unnecessary cast.
[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------
    Attachment: HDFS-4465.patch

Here's an updated patch which should address all of your feedback, Suresh.
# Good thinking. I did some back-of-the-envelope math which suggested that even 1% was probably higher than necessary for a typical DN. Switched this to 0.5%.
# Per previous discussion, left it extending Block and added a comment.
# Good thinking. Moved the parsing code to a separate static function and added a test for it.
# In my testing with a DN with ~1MM blocks, this patch takes each replica from ~635 bytes to ~250 bytes, about a 2.5x improvement.

Note that to address the findbugs warning I had to add an exception to the findbugs exclude file, since in this patch I am very deliberately using the String(String) constructor so as to trim the underlying char[] array.
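The String(String) trick mentioned above can be sketched as follows. This is an illustrative fragment, not code from the patch, and trimmedCopy is a hypothetical helper name: on JDKs before 7u6, String.substring shared the parent string's backing char[], so a short token parsed out of a long line could pin the entire array in memory, and the seemingly redundant String(String) copy constructor (which findbugs flags) forces a right-sized copy.

```java
// Illustrative sketch only -- not code from the patch; "trimmedCopy" is a
// hypothetical helper. On JDKs before 7u6, String.substring shared the
// parent string's backing char[], so a small parsed token could retain a
// huge array; the String(String) copy constructor makes a right-sized copy.
class StringTrimDemo {

    // Return a copy of source[begin, end) backed by a char[] of exactly the
    // token's length, rather than (on old JDKs) the full source array.
    static String trimmedCopy(String source, int begin, int end) {
        return new String(source.substring(begin, end));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append('x');
        }
        sb.append("blk_1073741825");
        String line = sb.toString();
        // On a substring-sharing JDK, the copy retains ~14 chars, not ~100,014.
        System.out.println(trimmedCopy(line, line.length() - 14, line.length()));
    }
}
```

On modern JDKs (7u6 and later) substring already copies, so the wrapper is a no-op there; the exclude-file entry is only needed where the sharing behavior exists.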
[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------
    Target Version/s: 2.1.0-beta
   Affects Version/s: 2.0.5-alpha
              Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------
    Attachment: HDFS-4465.patch

Here's a more polished patch which implements the changes I previously described. The patch is a bit smaller than the previous one because some of the refactoring of LightWeightGSet was already done in other, unrelated JIRAs, but the approach is still the same.
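For readers unfamiliar with LightWeightGSet, the core idea is an intrusive chained hash set: each element embeds its own "next" pointer, so the set needs no per-entry wrapper object the way java.util.HashMap's Entry does. The sketch below illustrates that idea only; the class, interface, and method names are mine, not Hadoop's actual API, and it omits resizing and duplicate handling.

```java
// Minimal sketch of the intrusive-hash-set idea behind LightWeightGSet.
// Names are illustrative, not Hadoop's API; no resizing or duplicate checks.
class IntrusiveSet<E extends IntrusiveSet.Element> {
    // Elements carry their own link, so buckets chain the elements directly.
    interface Element {
        Element getNext();
        void setNext(Element next);
    }

    private final Element[] buckets;
    private int size;

    IntrusiveSet(int capacity) {
        buckets = new Element[capacity];
    }

    private int index(Object key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    // Insert without allocating any per-entry wrapper object.
    void put(E elem) {
        int i = index(elem);
        elem.setNext(buckets[i]); // chain into the bucket head
        buckets[i] = elem;
        size++;
    }

    @SuppressWarnings("unchecked")
    E get(E probe) {
        for (Element e = buckets[index(probe)]; e != null; e = e.getNext()) {
            if (e.equals(probe)) {
                return (E) e;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}

// Toy replica keyed by block id, showing an element that embeds its link.
class Replica implements IntrusiveSet.Element {
    final long blockId;
    private IntrusiveSet.Element next;

    Replica(long blockId) {
        this.blockId = blockId;
    }

    @Override public IntrusiveSet.Element getNext() { return next; }
    @Override public void setNext(IntrusiveSet.Element n) { next = n; }
    @Override public boolean equals(Object o) {
        return o instanceof Replica && ((Replica) o).blockId == blockId;
    }
    @Override public int hashCode() { return Long.hashCode(blockId); }
}

class GSetDemo {
    public static void main(String[] args) {
        IntrusiveSet<Replica> set = new IntrusiveSet<>(16);
        set.put(new Replica(1073741825L));
        System.out.println(set.get(new Replica(1073741825L)) != null); // prints true
    }
}
```

The saving is one Entry-sized object per stored replica, which matters when a DN hosts on the order of a million blocks.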
[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------
    Attachment: dn-memory-improvements.patch

Hey Suresh, thanks a lot for filing this issue. A little while back I threw together a few changes to see how much memory overhead improvement we could get in the DN with minimal effort. Here's a little patch (not necessarily ready for commit) which shows the changes I made. This patch does three things:
# Reduce the number of repeated String/char[] objects by storing a single reference to a base path; each replica then stores an int[] of integers denoting the subdirs from the base dir down to the replica file, e.g. 1, 34, 2.
# Switch to using the LightWeightGSet instead of the standard java.util structures where possible in the DN. We already did this in the NN, but with a little adaptation we can do it for some of the DN's data structures as well.
# Intern File objects where possible. Even though interning the repeated Strings/char[] underlying File objects is a step in the right direction, we can do a little better by doing our own interning of File objects to further reduce overhead from repeated objects.

Using this patch I was able to see per-replica heap usage go from ~650 bytes in my test setup to ~250 bytes. Feel free to take this patch and run with it, use it for ideas, or ignore it entirely.
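The path-compression scheme in item 1 above can be sketched roughly as follows. The class and field names here are hypothetical, not the patch's actual code: one File reference to the volume's base directory is shared by every replica on that volume, and each replica keeps only a small int[] of subdir numbers, rebuilding the full path on demand instead of holding its own long path String.

```java
import java.io.File;

// Hypothetical sketch of the path-compression idea -- names are illustrative,
// not the patch's actual code. The base directory File is shared across all
// replicas on a volume; each replica stores only an int[] of subdir numbers
// and reconstructs its directory on demand.
class CompressedReplicaPath {
    private final File baseDir;  // shared across all replicas on the volume
    private final int[] subdirs; // e.g. {1, 34, 2} -> subdir1/subdir34/subdir2

    CompressedReplicaPath(File baseDir, int[] subdirs) {
        this.baseDir = baseDir;
        this.subdirs = subdirs;
    }

    // Rebuild the directory containing the replica file on demand, instead of
    // keeping a full per-replica path String/char[] resident in the heap.
    File getDir() {
        File dir = baseDir;
        for (int n : subdirs) {
            dir = new File(dir, "subdir" + n);
        }
        return dir;
    }

    public static void main(String[] args) {
        CompressedReplicaPath p = new CompressedReplicaPath(
            new File("/data/dn/current"), new int[] {1, 34, 2});
        System.out.println(p.getDir().getPath());
    }
}
```

The trade-off is a few object allocations per lookup in exchange for replacing each replica's long path String with a handful of ints, which dominates the per-replica savings when paths are deep and replicas number in the millions.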