[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo

2013-07-04, Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------

   Resolution: Fixed
Fix Version/s: 2.1.0-beta
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Suresh, I see that branch-2.1-beta, branch-2.1.0-alpha, and branch-2.1.0-beta 
all exist. Would you mind merging this commit into whichever of those branches 
you think is appropriate? Thanks a lot.

 Optimize datanode ReplicasMap and ReplicaInfo
 ---------------------------------------------

 Key: HDFS-4465
 URL: https://issues.apache.org/jira/browse/HDFS-4465
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.0.5-alpha
Reporter: Suresh Srinivas
Assignee: Aaron T. Myers
 Fix For: 2.1.0-beta

 Attachments: dn-memory-improvements.patch, HDFS-4465.patch, 
 HDFS-4465.patch, HDFS-4465.patch


 In Hadoop, a lot of optimization has been done to make the namenode data 
 structures memory efficient. Similar optimizations are necessary for the 
 datanode process. With the growth in storage per datanode and in the number 
 of blocks hosted per datanode, this jira intends to optimize the long-lived 
 ReplicasMap and ReplicaInfo objects.



[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo

2013-07-03, Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------

Attachment: HDFS-4465.patch

Thanks for the review, Suresh. Here's a patch which removes that unnecessary 
cast.

 Optimize datanode ReplicasMap and ReplicaInfo
 ---------------------------------------------

 Key: HDFS-4465
 URL: https://issues.apache.org/jira/browse/HDFS-4465
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.0.5-alpha
Reporter: Suresh Srinivas
Assignee: Aaron T. Myers
 Attachments: dn-memory-improvements.patch, HDFS-4465.patch, 
 HDFS-4465.patch, HDFS-4465.patch


 In Hadoop, a lot of optimization has been done to make the namenode data 
 structures memory efficient. Similar optimizations are necessary for the 
 datanode process. With the growth in storage per datanode and in the number 
 of blocks hosted per datanode, this jira intends to optimize the long-lived 
 ReplicasMap and ReplicaInfo objects.



[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo

2013-07-02, Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------

Attachment: HDFS-4465.patch

Here's an updated patch which should address all of your feedback, Suresh.

# Good thinking. I did some back-of-the-envelope math which suggested that even 
1% was probably higher than necessary for a typical DN, so I switched this to 
0.5% (see the sketch after this list).
# Per previous discussion, left it extending Block and added a comment.
# Good thinking. Moved the parsing code to a separate static function and added 
a test for it.
# In my testing with a DN hosting ~1 million blocks, this patch reduces 
per-replica memory usage from ~635 bytes to ~250 bytes, about a 2.5x 
improvement.
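
To make the back-of-the-envelope math in point 1 concrete, here is a rough 
sketch. It assumes (my assumption, not spelled out above) that the percentage 
controls what fraction of the JVM's max heap is used to size the replica map's 
hash table, similar to how the namenode sizes its block map; the heap size and 
per-slot cost below are illustrative inputs only, not values from the patch.

{code:java}
// Illustration only: estimate how many hash-table slots a given fraction of
// the heap buys, assuming each slot costs one 8-byte reference. The 4 GB heap
// and the 8-byte slot size are illustrative assumptions.
public class ReplicaMapCapacityEstimate {
  public static void main(String[] args) {
    long maxHeapBytes = 4L * 1024 * 1024 * 1024; // e.g. a 4 GB datanode heap
    double fraction = 0.005;                     // 0.5% of the heap
    int bytesPerSlot = 8;                        // one 64-bit reference per slot

    long slots = (long) (maxHeapBytes * fraction) / bytesPerSlot;
    System.out.printf("%.1f%% of a %d MB heap ~= %d map slots%n",
        fraction * 100, maxHeapBytes / (1024 * 1024), slots);
    // Roughly 2.7 million slots, comfortably more than the ~1 million replicas
    // mentioned above, which is why 0.5% is plenty for a typical DN.
  }
}
{code}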

Note that to address the findbugs warning I had to add an exception to the 
findbugs exclude file, since in this patch I am very deliberately using the 
String(String) constructor so as to trim the underlying char[] array.
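
For readers unfamiliar with why that constructor matters here: at the time 
(JDK 6 and early JDK 7), String.substring() returned a view that shared the 
parent string's backing char[], so a short, long-lived substring of a long path 
could pin the whole array in memory. The following is a hedged sketch of the 
general technique, not the actual patch code; the path and helper names are 
made up.

{code:java}
// Hedged sketch, not the patch itself: copying a substring with the
// String(String) constructor trims the backing char[] down to just the
// characters that are actually needed (relevant on JDK 6 / early JDK 7, where
// substring() shared the parent's array).
public final class PathTrimExample {
  private PathTrimExample() {}

  /** Return a copy of s whose backing char[] holds only s itself. */
  static String trimmedCopy(String s) {
    return new String(s); // deliberate copy; findbugs flags this as DM_STRING_CTOR
  }

  public static void main(String[] args) {
    String fullPath =
        "/data/1/dfs/dn/current/BP-1/current/finalized/subdir1/subdir34/blk_1";
    String fileName = fullPath.substring(fullPath.lastIndexOf('/') + 1);
    // Holding a long-lived reference to the trimmed copy, rather than the
    // substring view, avoids retaining fullPath's entire char[].
    String longLived = trimmedCopy(fileName);
    System.out.println(longLived);
  }
}
{code}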

 Optimize datanode ReplicasMap and ReplicaInfo
 ---------------------------------------------

 Key: HDFS-4465
 URL: https://issues.apache.org/jira/browse/HDFS-4465
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.0.5-alpha
Reporter: Suresh Srinivas
Assignee: Aaron T. Myers
 Attachments: dn-memory-improvements.patch, HDFS-4465.patch, 
 HDFS-4465.patch


 In Hadoop, a lot of optimization has been done to make the namenode data 
 structures memory efficient. Similar optimizations are necessary for the 
 datanode process. With the growth in storage per datanode and in the number 
 of blocks hosted per datanode, this jira intends to optimize the long-lived 
 ReplicasMap and ReplicaInfo objects.



[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo

2013-06-19, Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------

 Target Version/s: 2.1.0-beta
Affects Version/s: 2.0.5-alpha
   Status: Patch Available  (was: Open)

 Optimize datanode ReplicasMap and ReplicaInfo
 ---------------------------------------------

 Key: HDFS-4465
 URL: https://issues.apache.org/jira/browse/HDFS-4465
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.0.5-alpha
Reporter: Suresh Srinivas
Assignee: Aaron T. Myers
 Attachments: dn-memory-improvements.patch, HDFS-4465.patch


 In Hadoop, a lot of optimization has been done to make the namenode data 
 structures memory efficient. Similar optimizations are necessary for the 
 datanode process. With the growth in storage per datanode and in the number 
 of blocks hosted per datanode, this jira intends to optimize the long-lived 
 ReplicasMap and ReplicaInfo objects.



[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo

2013-06-19, Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------

Attachment: HDFS-4465.patch

Here's a more polished patch which implements the changes I previously 
described. The patch is a bit smaller than the previous one because some of the 
refactoring of LightWeightGSet was already done in other unrelated JIRAs, but 
the approach is still the same.

 Optimize datanode ReplicasMap and ReplicaInfo
 ---------------------------------------------

 Key: HDFS-4465
 URL: https://issues.apache.org/jira/browse/HDFS-4465
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: dn-memory-improvements.patch, HDFS-4465.patch


 In Hadoop, a lot of optimization has been done to make the namenode data 
 structures memory efficient. Similar optimizations are necessary for the 
 datanode process. With the growth in storage per datanode and in the number 
 of blocks hosted per datanode, this jira intends to optimize the long-lived 
 ReplicasMap and ReplicaInfo objects.



[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo

2013-02-01, Aaron T. Myers (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:
---------------------------------

Attachment: dn-memory-improvements.patch

Hey Suresh, thanks a lot for filing this issue. A little while back I threw 
together a few changes to see how much we could reduce memory overhead in the 
DN with minimal effort. Here's a little patch (not necessarily ready for 
commit) which shows the changes I made. This patch does three things:

# Reduce the number of repeated String/char[] objects by storing a single 
reference to the base path and, per replica, an int[] whose entries denote the 
subdirs from the base dir down to the replica file, e.g. 1, 34, 2 (see the 
first sketch after this list).
# Switch to using LightWeightGSet instead of standard java.util structures 
where possible in the DN. We already did this in the NN, and with a little 
adaptation we can do it for some of the DN's data structures as well.
# Intern File objects where possible. Interning the repeated Strings/char[] 
underlying File objects is a step in the right direction, but we can do a bit 
better by interning the File objects themselves to further reduce the overhead 
from repeated objects (see the second sketch below).
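
The first sketch below illustrates the shape of the change in point 1. It is 
not the patch code, and the class and field names are hypothetical; it only 
shows how a shared base-dir reference plus a small int[] of subdir indices can 
replace a full per-replica path String.

{code:java}
import java.io.File;

// Hypothetical sketch of point 1 above, not the actual patch code: each
// replica shares one reference to its volume's base directory and keeps a
// small int[] of subdir indices; the File is rebuilt on demand.
class CompactReplicaPath {
  private final File baseDir;    // shared by all replicas on the same volume
  private final int[] subdirIds; // e.g. {1, 34, 2} -> subdir1/subdir34/subdir2

  CompactReplicaPath(File baseDir, int[] subdirIds) {
    this.baseDir = baseDir;
    this.subdirIds = subdirIds.clone();
  }

  /** Rebuild the replica's directory from the shared base dir and the indices. */
  File getDir() {
    File dir = baseDir;
    for (int id : subdirIds) {
      dir = new File(dir, "subdir" + id);
    }
    return dir;
  }
}
{code}

A few ints plus one shared reference are far cheaper than a distinct path 
String per replica, at the cost of rebuilding the File when it is needed.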

Using this patch, I was able to see per-replica heap usage in my test setup go 
from ~650 bytes to ~250 bytes.
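
And a second hedged sketch for the File-interning idea in point 3, again with 
hypothetical names and not taken from the patch: replicas whose paths resolve 
to the same directory share one canonical File instance.

{code:java}
import java.io.File;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of point 3 above, not the actual patch code: deduplicate
// equal File objects so that thousands of replicas living in the same subdir
// share a single instance.
final class FileInterner {
  private static final ConcurrentMap<File, File> CACHE =
      new ConcurrentHashMap<File, File>();

  private FileInterner() {}

  /** Return a canonical instance equal to f, reusing a cached one if present. */
  static File intern(File f) {
    File existing = CACHE.putIfAbsent(f, f);
    return existing != null ? existing : f;
  }
}
{code}

A map like this would need some scoping or eviction policy in a real DN (for 
example, per volume), but it conveys the idea of trading a lookup for fewer 
duplicate objects.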

Feel free to take this patch and run with it, use it for ideas, or ignore it 
entirely.

 Optimize datanode ReplicasMap and ReplicaInfo
 ---------------------------------------------

 Key: HDFS-4465
 URL: https://issues.apache.org/jira/browse/HDFS-4465
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: dn-memory-improvements.patch


 In Hadoop, a lot of optimization has been done to make the namenode data 
 structures memory efficient. Similar optimizations are necessary for the 
 datanode process. With the growth in storage per datanode and in the number 
 of blocks hosted per datanode, this jira intends to optimize the long-lived 
 ReplicasMap and ReplicaInfo objects.
