[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934768#comment-14934768 ] Uma Maheswara Rao G commented on HDFS-8859:
--
Thanks Yi, +1 on the latest patch.

> Improve DataNode ReplicaMap memory footprint to save about 45%
> ---------------------------------------------------------------
>
>                 Key: HDFS-8859
>                 URL: https://issues.apache.org/jira/browse/HDFS-8859
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch,
> HDFS-8859.003.patch, HDFS-8859.004.patch, HDFS-8859.005.patch,
> HDFS-8859.006.patch
>
> By using the following approach we can save about *45%* of the memory
> footprint for each block replica in DataNode memory (this JIRA only covers
> the *ReplicaMap* in the DataNode). The details:
> In ReplicaMap,
> {code}
> private final Map<String, Map<Long, ReplicaInfo>> map =
>     new HashMap<String, Map<Long, ReplicaInfo>>();
> {code}
> Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas
> in memory. The key is the block id of the block replica, which is already
> included in {{ReplicaInfo}}, so this memory can be saved. A HashMap Entry
> also carries an object overhead. We can implement a lightweight set similar
> to {{LightWeightGSet}}, but not of a fixed size ({{LightWeightGSet}} uses a
> fixed size for its entries array, usually a big value; an example is
> {{BlocksMap}}. This avoids full GC since there is no need to resize.) We
> should still be able to look up an element by key.
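The patch implements this idea as {{LightWeightResizableGSet}}. The following is a simplified, self-contained sketch of the intrusive, resizable hash-set technique the description outlines; the names {{LightSet}} and {{Replica}} are illustrative stand-ins, not the actual Hadoop classes:

```java
// Sketch: each element carries its own "next" pointer, so the set needs only
// a bucket array of references: no per-entry wrapper object, no boxed Long key.
interface LinkedElement {
    void setNext(LinkedElement next);
    LinkedElement getNext();
}

class Replica implements LinkedElement {          // stand-in for ReplicaInfo
    final long blockId;                           // the key lives inside the element itself
    private LinkedElement next;
    Replica(long blockId) { this.blockId = blockId; }
    public void setNext(LinkedElement n) { next = n; }
    public LinkedElement getNext() { return next; }
}

class LightSet {
    private LinkedElement[] buckets = new LinkedElement[8];  // grows on demand
    private int size;

    // buckets.length is always a power of two, so masking picks a bucket
    private int index(long id, int len) { return (int) (id & (len - 1)); }

    // Note: a real implementation would also replace an existing entry with
    // the same key; this sketch assumes distinct block ids.
    void put(Replica r) {
        if (size >= buckets.length * 3 / 4) resize();  // resizable, unlike LightWeightGSet
        int i = index(r.blockId, buckets.length);
        r.setNext(buckets[i]);
        buckets[i] = r;
        size++;
    }

    Replica get(long blockId) {
        for (LinkedElement e = buckets[index(blockId, buckets.length)];
             e != null; e = e.getNext()) {
            if (((Replica) e).blockId == blockId) return (Replica) e;
        }
        return null;
    }

    private void resize() {
        LinkedElement[] old = buckets;
        buckets = new LinkedElement[old.length * 2];
        for (LinkedElement e : old) {
            while (e != null) {                    // rehash every element into the new array
                LinkedElement next = e.getNext();
                int i = index(((Replica) e).blockId, buckets.length);
                e.setNext(buckets[i]);
                buckets[i] = e;
                e = next;
            }
        }
    }

    int size() { return size; }
}
```

The actual patch follows the same outline, with {{ReplicaInfo}} implementing the linked-element interface so each replica object chains itself into the set.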
> Following is a comparison of the memory footprint if we implement a
> lightweight set as described.
> We can save:
> {noformat}
> SIZE (bytes)   ITEM
> 20             the key: Long (12 bytes object overhead + 8 bytes long)
> 12             HashMap Entry object overhead
> 4              reference to the key in the Entry
> 4              reference to the value in the Entry
> 4              hash in the Entry
> {noformat}
> Total: -44 bytes
> We need to add:
> {noformat}
> SIZE (bytes)   ITEM
> 4              a reference to the next element in ReplicaInfo
> {noformat}
> Total: +4 bytes
> So in total we can save 40 bytes for each block replica.
> Currently one finalized replica needs around 46 bytes (note: we ignore
> memory alignment here).
> So we can save 1 - (4 + 46) / (44 + 46) ≈ *45%* of the memory for each
> block replica in the DataNode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
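The savings arithmetic above can be checked mechanically; the byte figures below are the JIRA's own estimates (memory alignment ignored), not measured values:

```java
// Verifying the per-replica savings estimate from the issue description.
public class Savings {
    public static void main(String[] args) {
        int removed = 20 + 12 + 4 + 4 + 4; // boxed Long key + HashMap.Entry overheads = 44 bytes
        int added = 4;                     // one "next" reference added to ReplicaInfo
        int replica = 46;                  // approximate size of a finalized replica
        double saved = 1.0 - (double) (added + replica) / (removed + replica);
        // 1 - 50/90 = 4/9, i.e. about 44.4%, which the JIRA rounds to ~45%
        System.out.printf("net saving: %d bytes, ratio: %.1f%%%n",
                          removed - added, saved * 100);
    }
}
```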
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934836#comment-14934836 ] Yi Liu commented on HDFS-8859:
--
Thanks Uma, since the new patch only removes the unused import, based on the above Jenkins result, I will commit it shortly.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934697#comment-14934697 ] Yi Liu commented on HDFS-8859:
--
Thanks Uma. There is an unused import; I will remove it in the new version of the patch.
{quote}
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java:69:29: Variable 'entries' must be private and have accessor methods.
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java:71:17: Variable 'hash_mask' must be private and have accessor methods.
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java:73:17: Variable 'size' must be private and have accessor methods.
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java:77:17: Variable 'modification' must be private and have accessor methods.
{quote}
Making the superclass variables 'protected' and modifying them in subclasses is natural behavior; I don't know why checkstyle reports that they should be private and accessed through methods. We always access protected superclass variables directly elsewhere in the Hadoop code, so I will leave these checkstyle items as they are.
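The checkstyle point being debated can be illustrated with a tiny, hypothetical example; {{BaseSet}} and {{ResizableSet}} are made-up names standing in for the superclass/subclass relationship, not the actual Hadoop classes checkstyle flagged:

```java
// Hypothetical minimal example of the pattern the checkstyle warnings target:
// a subclass reads and writes protected state inherited from its superclass,
// instead of going through private fields plus accessor methods.
class BaseSet {
    protected Object[] entries = new Object[16];  // checkstyle would want: private + accessors
    protected int size;
}

class ResizableSet extends BaseSet {
    // Direct access to the inherited protected fields, the style defended above.
    void grow() {
        Object[] bigger = new Object[entries.length * 2];
        System.arraycopy(entries, 0, bigger, 0, entries.length);
        entries = bigger;
    }
    int capacity() { return entries.length; }
}
```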
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934958#comment-14934958 ] Hudson commented on HDFS-8859:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #1196 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1196/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLightWeightResizableGSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestGSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GSetByHashMap.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightResizableGSet.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLightWeightCache.java

> Fix For: 2.8.0
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934986#comment-14934986 ] Hudson commented on HDFS-8859:
--
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2401 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2401/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935046#comment-14935046 ] Hudson commented on HDFS-8859:
--
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #433 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/433/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934861#comment-14934861 ] Hudson commented on HDFS-8859:
--
SUCCESS: Integrated in Hadoop-trunk-Commit #8538 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8538/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935071#comment-14935071 ] Hudson commented on HDFS-8859:
--
FAILURE: Integrated in Hadoop-Hdfs-trunk #2373 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2373/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934863#comment-14934863 ] Yi Liu commented on HDFS-8859:
--
Committed to trunk and branch-2, thanks [~szetszwo], [~umamaheswararao], [~brahmareddy] for the reviews and comments!
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934933#comment-14934933 ] Hudson commented on HDFS-8859:
--
FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #465 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/465/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935008#comment-14935008 ]

Hudson commented on HDFS-8859:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #458 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/458/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLightWeightCache.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestGSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightResizableGSet.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLightWeightResizableGSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GSetByHashMap.java
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933946#comment-14933946 ]

Uma Maheswara Rao G commented on HDFS-8859:
-------------------------------------------

Yi, the checkstyle comments are related. Can you please check them?
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907976#comment-14907976 ]

Hadoop QA commented on HDFS-8859:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 50s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 15s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 53s | The applied patch generated 5 new checkstyle issues (total was 12, now 13). |
| {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 22m 59s | Tests passed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 162m 40s | Tests failed in hadoop-hdfs. |
| | | | 233m 0s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.blockmanagement.TestBlockManager |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12762306/HDFS-8859.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 83e65c5 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12676/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12676/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12676/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12676/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12676/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907905#comment-14907905 ]

Hadoop QA commented on HDFS-8859:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 49s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 8m 4s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 7s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 50s | The applied patch generated 5 new checkstyle issues (total was 12, now 13). |
| {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 22m 51s | Tests passed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 77m 23s | Tests failed in hadoop-hdfs. |
| | | | 147m 20s | |

|| Reason || Tests ||
| Timed out tests | org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
| | org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12762306/HDFS-8859.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 83e65c5 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12675/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12675/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12675/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12675/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12675/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908008#comment-14908008 ]

Yi Liu commented on HDFS-8859:
------------------------------

The one test failure is not related.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906077#comment-14906077 ]

Uma Maheswara Rao G commented on HDFS-8859:
-------------------------------------------

Hi Yi, thanks for the nice work. I have spent some time reviewing the patch, and it almost looks good. Please fix the following test nit:
{code}
for (int i = 0; i < length; i++) {
+  while (keys.contains(k = random.nextLong()));
+  elements[i] = new TestElement(k, random.nextLong());
+}
{code}
You may want to add the key to {{keys}} when you find a new random value. Otherwise there is no point in having the while loop here.
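Uma's nit above is that each accepted key is never recorded, so the dedup check in the while loop can never reject a repeated draw. A self-contained sketch of the corrected loop follows; `TestElement`, `keys`, and `random` here are illustrative stand-ins modeled on the quoted diff, and the substantive change is the single `keys.add(k)` line.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class UniqueKeys {
    // Illustrative stand-in for the patch's TestElement.
    public static class TestElement {
        public final long key;
        public final long value;
        TestElement(long key, long value) { this.key = key; this.value = value; }
    }

    // Draws 'length' elements with pairwise-distinct keys.
    public static TestElement[] generate(int length, Random random) {
        Set<Long> keys = new HashSet<>();
        TestElement[] elements = new TestElement[length];
        for (int i = 0; i < length; i++) {
            long k;
            // Keep drawing until the key is unseen...
            while (keys.contains(k = random.nextLong()));
            // ...then record it so later iterations cannot reuse it.
            keys.add(k);   // the line missing from the reviewed patch
            elements[i] = new TestElement(k, random.nextLong());
        }
        return elements;
    }
}
```

Without the `keys.add(k)` call, `keys` stays empty and the loop degenerates to a single `random.nextLong()` per element, silently allowing duplicate keys.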
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907431#comment-14907431 ]

Yi Liu commented on HDFS-8859:
------------------------------

Updated the patch:
1. Address Uma's and Brahma's comments.
2. Clean up the whitespace and some checkstyle issues.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907426#comment-14907426 ]

Yi Liu commented on HDFS-8859:
------------------------------

Yes, thanks Brahma for the comment; the default value of {{trackModification}} is true. I am uploading a patch to address it.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907309#comment-14907309 ]

Yi Liu commented on HDFS-8859:
------------------------------

Thanks Uma for the review; let me update the patch to address your comment.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907423#comment-14907423 ]

Brahma Reddy Battula commented on HDFS-8859:
--------------------------------------------

Hi [~hitliuyi], thank you for working on this. Nice work here. I have another nit: {{LightWeightResizableGSet}} need not override {{iterator()}}, as the superclass implementation is sufficient.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702289#comment-14702289 ]

Yi Liu commented on HDFS-8859:
------------------------------

Hi [~szetszwo], do you have time to help review the latest patch? Does it look good to you? Thanks.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696639#comment-14696639 ] Yi Liu commented on HDFS-8859: -- The two test failures are not related.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696635#comment-14696635 ] Hadoop QA commented on HDFS-8859: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 18m 56s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 45s | The applied patch generated 6 new checkstyle issues (total was 12, now 16). |
| {color:red}-1{color} | whitespace | 0m 2s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 22s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 173m 14s | Tests failed in hadoop-hdfs. |
| | | | 240m 55s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.net.TestNetUtils |
| | hadoop.ha.TestZKFailoverController |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750254/HDFS-8859.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0a03054 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/console |
This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694879#comment-14694879 ] Yi Liu commented on HDFS-8859: -- Thanks [~szetszwo] for the review! I updated the patch to address your comments.
{quote} How about calling it LightWeightResizableGSet? {quote}
Agree, renamed it in the new patch.
{quote} From your calculation, the patch improve each block replica object size about 45%. The JIRA summary is misleading. It seems claiming that it improves the overall DataNode memory footprint by about 45%. For 10m replicas, the original overall map entry object size is ~900 MB and the new size is ~500MB. Is it correct? {quote}
It's correct. Actually I added {{ReplicaMap}} to the JIRA summary (yes, I used {{()}}, :)). Considering that {{ReplicaMap}} is the major long-lived in-memory object of the Datanode (of course, there are other aspects, most of them transient: data read/write buffers, RPC buffers, etc.), I just highlighted the improvement.
{quote} Subclass can call super.put(..) {quote}
Updated in the new patch. I had just used a new internal method.
{quote} There is a rewrite for LightWeightGSet.remove(..) {quote}
I reverted it in the new patch and kept the original one. The original implementation has duplicated logic; we could share the same logic across all the {{if...else..}} branches.
{quote} I think we need some long running tests to make sure the correctness. See TestGSet.runMultipleTestGSet() {quote}
Agree, updated in the new patch.
For the test failures of {{003}}: there is one place ({{BlockPoolSlice}}) that adds a replicaInfo to the replicaMap from a tmp replicaMap while the replicaInfo is still in the tmp one; we should remove it from the tmp map before adding it (for LightWeightGSet, an element is not allowed to exist in two gsets).
In the {{002}} patch the failure doesn't exist: it has a new implementation of {{SetIterator}} which is very similar to the logic in Java's HashMap and a bit different from the original one, but both are correct; the major difference is when the next element is found. In the new patch I keep the original one and make a few changes in BlockPoolSlice. All tests run successfully locally with the new patch.
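The "an element is not allowed to exist in two gsets" constraint mentioned above can be seen with a toy intrusive chain. These are hypothetical classes for illustration, not the Hadoop ones: because the element itself carries the single `next` pointer, threading it into a second set silently corrupts the first.

```java
// Toy illustration (not Hadoop code) of why an intrusive element must be
// removed from the tmp set before being added to the final one: the element
// owns its single `next` pointer, so it can sit on at most one chain.
public class IntrusiveMembership {

  static class Node {
    final long id;
    Node next;                        // the one intrusive chain pointer
    Node(long id) { this.id = id; }
  }

  // The simplest possible intrusive "set": a single bucket chain.
  static class TinySet {
    Node head;

    void add(Node n) {                // links n in front, overwriting n.next
      n.next = head;
      head = n;
    }

    Node removeHead() {               // unlinks and clears the chain pointer
      Node n = head;
      if (n != null) { head = n.next; n.next = null; }
      return n;
    }

    int size() {
      int c = 0;
      for (Node n = head; n != null; n = n.next) c++;
      return c;
    }
  }
}
```

Adding a node that is still chained in the tmp set rewrites its `next` pointer and silently drops the rest of the tmp chain; removing it from tmp first, as the fix describes, keeps both sets consistent.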
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695178#comment-14695178 ] Hadoop QA commented on HDFS-8859: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 21s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 50s | The applied patch generated 6 new checkstyle issues (total was 12, now 16). |
| {color:red}-1{color} | whitespace | 0m 1s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 33s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 76m 49s | Tests failed in hadoop-hdfs. |
| | | | 145m 35s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| | hadoop.hdfs.TestReplication |
| | hadoop.hdfs.TestSafeMode |
| | hadoop.hdfs.TestDatanodeRegistration |
| | hadoop.hdfs.tools.TestDebugAdmin |
| | hadoop.hdfs.TestSetrepIncreasing |
| | hadoop.hdfs.TestDatanodeReport |
| | hadoop.hdfs.TestDFSShellGenericOptions |
| | hadoop.hdfs.TestParallelRead |
| | hadoop.hdfs.tools.TestStoragePolicyCommands |
| | hadoop.hdfs.TestDFSRemove |
| | hadoop.hdfs.qjournal.TestSecureNNWithQJM |
| | hadoop.hdfs.web.TestWebHdfsTokens |
| | hadoop.hdfs.TestHFlush |
| | hadoop.hdfs.TestPersistBlocks |
| | hadoop.hdfs.TestParallelShortCircuitReadNoChecksum |
| | hadoop.hdfs.TestEncryptedTransfer |
| | hadoop.hdfs.TestQuota |
| | hadoop.hdfs.TestDFSClientFailover |
| | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
| | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForAcl |
| | hadoop.hdfs.tools.TestDFSAdmin |
| | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
| | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
| | hadoop.hdfs.web.TestWebHDFS |
| | hadoop.hdfs.TestFileAppend |
| | hadoop.hdfs.TestFileLengthOnClusterRestart |
| | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForContentSummary |
| | hadoop.hdfs.TestFSOutputSummer |
| | hadoop.hdfs.TestEncryptionZonesWithHA |
| | hadoop.hdfs.TestBlockReaderFactory |
| | hadoop.hdfs.TestDFSFinalize |
| | hadoop.hdfs.TestDisableConnCache |
| | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes |
| | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForXAttr |
| | hadoop.hdfs.web.TestHttpsFileSystem |
| | hadoop.hdfs.web.TestWebHdfsWithAuthenticationFilter |
| | hadoop.hdfs.web.TestWebHDFSAcl |
| | hadoop.hdfs.TestHDFSTrash |
| | hadoop.hdfs.TestDistributedFileSystem |
| | hadoop.hdfs.TestDataTransferKeepalive |
| | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer |
| | hadoop.hdfs.web.TestWebHDFSForHA |
| | hadoop.hdfs.TestBlockMissingException |
| | hadoop.hdfs.TestPipelines |
| | hadoop.hdfs.TestRenameWhileOpen |
| | hadoop.hdfs.TestFileCreationClient |
| | hadoop.hdfs.TestEncryptionZones |
| | hadoop.hdfs.TestFileAppend3 |
| | hadoop.hdfs.TestBalancerBandwidth |
| | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
| | hadoop.hdfs.TestSeekBug |
| | hadoop.hdfs.TestParallelShortCircuitReadUnCached |
| | hadoop.hdfs.TestBlockReaderLocal |
| | hadoop.hdfs.TestListFilesInFileContext |
| | hadoop.hdfs.web.TestWebHDFSXAttr |
| | hadoop.hdfs.TestFileStatus |
| | hadoop.hdfs.web.TestFSMainOperationsWebHdfs |
| Timed out tests | org.apache.hadoop.hdfs.TestFileCreation |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750254/HDFS-8859.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 53bef9c |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11987/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11987/artifact/patchprocess/whitespace.txt |
| hadoop-common test log |
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696262#comment-14696262 ] Yi Liu commented on HDFS-8859: -- It seems Jenkins has some problem and all the runs time out. I randomly selected 10 of the tests and they run successfully and quickly locally, so let me re-trigger Jenkins.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693629#comment-14693629 ] Yi Liu commented on HDFS-8859: -- {{TestRestartDFS}} is related to the {{003}} patch; {{002}} doesn't cause any issue. I just debugged it: the reason seems to be that the original implementation of {{SetIterator}} in {{LightWeightGSet}} has some issue. I wrote a clearer {{SetIterator}} in the new class {{LightWeightHashGSet}} in {{002}}, but in {{003}} I made it extend {{LightWeightGSet}} and did not use my new implementation of {{SetIterator}}. If I use my new implementation of {{SetIterator}}, the failure disappears. Let me find some time later to see why the original implementation of {{SetIterator}} in {{LightWeightGSet}} causes the failure (it was not used in the original code, so the bug might not have been found).
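For reference, the eager-advance iterator style being compared here looks roughly like the following. This is my own minimal sketch in the spirit of java.util.HashMap's iterator, not the actual LightWeightGSet or LightWeightHashGSet code: the cursor always points at the element next() will return, so the scan for the next non-empty bucket happens up front in advance() rather than inside next().

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hedged sketch of an iterator over a chained-bucket entries array. The
// next element is precomputed eagerly, so hasNext()/next() stay trivial.
public class BucketIterator implements Iterator<Long> {
  static class Node {
    final long id;
    Node next;
    Node(long id) { this.id = id; }
  }

  private final Node[] buckets;
  private int bucket = -1;   // index of the bucket the cursor chain lives in
  private Node nextNode;     // the element the next call to next() returns

  public BucketIterator(Node[] buckets) {
    this.buckets = buckets;
    advance(null);           // position on the first element, if any
  }

  private void advance(Node current) {
    if (current != null && current.next != null) {
      nextNode = current.next;              // stay within the current chain
      return;
    }
    nextNode = null;
    while (++bucket < buckets.length) {     // scan for the next non-empty bucket
      if (buckets[bucket] != null) {
        nextNode = buckets[bucket];
        return;
      }
    }
  }

  @Override public boolean hasNext() { return nextNode != null; }

  @Override public Long next() {
    if (nextNode == null) throw new NoSuchElementException();
    Node n = nextNode;
    advance(n);
    return n.id;
  }
}
```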
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693857#comment-14693857 ] Tsz Wo Nicholas Sze commented on HDFS-8859: --- The idea sounds good. Some comments:
- Both LightWeightGSet and the new LightWeightHashGSet use hash functions, so LightWeightHashGSet seems not a good name. How about calling it LightWeightResizableGSet?
- From your calculation, the patch improves each block replica object size by about 45%. The JIRA summary is misleading: it seems to claim that it improves the overall DataNode memory footprint by about 45%. For 10m replicas, the original overall map entry object size is ~900 MB and the new size is ~500 MB. Is that correct?
- Why add LightWeightGSet.putElement? The subclass can call super.put(..).
- There is a rewrite of LightWeightGSet.remove(..). Why? The old code is well tested. Please do not change it if possible.
- Took a quick look at the tests. I think we need some long running tests to make sure of the correctness. See TestGSet.runMultipleTestGSet().
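The review points above (extend the existing fixed-size class, have the subclass call super.put(..), and add only the resizing logic) can be sketched like this. The class names and internals are illustrative only, not the actual LightWeightGSet API.

```java
// Hedged sketch of the "extend rather than copy" suggestion: a resizable
// subclass reuses the fixed-capacity base set and adds only grow-and-rehash.
// Illustrative names; not the Hadoop LightWeightGSet/LightWeightResizableGSet code.
public class ResizableSetSketch {

  static class Node {
    final long id;
    Node next;                        // intrusive chain pointer
    Node(long id) { this.id = id; }
  }

  // Fixed-capacity chained-bucket set, in the spirit of LightWeightGSet.
  static class FixedSet {
    protected Node[] entries;
    protected int size;

    FixedSet(int capacity) { entries = new Node[capacity]; }

    protected int index(long id) {
      return (int) ((id & 0x7fffffffffffffffL) % entries.length);
    }

    public void put(Node n) {         // sketch assumes id is not already present
      int i = index(n.id);
      n.next = entries[i];
      entries[i] = n;
      size++;
    }

    public Node get(long id) {
      for (Node n = entries[index(id)]; n != null; n = n.next)
        if (n.id == id) return n;
      return null;
    }

    public int size() { return size; }
  }

  // Resizable subclass: overrides put, calls super.put(..) as the reviewer
  // suggests, and adds only the grow-and-rehash step.
  static class ResizableSet extends FixedSet {
    ResizableSet() { super(8); }

    @Override public void put(Node n) {
      super.put(n);
      if (size > entries.length * 3 / 4) resize();
    }

    private void resize() {
      Node[] old = entries;
      entries = new Node[old.length * 2];
      for (Node head : old) {         // rehash every chained element
        for (Node n = head; n != null; ) {
          Node next = n.next;
          int i = index(n.id);
          n.next = entries[i];
          entries[i] = n;
          n = next;
        }
      }
    }
  }
}
```

Because the table grows on demand, the initial array can stay small, unlike the large fixed array that LightWeightGSet allocates up front for BlocksMap.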
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693340#comment-14693340 ] Hadoop QA commented on HDFS-8859: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 2s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 29s | The applied patch generated 6 new checkstyle issues (total was 12, now 14). |
| {color:red}-1{color} | whitespace | 0m 1s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 16s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 220m 50s | Tests failed in hadoop-hdfs. |
| | | | 286m 7s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| | hadoop.hdfs.server.namenode.ha.TestDNFencing |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
| Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestParallelImageWrite |
| | org.apache.hadoop.hdfs.server.namenode.TestFsck |
| | org.apache.hadoop.hdfs.TestRestartDFS |
| | org.apache.hadoop.cli.TestHDFSCLI |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750016/HDFS-8859.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1ea1a83 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/console |
This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682093#comment-14682093 ] Tsz Wo Nicholas Sze commented on HDFS-8859: ---
- Is the only difference between LightWeightHashGSet and LightWeightGSet that LightWeightHashGSet is resizable?
- It seems that some code in LightWeightHashGSet is copied from LightWeightGSet. Could you change LightWeightHashGSet to extend LightWeightGSet?
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692809#comment-14692809 ] Yi Liu commented on HDFS-8859: -- Thanks [~szetszwo] for the review. For your first question, yes; another small difference is that {{LightWeightHashGSet}} needs to implement {{public Collection<E> values()}} like Java's HashMap, and I now add it as an interface method of {{GSet}}. For your second comment, you are right, it is better to change LightWeightHashGSet to extend LightWeightGSet; I do that in the new patch. Actually, when I made the first patch I had considered making LightWeightHashGSet extend LightWeightGSet; at that time I thought I would support shrinking later and more of the logic might differ, so I made them independent. But I agree we should extend even so.
Following is a comparison of memory footprint if we implement a lightweight set as described.

We can save:
{noformat}
SIZE (bytes)   ITEM
20             The key: Long (12 bytes object overhead + 8 bytes long)
12             HashMap Entry object overhead
4              reference to the key in Entry
4              reference to the value in Entry
4              hash in Entry
{noformat}
Total: -44 bytes

We need to add:
{noformat}
SIZE (bytes)   ITEM
4              a reference to the next element in ReplicaInfo
{noformat}
Total: +4 bytes

So in total we can save 40 bytes for each block replica. Currently one finalized replica needs around 46 bytes (note: we ignore memory alignment here).

We can save 1 - (4 + 46) / (44 + 46) = *45%* of the memory for each block replica in the DataNode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
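The savings above come from making the element itself carry both the key and the collision-chain pointer. A minimal illustrative sketch of that intrusive-set idea is below; it is not the actual {{LightWeightHashGSet}} from the patch, and the names ({{Replica}}, {{IntrusiveReplicaSet}}, {{LinkedElement}} as written here) are hypothetical stand-ins. The point it demonstrates: no boxed {{Long}} key and no {{HashMap.Entry}} wrapper are allocated per replica.

```java
// Intrusive chained hash set sketch: the stored element provides the
// next-pointer for its bucket's collision chain, so each insertion
// allocates nothing beyond the element itself.
interface LinkedElement {
    void setNext(LinkedElement next);
    LinkedElement getNext();
}

// Stand-in for ReplicaInfo: the block id (the key) lives in the element.
class Replica implements LinkedElement {
    final long blockId;
    private LinkedElement next;
    Replica(long blockId) { this.blockId = blockId; }
    public void setNext(LinkedElement n) { next = n; }
    public LinkedElement getNext() { return next; }
}

class IntrusiveReplicaSet {
    private final LinkedElement[] buckets = new LinkedElement[16];
    private int size;

    private int index(long blockId) {
        int h = (int) (blockId ^ (blockId >>> 32)); // fold 64-bit key to int
        return h & (buckets.length - 1);            // power-of-two table
    }

    // Sketch only: does not replace an existing element with the same key,
    // and does not resize the array.
    void put(Replica r) {
        int i = index(r.blockId);
        r.setNext(buckets[i]); // chain through the element itself
        buckets[i] = r;
        size++;
    }

    Replica get(long blockId) {
        for (LinkedElement e = buckets[index(blockId)]; e != null; e = e.getNext()) {
            Replica r = (Replica) e;
            if (r.blockId == blockId) return r;
        }
        return null;
    }

    int size() { return size; }
}
```

Compared with {{HashMap<Long, ReplicaInfo>}}, a lookup still hashes the key and walks a chain, but the per-element storage is exactly the 4-byte next-reference counted in the table above.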
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14679157#comment-14679157 ] Yi Liu commented on HDFS-8859: --

The two test failures are not related.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662857#comment-14662857 ] Hadoop QA commented on HDFS-8859: --

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 34s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 0s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 51s | The applied patch generated 11 new checkstyle issues (total was 0, now 11). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 8s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 0m 21s | Tests failed in hadoop-hdfs. |
| | | | 69m 9s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| Failed build | hadoop-hdfs |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749402/HDFS-8859.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f73bdd |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662995#comment-14662995 ] Hadoop QA commented on HDFS-8859: --

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 21m 39s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 10m 48s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 11m 50s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 45s | The applied patch generated 11 new checkstyle issues (total was 0, now 11). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 46s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 5m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 23m 2s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 110m 52s | Tests failed in hadoop-hdfs. |
| | | | 188m 36s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
| | hadoop.hdfs.server.namenode.TestSaveNamespace |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
| | hadoop.hdfs.server.namenode.TestAuditLogger |
| | hadoop.hdfs.server.namenode.snapshot.TestSetQuotaWithSnapshot |
| | hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots |
| | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
| | hadoop.hdfs.server.namenode.TestAddBlock |
| | hadoop.hdfs.server.namenode.TestMalformedURLs |
| | hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots |
| | hadoop.hdfs.server.namenode.TestSnapshotPathINodes |
| | hadoop.hdfs.server.namenode.TestCreateEditsLog |
| | hadoop.hdfs.server.namenode.TestCheckpoint |
| | hadoop.hdfs.server.namenode.TestFsckWithMultipleNameNodes |
| | hadoop.hdfs.server.namenode.TestFSImageWithSnapshot |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport |
| | hadoop.hdfs.server.namenode.TestAuditLogs |
| | hadoop.hdfs.server.namenode.snapshot.TestDisallowModifyROSnapshot |
| | hadoop.hdfs.server.namenode.TestFSDirectory |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion |
| | hadoop.hdfs.server.namenode.snapshot.TestCheckpointsWithSnapshots |
| | hadoop.hdfs.server.namenode.TestParallelImageWrite |
| | hadoop.hdfs.server.namenode.TestEditLogRace |
| | hadoop.hdfs.server.namenode.TestSecurityTokenEditLog |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotRename |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotStatsMXBean |
| | hadoop.hdfs.server.namenode.TestCacheDirectives |
| | hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality |
| | hadoop.hdfs.server.namenode.TestFileTruncate |
| | hadoop.hdfs.server.namenode.snapshot.TestXAttrWithSnapshot |
| | hadoop.hdfs.server.namenode.TestINodeFile |
| | hadoop.hdfs.server.namenode.snapshot.TestFileContextSnapshot |
| | hadoop.hdfs.server.namenode.snapshot.TestAclWithSnapshot |
| | hadoop.hdfs.server.namenode.TestCheckPointForSecurityTokens |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotNameWithInvalidCharacters |
| | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
| | hadoop.hdfs.server.namenode.TestFileContextAcl |
| | hadoop.hdfs.server.namenode.snapshot.TestINodeFileUnderConstructionWithSnapshot |
| | hadoop.hdfs.server.namenode.TestFSNamesystemMBean |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotListing |
| | hadoop.hdfs.server.namenode.TestBackupNode |
| | hadoop.hdfs.server.namenode.TestFileLimit |
| | hadoop.hdfs.server.namenode.TestFsck |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshottableDirListing |
| | hadoop.hdfs.server.namenode.TestNameNodeAcl |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotMetrics |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotReplication |
| | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
| | hadoop.hdfs.server.namenode.TestStartup
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14663259#comment-14663259 ] Hadoop QA commented on HDFS-8859: --

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 2s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 46s | The applied patch generated 12 new checkstyle issues (total was 0, now 12). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 17s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 175m 43s | Tests failed in hadoop-hdfs. |
| | | | 243m 11s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749402/HDFS-8859.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f73bdd |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661499#comment-14661499 ] Yi Liu commented on HDFS-8859: --

{{LightWeightHashGSet}} implemented in the patch is a low-memory-footprint {{GSet}} implementation which uses an array for storing the elements and linked lists for collision resolution. If the number of elements exceeds the threshold, the internal array is resized to double its length. The default load factor is 0.75f, the same as java {{HashMap}}. Currently {{LightWeightHashGSet}} doesn't shrink when elements are removed and the size drops below some threshold; I feel that's not necessary for our case. If you do think we should have this, I can do it in a follow-on. As shown in the patch, {{ReplicaInfo}} now needs to implement {{LightWeightHashGSet.LinkedElement}}, and the modification in {{ReplicaMap}} is to use this new lightweight set. By using the new lightweight set, we get the benefits (a large reduction in memory footprint) described in the JIRA description.
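The growth policy described in the comment above (default load factor 0.75f, array doubled when the threshold is exceeded, no shrinking on removal) can be sketched in a few lines. This is an illustrative sketch only, not the patch's actual code, and the class and method names here are hypothetical:

```java
// Growth policy sketch: double the backing array when the element count
// exceeds capacity * loadFactor; never shrink it on removal.
class GrowthPolicy {
    static final float DEFAULT_LOAD_FACTOR = 0.75f; // same default as java.util.HashMap

    // Capacity the internal array should have once the set holds `size`
    // elements: doubled past the threshold, unchanged otherwise.
    static int nextCapacity(int capacity, int size) {
        int threshold = (int) (capacity * DEFAULT_LOAD_FACTOR);
        return size > threshold ? capacity << 1 : capacity;
    }
}
```

For example, a 16-slot array has a threshold of 12, so the 13th element triggers a resize to 32 slots; because there is no shrink path, removals leave the capacity where it is, which is the trade-off the comment flags as a possible follow-on.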
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661761#comment-14661761 ] Hadoop QA commented on HDFS-8859: --

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 46s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 14s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 31s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 188m 11s | Tests failed in hadoop-hdfs. |
| | | | 251m 55s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.net.TestNetUtils |
| | hadoop.ha.TestZKFailoverController |
| | hadoop.hdfs.server.namenode.TestParallelImageWrite |
| | hadoop.hdfs.TestFileAppend2 |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
| | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot |
| | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
| | hadoop.hdfs.TestDFSUpgradeFromImage |
| | hadoop.hdfs.TestDatanodeLayoutUpgrade |
| | hadoop.hdfs.server.namenode.ha.TestDNFencing |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749223/HDFS-8859.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b6265d3 |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11934/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11934/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11934/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11934/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11934/console |

This message was automatically generated.