[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934768#comment-14934768 ] Uma Maheswara Rao G commented on HDFS-8859:
--
Thanks Yi, +1 on the latest patch.

> Improve DataNode ReplicaMap memory footprint to save about 45%
> ---------------------------------------------------------------
>
>                 Key: HDFS-8859
>                 URL: https://issues.apache.org/jira/browse/HDFS-8859
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch,
> HDFS-8859.003.patch, HDFS-8859.004.patch, HDFS-8859.005.patch,
> HDFS-8859.006.patch
>
> By using the following approach we can save about *45%* of the memory
> footprint for each block replica in DataNode memory (this JIRA only covers
> the *ReplicaMap* in the DataNode). The details:
> In ReplicaMap,
> {code}
> private final Map<String, Map<Long, ReplicaInfo>> map =
>     new HashMap<String, Map<Long, ReplicaInfo>>();
> {code}
> Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas
> in memory. The key is the block id of the block replica, which is already
> included in {{ReplicaInfo}}, so this memory can be saved. A HashMap Entry
> also carries an object overhead. We can implement a lightweight set similar
> to {{LightWeightGSet}}, but not of a fixed size ({{LightWeightGSet}} uses a
> fixed size for its entries array, usually a big value; an example is
> {{BlocksMap}}. This avoids full GC since there is no need to resize.) We
> should still be able to look up an element by key.
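The patch implements this idea as {{LightWeightResizableGSet}}. The following is a simplified, self-contained sketch of the intrusive, resizable hash-set technique the description outlines; the names {{LightSet}} and {{Replica}} are illustrative stand-ins, not the actual Hadoop classes:

```java
// Sketch: each element carries its own "next" pointer, so the set needs only
// a bucket array of references: no per-entry wrapper object, no boxed Long key.
interface LinkedElement {
    void setNext(LinkedElement next);
    LinkedElement getNext();
}

class Replica implements LinkedElement {          // stand-in for ReplicaInfo
    final long blockId;                           // the key lives inside the element itself
    private LinkedElement next;
    Replica(long blockId) { this.blockId = blockId; }
    public void setNext(LinkedElement n) { next = n; }
    public LinkedElement getNext() { return next; }
}

class LightSet {
    private LinkedElement[] buckets = new LinkedElement[8];  // grows on demand
    private int size;

    // buckets.length is always a power of two, so masking picks a bucket
    private int index(long id, int len) { return (int) (id & (len - 1)); }

    // Note: a real implementation would also replace an existing entry with
    // the same key; this sketch assumes distinct block ids.
    void put(Replica r) {
        if (size >= buckets.length * 3 / 4) resize();  // resizable, unlike LightWeightGSet
        int i = index(r.blockId, buckets.length);
        r.setNext(buckets[i]);
        buckets[i] = r;
        size++;
    }

    Replica get(long blockId) {
        for (LinkedElement e = buckets[index(blockId, buckets.length)];
             e != null; e = e.getNext()) {
            if (((Replica) e).blockId == blockId) return (Replica) e;
        }
        return null;
    }

    private void resize() {
        LinkedElement[] old = buckets;
        buckets = new LinkedElement[old.length * 2];
        for (LinkedElement e : old) {
            while (e != null) {                    // rehash every element into the new array
                LinkedElement next = e.getNext();
                int i = index(((Replica) e).blockId, buckets.length);
                e.setNext(buckets[i]);
                buckets[i] = e;
                e = next;
            }
        }
    }

    int size() { return size; }
}
```

The actual patch follows the same outline, with {{ReplicaInfo}} implementing the linked-element interface so each replica object chains itself into the set.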
> Following is a comparison of the memory footprint if we implement a
> lightweight set as described.
> We can save:
> {noformat}
> SIZE (bytes)   ITEM
> 20             the key: Long (12 bytes object overhead + 8 bytes long)
> 12             HashMap Entry object overhead
> 4              reference to the key in the Entry
> 4              reference to the value in the Entry
> 4              hash in the Entry
> {noformat}
> Total: -44 bytes
> We need to add:
> {noformat}
> SIZE (bytes)   ITEM
> 4              a reference to the next element in ReplicaInfo
> {noformat}
> Total: +4 bytes
> So in total we can save 40 bytes for each block replica.
> Currently one finalized replica needs around 46 bytes (note: we ignore
> memory alignment here).
> So we can save 1 - (4 + 46) / (44 + 46) ≈ *45%* of the memory for each
> block replica in the DataNode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
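The savings arithmetic above can be checked mechanically; the byte figures below are the JIRA's own estimates (memory alignment ignored), not measured values:

```java
// Verifying the per-replica savings estimate from the issue description.
public class Savings {
    public static void main(String[] args) {
        int removed = 20 + 12 + 4 + 4 + 4; // boxed Long key + HashMap.Entry overheads = 44 bytes
        int added = 4;                     // one "next" reference added to ReplicaInfo
        int replica = 46;                  // approximate size of a finalized replica
        double saved = 1.0 - (double) (added + replica) / (removed + replica);
        // 1 - 50/90 = 4/9, i.e. about 44.4%, which the JIRA rounds to ~45%
        System.out.printf("net saving: %d bytes, ratio: %.1f%%%n",
                          removed - added, saved * 100);
    }
}
```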
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934836#comment-14934836 ] Yi Liu commented on HDFS-8859:
--
Thanks Uma, since the new patch only removes the unused import, based on the above Jenkins result, I will commit it shortly.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934697#comment-14934697 ] Yi Liu commented on HDFS-8859:
--
Thanks Uma. There is an unused import; I will remove it in the new version of the patch.
{quote}
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java:69:29: Variable 'entries' must be private and have accessor methods.
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java:71:17: Variable 'hash_mask' must be private and have accessor methods.
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java:73:17: Variable 'size' must be private and have accessor methods.
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java:77:17: Variable 'modification' must be private and have accessor methods.
{quote}
Making the superclass variables 'protected' and modifying them in subclasses is natural behavior; I don't know why checkstyle reports that they should be private and accessed through methods. We always access protected superclass variables directly elsewhere in the Hadoop code, so I will leave these checkstyle items as they are.
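The checkstyle point being debated can be illustrated with a tiny, hypothetical example; {{BaseSet}} and {{ResizableSet}} are made-up names standing in for the superclass/subclass relationship, not the actual Hadoop classes checkstyle flagged:

```java
// Hypothetical minimal example of the pattern the checkstyle warnings target:
// a subclass reads and writes protected state inherited from its superclass,
// instead of going through private fields plus accessor methods.
class BaseSet {
    protected Object[] entries = new Object[16];  // checkstyle would want: private + accessors
    protected int size;
}

class ResizableSet extends BaseSet {
    // Direct access to the inherited protected fields, the style defended above.
    void grow() {
        Object[] bigger = new Object[entries.length * 2];
        System.arraycopy(entries, 0, bigger, 0, entries.length);
        entries = bigger;
    }
    int capacity() { return entries.length; }
}
```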
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934958#comment-14934958 ] Hudson commented on HDFS-8859:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #1196 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1196/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLightWeightResizableGSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestGSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GSetByHashMap.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightResizableGSet.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLightWeightCache.java

> Fix For: 2.8.0
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934986#comment-14934986 ] Hudson commented on HDFS-8859:
--
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2401 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2401/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935046#comment-14935046 ] Hudson commented on HDFS-8859:
--
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #433 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/433/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934861#comment-14934861 ] Hudson commented on HDFS-8859:
--
SUCCESS: Integrated in Hadoop-trunk-Commit #8538 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8538/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935071#comment-14935071 ] Hudson commented on HDFS-8859:
--
FAILURE: Integrated in Hadoop-Hdfs-trunk #2373 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2373/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934863#comment-14934863 ] Yi Liu commented on HDFS-8859:
--
Committed to trunk and branch-2, thanks [~szetszwo], [~umamaheswararao], [~brahmareddy] for the reviews and comments!
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934933#comment-14934933 ] Hudson commented on HDFS-8859:
--
FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #465 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/465/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935008#comment-14935008 ]

Hudson commented on HDFS-8859:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #458 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/458/])
HDFS-8859. Improve DataNode ReplicaMap memory footprint to save about 45%. (yliu) (yliu: rev d6fa34e014b0e2a61b24f05dd08ebe12354267fd)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLightWeightCache.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestGSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightResizableGSet.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLightWeightResizableGSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LightWeightGSet.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GSetByHashMap.java
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933946#comment-14933946 ]

Uma Maheswara Rao G commented on HDFS-8859:
-------------------------------------------

Yi, the checkstyle comments are related. Can you please check them?
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907976#comment-14907976 ]

Hadoop QA commented on HDFS-8859:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 50s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 15s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 53s | The applied patch generated 5 new checkstyle issues (total was 12, now 13). |
| {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 22m 59s | Tests passed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 162m 40s | Tests failed in hadoop-hdfs. |
| | | | 233m 0s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.blockmanagement.TestBlockManager |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12762306/HDFS-8859.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 83e65c5 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12676/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12676/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12676/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12676/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12676/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907905#comment-14907905 ]

Hadoop QA commented on HDFS-8859:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 49s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 8m 4s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 7s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 50s | The applied patch generated 5 new checkstyle issues (total was 12, now 13). |
| {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 22m 51s | Tests passed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 77m 23s | Tests failed in hadoop-hdfs. |
| | | | 147m 20s | |

|| Reason || Tests ||
| Timed out tests | org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
| | org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12762306/HDFS-8859.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 83e65c5 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12675/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12675/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12675/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12675/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12675/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908008#comment-14908008 ]

Yi Liu commented on HDFS-8859:
------------------------------

The one test failure is not related.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906077#comment-14906077 ]

Uma Maheswara Rao G commented on HDFS-8859:
-------------------------------------------

Hi Yi, thanks for the nice work. I have spent some time reviewing the patch, and it almost looks good. Please fix the following test nit:
{code}
for (int i = 0; i < length; i++) {
+  while (keys.contains(k = random.nextLong()));
+  elements[i] = new TestElement(k, random.nextLong());
+}
{code}
You may want to add the key to {{keys}} when you find a new random value. Otherwise there is no point in having the while loop here.
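Uma's nit above is that each accepted key is never recorded, so the dedup check in the while loop can never reject a repeated draw. A self-contained sketch of the corrected loop follows; `TestElement`, `keys`, and `random` here are illustrative stand-ins modeled on the quoted diff, and the substantive change is the single `keys.add(k)` line.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class UniqueKeys {
    // Illustrative stand-in for the patch's TestElement.
    public static class TestElement {
        public final long key;
        public final long value;
        TestElement(long key, long value) { this.key = key; this.value = value; }
    }

    // Draws 'length' elements with pairwise-distinct keys.
    public static TestElement[] generate(int length, Random random) {
        Set<Long> keys = new HashSet<>();
        TestElement[] elements = new TestElement[length];
        for (int i = 0; i < length; i++) {
            long k;
            // Keep drawing until the key is unseen...
            while (keys.contains(k = random.nextLong()));
            // ...then record it so later iterations cannot reuse it.
            keys.add(k);   // the line missing from the reviewed patch
            elements[i] = new TestElement(k, random.nextLong());
        }
        return elements;
    }
}
```

Without the `keys.add(k)` call, `keys` stays empty and the loop degenerates to a single `random.nextLong()` per element, silently allowing duplicate keys.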
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907431#comment-14907431 ]

Yi Liu commented on HDFS-8859:
------------------------------

Updated the patch:
1. Address Uma's and Brahma's comments.
2. Clean up the whitespace and some checkstyle issues.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907426#comment-14907426 ]

Yi Liu commented on HDFS-8859:
------------------------------

Yes, thanks Brahma for the comment; the default value of {{trackModification}} is true. I am uploading a patch to address it.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907309#comment-14907309 ]

Yi Liu commented on HDFS-8859:
------------------------------

Thanks Uma for the review; let me update the patch to address your comment.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907423#comment-14907423 ]

Brahma Reddy Battula commented on HDFS-8859:
--------------------------------------------

Hi [~hitliuyi], thank you for working on this. Nice work here. I have another nit: {{LightWeightResizableGSet}} need not override {{iterator()}}, as the superclass implementation is sufficient.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702289#comment-14702289 ]

Yi Liu commented on HDFS-8859:
------------------------------

Hi [~szetszwo], do you have time to help review the latest patch? Does it look good to you? Thanks.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696639#comment-14696639 ] Yi Liu commented on HDFS-8859: -- The two test failures are not related.
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696635#comment-14696635 ] Hadoop QA commented on HDFS-8859: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 18m 56s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 45s | The applied patch generated 6 new checkstyle issues (total was 12, now 16). |
| {color:red}-1{color} | whitespace | 0m 2s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 22s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 173m 14s | Tests failed in hadoop-hdfs. |
| | | | 240m 55s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.net.TestNetUtils |
| | hadoop.ha.TestZKFailoverController |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750254/HDFS-8859.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0a03054 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11992/console |
This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694879#comment-14694879 ] Yi Liu commented on HDFS-8859: -- Thanks [~szetszwo] for the review! I updated the patch to address your comments.
{quote} How about calling it LightWeightResizableGSet? {quote}
Agree, renamed it in the new patch.
{quote} From your calculation, the patch improve each block replica object size about 45%. The JIRA summary is misleading. It seems claiming that it improves the overall DataNode memory footprint by about 45%. For 10m replicas, the original overall map entry object size is ~900 MB and the new size is ~500MB. Is it correct? {quote}
It's correct. Actually I added {{ReplicaMap}} to the JIRA summary (yes, I used {{()}}, :)). Considering that {{ReplicaMap}} is the major long-lived in-memory object of the Datanode (of course, there are other aspects, most of them transient: data read/write buffers, RPC buffers, etc.), I just highlighted the improvement.
{quote} Subclass can call super.put(..) {quote}
Updated in the new patch. I had just used a new internal method.
{quote} There is a rewrite for LightWeightGSet.remove(..) {quote}
I reverted it in the new patch and kept the original one. The original implementation has duplicated logic; we could share the same logic across all the {{if...else..}} branches.
{quote} I think we need some long running tests to make sure the correctness. See TestGSet.runMultipleTestGSet() {quote}
Agree, updated in the new patch.
For the test failures of {{003}}: there is one place ({{BlockPoolSlice}}) that adds a replicaInfo to the replicaMap from a tmp replicaMap while the replicaInfo is still in the tmp one; we should remove it from the tmp map before adding it (for LightWeightGSet, an element is not allowed to exist in two gsets).
In the {{002}} patch the failure doesn't exist: it has a new implementation of {{SetIterator}} which is very similar to the logic in Java's HashMap and a bit different from the original one, but both are correct; the major difference is when the next element is found. In the new patch I keep the original one and make a few changes in BlockPoolSlice. All tests run successfully locally with the new patch.
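The "an element is not allowed to exist in two gsets" constraint mentioned above can be seen with a toy intrusive chain. These are hypothetical classes for illustration, not the Hadoop ones: because the element itself carries the single `next` pointer, threading it into a second set silently corrupts the first.

```java
// Toy illustration (not Hadoop code) of why an intrusive element must be
// removed from the tmp set before being added to the final one: the element
// owns its single `next` pointer, so it can sit on at most one chain.
public class IntrusiveMembership {

  static class Node {
    final long id;
    Node next;                        // the one intrusive chain pointer
    Node(long id) { this.id = id; }
  }

  // The simplest possible intrusive "set": a single bucket chain.
  static class TinySet {
    Node head;

    void add(Node n) {                // links n in front, overwriting n.next
      n.next = head;
      head = n;
    }

    Node removeHead() {               // unlinks and clears the chain pointer
      Node n = head;
      if (n != null) { head = n.next; n.next = null; }
      return n;
    }

    int size() {
      int c = 0;
      for (Node n = head; n != null; n = n.next) c++;
      return c;
    }
  }
}
```

Adding a node that is still chained in the tmp set rewrites its `next` pointer and silently drops the rest of the tmp chain; removing it from tmp first, as the fix describes, keeps both sets consistent.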
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695178#comment-14695178 ] Hadoop QA commented on HDFS-8859: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 21s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 50s | The applied patch generated 6 new checkstyle issues (total was 12, now 16). |
| {color:red}-1{color} | whitespace | 0m 1s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 33s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 76m 49s | Tests failed in hadoop-hdfs. |
| | | | 145m 35s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| | hadoop.hdfs.TestReplication |
| | hadoop.hdfs.TestSafeMode |
| | hadoop.hdfs.TestDatanodeRegistration |
| | hadoop.hdfs.tools.TestDebugAdmin |
| | hadoop.hdfs.TestSetrepIncreasing |
| | hadoop.hdfs.TestDatanodeReport |
| | hadoop.hdfs.TestDFSShellGenericOptions |
| | hadoop.hdfs.TestParallelRead |
| | hadoop.hdfs.tools.TestStoragePolicyCommands |
| | hadoop.hdfs.TestDFSRemove |
| | hadoop.hdfs.qjournal.TestSecureNNWithQJM |
| | hadoop.hdfs.web.TestWebHdfsTokens |
| | hadoop.hdfs.TestHFlush |
| | hadoop.hdfs.TestPersistBlocks |
| | hadoop.hdfs.TestParallelShortCircuitReadNoChecksum |
| | hadoop.hdfs.TestEncryptedTransfer |
| | hadoop.hdfs.TestQuota |
| | hadoop.hdfs.TestDFSClientFailover |
| | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
| | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForAcl |
| | hadoop.hdfs.tools.TestDFSAdmin |
| | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
| | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
| | hadoop.hdfs.web.TestWebHDFS |
| | hadoop.hdfs.TestFileAppend |
| | hadoop.hdfs.TestFileLengthOnClusterRestart |
| | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForContentSummary |
| | hadoop.hdfs.TestFSOutputSummer |
| | hadoop.hdfs.TestEncryptionZonesWithHA |
| | hadoop.hdfs.TestBlockReaderFactory |
| | hadoop.hdfs.TestDFSFinalize |
| | hadoop.hdfs.TestDisableConnCache |
| | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes |
| | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForXAttr |
| | hadoop.hdfs.web.TestHttpsFileSystem |
| | hadoop.hdfs.web.TestWebHdfsWithAuthenticationFilter |
| | hadoop.hdfs.web.TestWebHDFSAcl |
| | hadoop.hdfs.TestHDFSTrash |
| | hadoop.hdfs.TestDistributedFileSystem |
| | hadoop.hdfs.TestDataTransferKeepalive |
| | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer |
| | hadoop.hdfs.web.TestWebHDFSForHA |
| | hadoop.hdfs.TestBlockMissingException |
| | hadoop.hdfs.TestPipelines |
| | hadoop.hdfs.TestRenameWhileOpen |
| | hadoop.hdfs.TestFileCreationClient |
| | hadoop.hdfs.TestEncryptionZones |
| | hadoop.hdfs.TestFileAppend3 |
| | hadoop.hdfs.TestBalancerBandwidth |
| | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
| | hadoop.hdfs.TestSeekBug |
| | hadoop.hdfs.TestParallelShortCircuitReadUnCached |
| | hadoop.hdfs.TestBlockReaderLocal |
| | hadoop.hdfs.TestListFilesInFileContext |
| | hadoop.hdfs.web.TestWebHDFSXAttr |
| | hadoop.hdfs.TestFileStatus |
| | hadoop.hdfs.web.TestFSMainOperationsWebHdfs |
| Timed out tests | org.apache.hadoop.hdfs.TestFileCreation |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750254/HDFS-8859.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 53bef9c |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11987/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11987/artifact/patchprocess/whitespace.txt |
| hadoop-common test log |
[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696262#comment-14696262 ] Yi Liu commented on HDFS-8859: -- It seems Jenkins has some problem and all the runs time out. I randomly selected 10 of the tests and they run successfully and quickly locally, so let me re-trigger Jenkins.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693629#comment-14693629 ] Yi Liu commented on HDFS-8859: -- {{TestRestartDFS}} is related to the {{003}} patch; {{002}} doesn't cause any issue. I just debugged it: the reason seems to be that the original implementation of {{SetIterator}} in {{LightWeightGSet}} has some issue. I wrote a clearer {{SetIterator}} in the new class {{LightWeightHashGSet}} in {{002}}, but in {{003}} I made it extend {{LightWeightGSet}} and did not use my new implementation of {{SetIterator}}. If I use my new implementation of {{SetIterator}}, the failure disappears. Let me find some time later to see why the original implementation of {{SetIterator}} in {{LightWeightGSet}} causes the failure (it was not used in the original code, so the bug might not have been found).
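For reference, the eager-advance iterator style being compared here looks roughly like the following. This is my own minimal sketch in the spirit of java.util.HashMap's iterator, not the actual LightWeightGSet or LightWeightHashGSet code: the cursor always points at the element next() will return, so the scan for the next non-empty bucket happens up front in advance() rather than inside next().

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hedged sketch of an iterator over a chained-bucket entries array. The
// next element is precomputed eagerly, so hasNext()/next() stay trivial.
public class BucketIterator implements Iterator<Long> {
  static class Node {
    final long id;
    Node next;
    Node(long id) { this.id = id; }
  }

  private final Node[] buckets;
  private int bucket = -1;   // index of the bucket the cursor chain lives in
  private Node nextNode;     // the element the next call to next() returns

  public BucketIterator(Node[] buckets) {
    this.buckets = buckets;
    advance(null);           // position on the first element, if any
  }

  private void advance(Node current) {
    if (current != null && current.next != null) {
      nextNode = current.next;              // stay within the current chain
      return;
    }
    nextNode = null;
    while (++bucket < buckets.length) {     // scan for the next non-empty bucket
      if (buckets[bucket] != null) {
        nextNode = buckets[bucket];
        return;
      }
    }
  }

  @Override public boolean hasNext() { return nextNode != null; }

  @Override public Long next() {
    if (nextNode == null) throw new NoSuchElementException();
    Node n = nextNode;
    advance(n);
    return n.id;
  }
}
```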
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693857#comment-14693857 ] Tsz Wo Nicholas Sze commented on HDFS-8859: --- The idea sounds good. Some comments:
- Both LightWeightGSet and the new LightWeightHashGSet use hash functions, so LightWeightHashGSet seems not a good name. How about calling it LightWeightResizableGSet?
- From your calculation, the patch improves each block replica object size by about 45%. The JIRA summary is misleading: it seems to claim that it improves the overall DataNode memory footprint by about 45%. For 10m replicas, the original overall map entry object size is ~900 MB and the new size is ~500 MB. Is that correct?
- Why add LightWeightGSet.putElement? The subclass can call super.put(..).
- There is a rewrite of LightWeightGSet.remove(..). Why? The old code is well tested. Please do not change it if possible.
- Took a quick look at the tests. I think we need some long running tests to make sure of the correctness. See TestGSet.runMultipleTestGSet().
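The review points above (extend the existing fixed-size class, have the subclass call super.put(..), and add only the resizing logic) can be sketched like this. The class names and internals are illustrative only, not the actual LightWeightGSet API.

```java
// Hedged sketch of the "extend rather than copy" suggestion: a resizable
// subclass reuses the fixed-capacity base set and adds only grow-and-rehash.
// Illustrative names; not the Hadoop LightWeightGSet/LightWeightResizableGSet code.
public class ResizableSetSketch {

  static class Node {
    final long id;
    Node next;                        // intrusive chain pointer
    Node(long id) { this.id = id; }
  }

  // Fixed-capacity chained-bucket set, in the spirit of LightWeightGSet.
  static class FixedSet {
    protected Node[] entries;
    protected int size;

    FixedSet(int capacity) { entries = new Node[capacity]; }

    protected int index(long id) {
      return (int) ((id & 0x7fffffffffffffffL) % entries.length);
    }

    public void put(Node n) {         // sketch assumes id is not already present
      int i = index(n.id);
      n.next = entries[i];
      entries[i] = n;
      size++;
    }

    public Node get(long id) {
      for (Node n = entries[index(id)]; n != null; n = n.next)
        if (n.id == id) return n;
      return null;
    }

    public int size() { return size; }
  }

  // Resizable subclass: overrides put, calls super.put(..) as the reviewer
  // suggests, and adds only the grow-and-rehash step.
  static class ResizableSet extends FixedSet {
    ResizableSet() { super(8); }

    @Override public void put(Node n) {
      super.put(n);
      if (size > entries.length * 3 / 4) resize();
    }

    private void resize() {
      Node[] old = entries;
      entries = new Node[old.length * 2];
      for (Node head : old) {         // rehash every chained element
        for (Node n = head; n != null; ) {
          Node next = n.next;
          int i = index(n.id);
          n.next = entries[i];
          entries[i] = n;
          n = next;
        }
      }
    }
  }
}
```

Because the table grows on demand, the initial array can stay small, unlike the large fixed array that LightWeightGSet allocates up front for BlocksMap.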
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693340#comment-14693340 ] Hadoop QA commented on HDFS-8859: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 2s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 29s | The applied patch generated 6 new checkstyle issues (total was 12, now 14). |
| {color:red}-1{color} | whitespace | 0m 1s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 16s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 220m 50s | Tests failed in hadoop-hdfs. |
| | | | 286m 7s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| | hadoop.hdfs.server.namenode.ha.TestDNFencing |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
| Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestParallelImageWrite |
| | org.apache.hadoop.hdfs.server.namenode.TestFsck |
| | org.apache.hadoop.hdfs.TestRestartDFS |
| | org.apache.hadoop.cli.TestHDFSCLI |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750016/HDFS-8859.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1ea1a83 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11976/console |
This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682093#comment-14682093 ] Tsz Wo Nicholas Sze commented on HDFS-8859: ---
- Is the only difference between LightWeightHashGSet and LightWeightGSet that LightWeightHashGSet is resizable?
- It seems that some code in LightWeightHashGSet is copied from LightWeightGSet. Could you change LightWeightHashGSet to extend LightWeightGSet?
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692809#comment-14692809 ] Yi Liu commented on HDFS-8859: -- Thanks [~szetszwo] for the review. For your first question, yes; another small difference is that {{LightWeightHashGSet}} needs to implement {{public Collection<E> values()}} like Java's HashMap, and I now add it as an interface method of {{GSet}}. For your second comment, you are right, it is better to change LightWeightHashGSet to extend LightWeightGSet; I do that in the new patch. Actually, when I made the first patch I had considered making LightWeightHashGSet extend LightWeightGSet; at that time I thought I would support shrinking later and more of the logic might differ, so I made them independent. But I agree we should extend even so.
Following is a comparison of memory footprint if we implement a lightweight set as described.

We can save:
{noformat}
SIZE (bytes)   ITEM
20             The key: Long (12 bytes object overhead + 8 bytes long)
12             HashMap Entry object overhead
4              reference to the key in Entry
4              reference to the value in Entry
4              hash in Entry
{noformat}
Total: -44 bytes

We need to add:
{noformat}
SIZE (bytes)   ITEM
4              a reference to the next element in ReplicaInfo
{noformat}
Total: +4 bytes

So in total we can save 40 bytes for each block replica. Currently one finalized replica needs around 46 bytes (note: we ignore memory alignment here).

We can save 1 - (4 + 46) / (44 + 46) = *45%* of the memory for each block replica in the DataNode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
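The savings above come from making the element itself carry both the key and the collision-chain pointer. A minimal illustrative sketch of that intrusive-set idea is below; it is not the actual {{LightWeightHashGSet}} from the patch, and the names ({{Replica}}, {{IntrusiveReplicaSet}}, {{LinkedElement}} as written here) are hypothetical stand-ins. The point it demonstrates: no boxed {{Long}} key and no {{HashMap.Entry}} wrapper are allocated per replica.

```java
// Intrusive chained hash set sketch: the stored element provides the
// next-pointer for its bucket's collision chain, so each insertion
// allocates nothing beyond the element itself.
interface LinkedElement {
    void setNext(LinkedElement next);
    LinkedElement getNext();
}

// Stand-in for ReplicaInfo: the block id (the key) lives in the element.
class Replica implements LinkedElement {
    final long blockId;
    private LinkedElement next;
    Replica(long blockId) { this.blockId = blockId; }
    public void setNext(LinkedElement n) { next = n; }
    public LinkedElement getNext() { return next; }
}

class IntrusiveReplicaSet {
    private final LinkedElement[] buckets = new LinkedElement[16];
    private int size;

    private int index(long blockId) {
        int h = (int) (blockId ^ (blockId >>> 32)); // fold 64-bit key to int
        return h & (buckets.length - 1);            // power-of-two table
    }

    // Sketch only: does not replace an existing element with the same key,
    // and does not resize the array.
    void put(Replica r) {
        int i = index(r.blockId);
        r.setNext(buckets[i]); // chain through the element itself
        buckets[i] = r;
        size++;
    }

    Replica get(long blockId) {
        for (LinkedElement e = buckets[index(blockId)]; e != null; e = e.getNext()) {
            Replica r = (Replica) e;
            if (r.blockId == blockId) return r;
        }
        return null;
    }

    int size() { return size; }
}
```

Compared with {{HashMap<Long, ReplicaInfo>}}, a lookup still hashes the key and walks a chain, but the per-element storage is exactly the 4-byte next-reference counted in the table above.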
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14679157#comment-14679157 ] Yi Liu commented on HDFS-8859: --

The two test failures are not related.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662857#comment-14662857 ] Hadoop QA commented on HDFS-8859: --

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 34s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 0s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 51s | The applied patch generated 11 new checkstyle issues (total was 0, now 11). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 8s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 0m 21s | Tests failed in hadoop-hdfs. |
| | | | 69m 9s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| Failed build | hadoop-hdfs |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749402/HDFS-8859.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f73bdd |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11942/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662995#comment-14662995 ] Hadoop QA commented on HDFS-8859: --

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 21m 39s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 10m 48s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 11m 50s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 45s | The applied patch generated 11 new checkstyle issues (total was 0, now 11). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 46s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 5m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 23m 2s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 110m 52s | Tests failed in hadoop-hdfs. |
| | | | 188m 36s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
| | hadoop.hdfs.server.namenode.TestSaveNamespace |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
| | hadoop.hdfs.server.namenode.TestAuditLogger |
| | hadoop.hdfs.server.namenode.snapshot.TestSetQuotaWithSnapshot |
| | hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots |
| | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
| | hadoop.hdfs.server.namenode.TestAddBlock |
| | hadoop.hdfs.server.namenode.TestMalformedURLs |
| | hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots |
| | hadoop.hdfs.server.namenode.TestSnapshotPathINodes |
| | hadoop.hdfs.server.namenode.TestCreateEditsLog |
| | hadoop.hdfs.server.namenode.TestCheckpoint |
| | hadoop.hdfs.server.namenode.TestFsckWithMultipleNameNodes |
| | hadoop.hdfs.server.namenode.TestFSImageWithSnapshot |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport |
| | hadoop.hdfs.server.namenode.TestAuditLogs |
| | hadoop.hdfs.server.namenode.snapshot.TestDisallowModifyROSnapshot |
| | hadoop.hdfs.server.namenode.TestFSDirectory |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion |
| | hadoop.hdfs.server.namenode.snapshot.TestCheckpointsWithSnapshots |
| | hadoop.hdfs.server.namenode.TestParallelImageWrite |
| | hadoop.hdfs.server.namenode.TestEditLogRace |
| | hadoop.hdfs.server.namenode.TestSecurityTokenEditLog |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotRename |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotStatsMXBean |
| | hadoop.hdfs.server.namenode.TestCacheDirectives |
| | hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality |
| | hadoop.hdfs.server.namenode.TestFileTruncate |
| | hadoop.hdfs.server.namenode.snapshot.TestXAttrWithSnapshot |
| | hadoop.hdfs.server.namenode.TestINodeFile |
| | hadoop.hdfs.server.namenode.snapshot.TestFileContextSnapshot |
| | hadoop.hdfs.server.namenode.snapshot.TestAclWithSnapshot |
| | hadoop.hdfs.server.namenode.TestCheckPointForSecurityTokens |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotNameWithInvalidCharacters |
| | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
| | hadoop.hdfs.server.namenode.TestFileContextAcl |
| | hadoop.hdfs.server.namenode.snapshot.TestINodeFileUnderConstructionWithSnapshot |
| | hadoop.hdfs.server.namenode.TestFSNamesystemMBean |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotListing |
| | hadoop.hdfs.server.namenode.TestBackupNode |
| | hadoop.hdfs.server.namenode.TestFileLimit |
| | hadoop.hdfs.server.namenode.TestFsck |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshottableDirListing |
| | hadoop.hdfs.server.namenode.TestNameNodeAcl |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotMetrics |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotReplication |
| | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
| | hadoop.hdfs.server.namenode.TestStartup
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14663259#comment-14663259 ] Hadoop QA commented on HDFS-8859: --

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 19m 2s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 46s | The applied patch generated 12 new checkstyle issues (total was 0, now 12). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 17s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 175m 43s | Tests failed in hadoop-hdfs. |
| | | | 243m 11s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestNetUtils |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749402/HDFS-8859.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f73bdd |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11947/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661499#comment-14661499 ] Yi Liu commented on HDFS-8859: --

{{LightWeightHashGSet}} implemented in the patch is a low-memory-footprint {{GSet}} implementation which uses an array for storing the elements and linked lists for collision resolution. If the number of elements exceeds the threshold, the internal array is resized to double its length. The default load factor is 0.75f, the same as java {{HashMap}}. Currently {{LightWeightHashGSet}} doesn't shrink when elements are removed and the size drops below some threshold; I feel that's not necessary for our case. If you do think we should have this, I can do it in a follow-on. As shown in the patch, {{ReplicaInfo}} now needs to implement {{LightWeightHashGSet.LinkedElement}}, and the modification in {{ReplicaMap}} is to use this new lightweight set. By using the new lightweight set, we get the benefits (a large reduction in memory footprint) described in the JIRA description.
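The growth policy described in the comment above (default load factor 0.75f, array doubled when the threshold is exceeded, no shrinking on removal) can be sketched in a few lines. This is an illustrative sketch only, not the patch's actual code, and the class and method names here are hypothetical:

```java
// Growth policy sketch: double the backing array when the element count
// exceeds capacity * loadFactor; never shrink it on removal.
class GrowthPolicy {
    static final float DEFAULT_LOAD_FACTOR = 0.75f; // same default as java.util.HashMap

    // Capacity the internal array should have once the set holds `size`
    // elements: doubled past the threshold, unchanged otherwise.
    static int nextCapacity(int capacity, int size) {
        int threshold = (int) (capacity * DEFAULT_LOAD_FACTOR);
        return size > threshold ? capacity << 1 : capacity;
    }
}
```

For example, a 16-slot array has a threshold of 12, so the 13th element triggers a resize to 32 slots; because there is no shrink path, removals leave the capacity where it is, which is the trade-off the comment flags as a possible follow-on.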
[jira] [Commented] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%
[ https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661761#comment-14661761 ] Hadoop QA commented on HDFS-8859: --

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 46s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 14s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 22m 31s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 188m 11s | Tests failed in hadoop-hdfs. |
| | | | 251m 55s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.net.TestNetUtils |
| | hadoop.ha.TestZKFailoverController |
| | hadoop.hdfs.server.namenode.TestParallelImageWrite |
| | hadoop.hdfs.TestFileAppend2 |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
| | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot |
| | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
| | hadoop.hdfs.TestDFSUpgradeFromImage |
| | hadoop.hdfs.TestDatanodeLayoutUpgrade |
| | hadoop.hdfs.server.namenode.ha.TestDNFencing |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749223/HDFS-8859.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b6265d3 |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11934/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11934/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11934/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11934/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11934/console |

This message was automatically generated.