[jira] [Updated] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%

2015-09-29 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
Attachment: HDFS-8859.006.patch

Updated the patch to remove an unnecessary import.

> Improve DataNode ReplicaMap memory footprint to save about 45%
> --
>
> Key: HDFS-8859
> URL: https://issues.apache.org/jira/browse/HDFS-8859
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, 
> HDFS-8859.003.patch, HDFS-8859.004.patch, HDFS-8859.005.patch, 
> HDFS-8859.006.patch
>
>
> By using the following approach we can save about *45%* of the memory footprint 
> for each block replica in DataNode memory (this JIRA only covers the 
> *ReplicaMap* in the DataNode). The details are:
> In ReplicaMap, 
> {code}
> private final Map<String, Map<Long, ReplicaInfo>> map =
>     new HashMap<String, Map<Long, ReplicaInfo>>();
> {code}
> Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas 
> in memory.  The key is the block id of the block replica, which is already 
> included in {{ReplicaInfo}}, so this memory can be saved.  Also, each HashMap 
> Entry has an object overhead.  We can implement a lightweight set similar to 
> {{LightWeightGSet}}, but without a fixed size ({{LightWeightGSet}} uses a 
> fixed-size entries array, usually with a big value; an example is 
> {{BlocksMap}}. This avoids full GC since there is no need to resize).  We 
> should also be able to get an element by its key.
> The following is a comparison of the memory footprint if we implement a 
> lightweight set as described.
> We can save:
> {noformat}
> SIZE (bytes)   ITEM
> 20             The Key: Long (12 bytes object overhead + 8 bytes long)
> 12             HashMap Entry object overhead
> 4              reference to the key in Entry
> 4              reference to the value in Entry
> 4              hash in Entry
> {noformat}
> Total:  -44 bytes
> We need to add:
> {noformat}
> SIZE (bytes)   ITEM
> 4              a reference to the next element in ReplicaInfo
> {noformat}
> Total:  +4 bytes
> So in total we can save 40 bytes for each block replica.
> Currently one finalized replica needs around 46 bytes (note: we ignore 
> memory alignment here).
> So we can save 1 - (4 + 46) / (44 + 46) ≈ *45%* of the memory for each block 
> replica in the DataNode.
> 
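The lightweight set described above can be sketched as follows. This is an illustrative sketch only, not the attached patch: the class name {{LightWeightResizableSet}} and its {{LinkedElement}} interface are hypothetical stand-ins. Each element embeds its own key and next link, so no boxed Long key or HashMap.Entry wrapper is allocated, and (unlike {{LightWeightGSet}}) the bucket array grows on demand.

```java
// Illustrative sketch (hypothetical names, not the attached patch): a hash set
// whose elements embed their own key and "next" link, so it needs no boxed
// Long keys and no HashMap.Entry wrappers, and it resizes on demand.
class LightWeightResizableSet {
    /** Stored elements expose their own key and chain link. */
    interface LinkedElement {
        long getKey();                      // e.g. the block id in ReplicaInfo
        LinkedElement getNext();
        void setNext(LinkedElement next);
    }

    private LinkedElement[] buckets = new LinkedElement[16];
    private int size;

    private static int indexFor(long key, int length) {
        int h = (int) (key ^ (key >>> 32)); // same spread as Long.hashCode()
        return h & (length - 1);            // table length is a power of two
    }

    /** Returns the element with the given key, or null if absent. */
    LinkedElement get(long key) {
        for (LinkedElement e = buckets[indexFor(key, buckets.length)];
             e != null; e = e.getNext()) {
            if (e.getKey() == key) {
                return e;
            }
        }
        return null;
    }

    /** Inserts an element; assumes its key is not already present. */
    void put(LinkedElement elem) {
        if (size >= buckets.length * 3 / 4) {
            resize(buckets.length * 2);     // grow instead of a fixed-size array
        }
        int i = indexFor(elem.getKey(), buckets.length);
        elem.setNext(buckets[i]);           // chain through the element itself
        buckets[i] = elem;
        size++;
    }

    /** Rehashes all chains into a larger bucket array. */
    private void resize(int newLength) {
        LinkedElement[] newBuckets = new LinkedElement[newLength];
        for (LinkedElement head : buckets) {
            while (head != null) {
                LinkedElement next = head.getNext();
                int i = indexFor(head.getKey(), newLength);
                head.setNext(newBuckets[i]);
                newBuckets[i] = head;
                head = next;
            }
        }
        buckets = newBuckets;
    }

    int size() { return size; }
}
```

In the ReplicaMap case, {{ReplicaInfo}} itself would implement the element interface, contributing only the single extra next reference counted above.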



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%

2015-09-29 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

> Improve DataNode ReplicaMap memory footprint to save about 45%
> --
>
> Key: HDFS-8859
> URL: https://issues.apache.org/jira/browse/HDFS-8859
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yi Liu
>Assignee: Yi Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, 
> HDFS-8859.003.patch, HDFS-8859.004.patch, HDFS-8859.005.patch, 
> HDFS-8859.006.patch
>





[jira] [Updated] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%

2015-09-24 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
Attachment: HDFS-8859.005.patch

> Improve DataNode ReplicaMap memory footprint to save about 45%
> --
>
> Key: HDFS-8859
> URL: https://issues.apache.org/jira/browse/HDFS-8859
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, 
> HDFS-8859.003.patch, HDFS-8859.004.patch, HDFS-8859.005.patch
>





[jira] [Updated] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%

2015-08-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
Attachment: HDFS-8859.004.patch

 Improve DataNode (ReplicaMap) memory footprint to save about 45%
 

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Critical
 Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, 
 HDFS-8859.003.patch, HDFS-8859.004.patch







[jira] [Updated] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%

2015-08-13 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
Summary: Improve DataNode ReplicaMap memory footprint to save about 45%  
(was: Improve DataNode (ReplicaMap) memory footprint to save about 45%)

 Improve DataNode ReplicaMap memory footprint to save about 45%
 --

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Critical
 Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, 
 HDFS-8859.003.patch, HDFS-8859.004.patch







[jira] [Updated] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%

2015-08-13 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-8859:
--
Priority: Major  (was: Critical)

This is a good change, although it does not reduce the overall datanode memory 
footprint by much.  (For 10m blocks it saves only about 400MB of memory, and in 
practice a datanode does not even have 1m blocks.)
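As a sanity check on the figures in this thread (the ~40 bytes saved per replica from the description, and the 10m-block estimate above), the arithmetic works out as follows:

```java
// Back-of-the-envelope check of the savings figures quoted in this thread.
class SavingsEstimate {
    public static void main(String[] args) {
        long blocks = 10_000_000L;          // 10m blocks, as in the comment above
        long savedPerReplica = 44L - 4L;    // remove 44 bytes, add one 4-byte next reference
        long totalMB = blocks * savedPerReplica / 1_000_000L;
        System.out.println(totalMB + " MB saved for 10m blocks"); // prints "400 MB saved for 10m blocks"

        double before = 44 + 46;            // map overhead + one finalized replica
        double after = 4 + 46;              // next reference + one finalized replica
        // (1 - 50/90) * 100 = 44.4..., i.e. roughly the 45% in the summary
        System.out.printf("%.1f%% saved per replica%n", (1 - after / before) * 100);
    }
}
```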

 Improve DataNode ReplicaMap memory footprint to save about 45%
 --

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, 
 HDFS-8859.003.patch, HDFS-8859.004.patch







[jira] [Updated] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%

2015-08-11 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
Attachment: HDFS-8859.003.patch

 Improve DataNode (ReplicaMap) memory footprint to save about 45%
 

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Critical
 Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, 
 HDFS-8859.003.patch







[jira] [Updated] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%

2015-08-07 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
Status: Patch Available  (was: In Progress)

 Improve DataNode (ReplicaMap) memory footprint to save about 45%
 

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Critical
 Attachments: HDFS-8859.001.patch







[jira] [Updated] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%

2015-08-07 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
Attachment: HDFS-8859.001.patch

 Improve DataNode (ReplicaMap) memory footprint to save about 45%
 

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Critical
 Attachments: HDFS-8859.001.patch







[jira] [Updated] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%

2015-08-07 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
Attachment: HDFS-8859.002.patch

Fixed the test failures and enhanced the test.

 Improve DataNode (ReplicaMap) memory footprint to save about 45%
 

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Critical
 Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch







[jira] [Updated] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%

2015-08-06 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
Target Version/s: 2.8.0

 Improve DataNode (ReplicaMap) memory footprint to save about 45%
 

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Critical






[jira] [Updated] (HDFS-8859) Improve DataNode (ReplicaMap) memory footprint to save about 45%

2015-08-05 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8859:
-
Priority: Critical  (was: Major)

 Improve DataNode (ReplicaMap) memory footprint to save about 45%
 

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Critical



