[ 
https://issues.apache.org/jira/browse/HIVE-6418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906340#comment-13906340
 ] 

Sergey Shelukhin commented on HIVE-6418:
----------------------------------------

Applying this patch to my test, I get the following savings. The test is a
HashMap with 1M entries; the key is a double, and the row is a double and a
string (strings are short), with one row per container (savings with
multiple rows per key will be lower).
The lazy part is disabled.


Before (sorted by shallow size):
|Class|Objects|Shallow Size (bytes)|Retained Size (bytes)|
|java.lang.Object[]|3000000|80000000|232384000|
|org.apache.hadoop.hive.serde2.io.DoubleWritable|3000000|72000000|72000000|
|byte[]|1000000|32384000|32384000|
|java.util.HashMap$Entry|1000000|32000000|328384000|
|java.util.ArrayList|1000000|24000000|208384000|
|org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer|1000000|24000000|232384000|
|org.apache.hadoop.hive.ql.exec.persistence.MapJoinEagerRowContainer$NoCopyingArrayList|1000000|24000000|160384000|
|org.apache.hadoop.io.Text|1000000|24000000|56384000|
|org.apache.hadoop.hive.ql.exec.persistence.MapJoinKey|1000000|16000000|64000000|
|java.util.HashMap$Entry[]|1|8388624|336772624|
|java.util.HashMap|1|48|336772672|


After:
|Class|Objects|Shallow Size (bytes)|Retained Size (bytes)|
|org.apache.hadoop.hive.serde2.io.DoubleWritable|3000000|72000000|72000000|
|java.lang.Object[]|2000000|56000000|184384000|
|byte[]|1000000|32384000|32384000|
|java.util.HashMap$Entry|1000000|32000000|264384000|
|org.apache.hadoop.hive.ql.exec.persistence.LazyFlatRowContainer|1000000|32000000|168384000|
|org.apache.hadoop.io.Text|1000000|24000000|56384000|
|org.apache.hadoop.hive.ql.exec.persistence.MapJoinKey|1000000|16000000|64000000|
|java.util.HashMap$Entry[]|1|8388624|272772624|
|java.util.HashMap|1|48|272772672|


Savings of ~19% in the retained size of the HashMap (336772672 -> 272772672 bytes).
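
For anyone who wants to re-derive the savings figure, here is a minimal sketch of the arithmetic using only numbers copied from the two tables. The class and method names (SavingsCheck, savedBytes, and so on) are made up for illustration and are not part of the patch. It also checks that the drop in retained size matches the net change in shallow size of the classes that differ between the two runs.

```java
import java.util.Locale;

// Hypothetical helper, not part of HIVE-6418: re-derives the savings
// from the figures in the "Before" and "After" tables.
class SavingsCheck {
    // Retained size of the single java.util.HashMap instance, in bytes.
    static final long BEFORE = 336_772_672L;
    static final long AFTER = 272_772_672L;
    static final long ENTRIES = 1_000_000L;

    static long savedBytes() {
        return BEFORE - AFTER;
    }

    static double savedPercent() {
        return 100.0 * savedBytes() / BEFORE;
    }

    static long savedBytesPerEntry() {
        return savedBytes() / ENTRIES;
    }

    // Net shallow-size change of the classes that differ between the runs.
    static long shallowDelta() {
        long removed = (80_000_000L - 56_000_000L) // java.lang.Object[]
                + 24_000_000L                      // java.util.ArrayList
                + 24_000_000L                      // MapJoinEagerRowContainer
                + 24_000_000L;                     // ...$NoCopyingArrayList
        long added = 32_000_000L;                  // LazyFlatRowContainer
        return removed - added;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT,
                "saved %d bytes (%.1f%%), %d bytes per entry, shallow delta %d%n",
                savedBytes(), savedPercent(), savedBytesPerEntry(), shallowDelta());
    }
}
```

The difference comes to 64,000,000 bytes, i.e. 64 bytes per map entry, and the shallow-size delta accounts for all of it.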



> MapJoinRowContainer has large memory overhead in typical cases
> --------------------------------------------------------------
>
>                 Key: HIVE-6418
>                 URL: https://issues.apache.org/jira/browse/HIVE-6418
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-6418.01.patch, HIVE-6418.02.patch, 
> HIVE-6418.03.patch, HIVE-6418.WIP.patch, HIVE-6418.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
