[ 
https://issues.apache.org/jira/browse/HIVE-16489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev resolved HIVE-16489.
-----------------------------------
    Resolution: Duplicate

> HMS wastes 26.4% of memory due to dup strings in 
> metastore.api.Partition.parameters
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-16489
>                 URL: https://issues.apache.org/jira/browse/HIVE-16489
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>
> I've just analyzed an HMS heap dump. It turns out that it contains a lot of 
> duplicate strings, that waste 26.4% of the heap. Most of them come from 
> HashMaps referenced by 
> org.apache.hadoop.hive.metastore.api.Partition.parameters. Below is the 
> relevant section of the jxray (www.jxray.com) report. Looking at 
> Partition.java, I see that in the past somebody has already added code to 
> intern keys and values in the parameters table when it's first set up. 
> However, looks like when more key-value pairs are added, they are not 
> interned, and that probably explains the reason for all these duplicate 
> strings.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  
> Overhead: 3,220,458K (26.4%)
> Top duplicate strings:
>     Ovhd         Num char[]s   Num objs   Value
>  46,088K (0.4%)     5871        5871      
> "HBa4rRAAGx2MEmludGVyZXN0cmF0ZXNwcmVhZBgM/wD/AP8AXAAAAqEAERYBFQAXAAAAAAAAIEAWuK0QAA1s
>  ...[length 4000]"
>  46,088K (0.4%)     5871        5871      
> "BQcHBQUGBQgGBQcHCAUGCAkECQcFBQwGBgoJBQYHBQUFBQYKBQgIBgUJEgYFDAYJBgcGBAcLBQYGCAgGCQYG
>  ...[length 4000]"
> ...
> ===================================================
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing 
> arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 
> 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 
> 28 of "2", 21 of "0"
>      <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing 
> arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of 
> "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 
> of "2", 3 of "0"
>      <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 
> of "10", 623 of 
> "CQUJBQcFCAcGBwUFCgUIDAgEBwgFBQcHBwgGBwYEBQoLCggFCAYHBgcIBwkIDgcG ...[length 
> 4000]", 623 of 
> "BQcHBQUGBQgGBQcHCAUGCAkECQcFBQwGBgoJBQYHBQUFBQYKBQgIBgUJEgYFDAYJ ...[length 
> 4000]", 623 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]", 623 of 
> "AAMAAAEAAAAAAAEAAAAAAQABAAEHAwAKAgAEAwAAAAAAAgAEAAAAAAMAAAADAAAA ...[length 
> 4000]"
> ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of 
> "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>      <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- Java Local (j.u.ArrayList) 
> [@4f4cfbd10,@536122408,@726616778]
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to