[ 
https://issues.apache.org/jira/browse/HIVE-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400630#comment-16400630
 ] 

Misha Dmitriev commented on HIVE-16879:
---------------------------------------

I agree about the negligible CPU performance impact of String.intern(), 
especially when compared with reduced heap size and GC time. Again, I think 
this is a good change, assuming that it's applied in the right place.
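
As a minimal illustration of why interning reduces heap usage, the sketch below 
builds two equal strings as separate objects and shows that intern() maps both 
to one shared instance. The class and method names (InternSketch, freshCopy) are 
made up for this example and do not appear in the patch.

```java
// Hypothetical demo class; not part of Hive or the HIVE-16879 patch.
final class InternSketch {
    // Returns a new String object with the same contents, similar to a name
    // that was parsed or deserialized independently for each request.
    static String freshCopy(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String a = freshCopy("default");
        String b = freshCopy("default");
        System.out.println(a == b);                   // false: two separate objects
        System.out.println(a.equals(b));              // true: same contents
        System.out.println(a.intern() == b.intern()); // true: one shared instance
    }
}
```

Once only the interned reference is retained, the duplicate copies become 
eligible for garbage collection.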

However, my experience is that guessing doesn't always work when you try to 
determine where _exactly_ memory is wasted. Do you have access to some running 
Hive instances where you would expect this to be a problem? Then, at a minimum, 
you can run 'jmap -histo:live' to get the number of Key instances and roughly 
estimate memory used by the strings that Keys reference. And the best thing 
would be to take a heap dump (jmap -dump:live,format=b,...) and analyze it with 
a tool, e.g. [www.jxray.com|http://www.jxray.com/], that immediately tells you 
the memory overhead of duplicate strings. You will then see whether Keys 
cause noticeable overhead, and/or what other classes cause it.

> Improve Cache Key
> -----------------
>
>                 Key: HIVE-16879
>                 URL: https://issues.apache.org/jira/browse/HIVE-16879
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Trivial
>         Attachments: HIVE-16879.1.patch, HIVE-16879.2.patch
>
>
> Improve cache key for cache implemented in 
> {{org.apache.hadoop.hive.metastore.AggregateStatsCache}}.
> # Cache some of the key components themselves (db name, table name) using 
> {{String}} intern method to conserve memory for repeated keys, to improve 
> {{equals}} method as now references can be used for equality, and hashcodes 
> will be cached as well, as per the {{String}} class hashcode method.
> # Upgrade _debug_ logging to not generate text unless required
> # Changed _equals_ method to check first for the item most likely to be 
> different, column name
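
The key changes described in the quoted issue could look roughly like the 
sketch below. It is an illustration only, not the code in the attached patches: 
the class name StatsKeySketch and its field names are assumptions, and all three 
components are assumed to be non-null.

```java
import java.util.Objects;

// Hypothetical stand-in for the cache key; not the actual AggregateStatsCache key class.
final class StatsKeySketch {
    private final String dbName;
    private final String tblName;
    private final String colName;

    StatsKeySketch(String dbName, String tblName, String colName) {
        // Database and table names repeat across many keys, so keep one
        // shared instance of each instead of one copy per key.
        this.dbName = dbName.intern();
        this.tblName = tblName.intern();
        this.colName = colName;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StatsKeySketch)) return false;
        StatsKeySketch that = (StatsKeySketch) other;
        // The column name is the component most likely to differ, so it is
        // checked first. The interned fields are compared by reference,
        // which is valid because the constructor interns them for every key.
        return colName.equals(that.colName)
            && tblName == that.tblName
            && dbName == that.dbName;
    }

    @Override
    public int hashCode() {
        // String caches its hash code after the first computation, so the
        // shared interned instances pay that cost only once.
        return Objects.hash(dbName, tblName, colName);
    }
}
```

Two keys built from separately allocated but equal names compare equal and 
hash identically, while keys that differ only in column name are rejected on 
the first comparison.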



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
