[ https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528876#comment-16528876 ]

Misha Dmitriev commented on HIVE-19937:
---------------------------------------

[~stakiar] regarding the behavior of {{CopyOnFirstWriteProperties}}: such 
fine-grained behavior would be easy to implement. It would require changing the 
implementation of this class so that it holds pointers to two hashtables: one for 
properties that are specific/unique to the given instance of {{COFWP}}, and 
another for properties that are common/default across all instances of 
{{COFWP}}. Each get() call would first check the first (specific) hashtable 
and then the second (default) hashtable, and each put() call would work only 
with the first hashtable. This would make sense when there is a 
sufficiently large number of common properties, but every (or almost every) table 
also has some specific properties. In contrast, the current 
{{CopyOnFirstWriteProperties}} works best when most tables are exactly the same 
and only a few are different. Well, after writing all this, I realize that the 
proposed new implementation of {{COFWP}} would probably be better in all 
scenarios. But before deciding on anything, we should definitely measure where 
the memory goes in realistic scenarios.
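To make the proposed two-table design concrete, here is a minimal sketch. The class name and structure are illustrative only (this is not Hive's actual {{COFWP}} code, which extends {{Properties}} directly): reads fall back from an instance-specific table to a shared default table, and writes only ever touch the instance-specific table, so the shared table is never copied or mutated.

```java
import java.util.Properties;

// Hypothetical sketch of the proposed two-table design.
public class LayeredProperties {
    // Common/default properties shared by all instances; never mutated here.
    private final Properties shared;
    // Properties specific/unique to this instance.
    private final Properties specific = new Properties();

    public LayeredProperties(Properties shared) {
        this.shared = shared;
    }

    // get() checks the specific table first, then falls back to the shared one.
    public String getProperty(String key) {
        String v = specific.getProperty(key);
        return v != null ? v : shared.getProperty(key);
    }

    // put() works only with the instance-specific table, so no copy of the
    // shared table is ever made, no matter how many properties are overridden.
    public void setProperty(String key, String value) {
        specific.setProperty(key, value);
    }
}
```

Under this scheme the memory cost per instance is proportional to the number of overridden properties, rather than all-or-nothing as with copy-on-first-write.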

Regarding interning only values in {{PartitionDesc#internProperties}}: yes, I 
think this was intentional - I carefully analyzed heap dumps before making this 
change, so if interning the keys had been worthwhile, I would have done that too. 
Most likely, by the time these tables are created, the Strings for the keys 
already come from a source where they are interned.
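For context, this is why interning values helps: duplicate strings read independently (e.g. from serialized configs) are distinct heap objects, but interning collapses them to one canonical copy. A minimal illustration (the value shown is made up):

```java
// Duplicate value strings occupy separate heap objects until interned.
public class InternDemo {
    public static void main(String[] args) {
        // Simulate two values read from different sources (e.g. two JobConfs).
        String a = new String("hdfs://warehouse/tbl");
        String b = new String("hdfs://warehouse/tbl");

        // Equal content, but two distinct objects on the heap.
        System.out.println(a.equals(b)); // true
        System.out.println(a == b);      // false

        // After interning, both references point to one shared copy.
        System.out.println(a.intern() == b.intern()); // true
    }
}
```

If keys are already interned at their source, interning them again in {{internProperties}} would be redundant work, which matches the analysis above.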

> Intern JobConf objects in Spark tasks
> -------------------------------------
>
>                 Key: HIVE-19937
>                 URL: https://issues.apache.org/jira/browse/HIVE-19937
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-19937.1.patch
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, this cloning comes at the cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory; we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
