[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528117#comment-16528117
 ] 

Sahil Takiar commented on HIVE-19937:
-------------------------------------

[~mi...@cloudera.com] thanks for the input! I didn't notice 
{{CopyOnFirstWriteProperties}} so that helps a lot.

For {{CopyOnFirstWriteProperties}}, it looks like all of the properties get 
copied into the "super" {{Properties}} object on the first write. Do you think 
it would be possible to copy only the mutated properties into the super class, 
rather than all of them? My concern is that mutating this properties object is 
probably a common case, and copying the entire thing on a write would largely 
defeat the purpose of interning.
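To make the suggestion concrete, here is a minimal sketch of what I mean by 
storing only the delta: keep mutated keys in a small override map and fall back 
to the shared, interned {{Properties}} for everything else. The class and field 
names are invented for illustration; this is not the actual 
{{CopyOnFirstWriteProperties}} implementation, and it ignores removals, 
iteration, etc.:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: rather than copying every entry on the first write,
// store only the mutated keys in a per-instance override map. Reads check
// the overrides first and fall back to the shared, interned backing store.
public class OverlayProperties {
  private final Properties shared;                       // shared, interned store
  private final Map<String, String> overrides = new HashMap<>();

  public OverlayProperties(Properties shared) {
    this.shared = shared;
  }

  public String getProperty(String key) {
    String v = overrides.get(key);
    return v != null ? v : shared.getProperty(key);
  }

  public void setProperty(String key, String value) {
    overrides.put(key, value);                           // only the delta is stored
  }
}
{code}

With this shape, a write touching one key costs one map entry instead of a 
full copy of the shared object.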

I also noticed that in {{PartitionDesc#internProperties}} only the value is 
interned, not the key. Was that intentional? The code is below:

{code:java}
private static void internProperties(Properties properties) {
  for (Enumeration<?> keys = properties.propertyNames(); keys.hasMoreElements();) {
    String key = (String) keys.nextElement();
    String oldValue = properties.getProperty(key);
    if (oldValue != null) {
      properties.setProperty(key, oldValue.intern());
    }
  }
}
{code}
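If interning the keys turns out to be worthwhile, one possible variant is below. 
This is only a sketch, not a proposed patch: note that {{Hashtable#put}} keeps 
the existing key object when an equal key is already present, so the entry has 
to be removed and re-added for the interned key to actually be stored. 
{{propertyNames()}} returns an enumeration over a snapshot of the names, so 
mutating the table inside the loop is safe:

{code:java}
import java.util.Enumeration;
import java.util.Properties;

// Hypothetical variant that interns both keys and values. The remove/re-add
// dance is needed because put() on an equal key replaces only the value and
// leaves the original (non-interned) key object in place.
public class InternBoth {
  static void internKeysAndValues(Properties properties) {
    for (Enumeration<?> keys = properties.propertyNames(); keys.hasMoreElements();) {
      String key = (String) keys.nextElement();
      String value = properties.getProperty(key);
      if (value != null) {
        properties.remove(key);
        properties.setProperty(key.intern(), value.intern());
      }
    }
  }
}
{code}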

I'm working on creating a test that can easily measure the impact of this 
change.

> Intern JobConf objects in Spark tasks
> -------------------------------------
>
>                 Key: HIVE-19937
>                 URL: https://issues.apache.org/jira/browse/HIVE-19937
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-19937.1.patch
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, this comes at the cost of storing a duplicate 
> {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory; we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)