[jira] [Commented] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

Hive QA (JIRA) Fri, 04 Jan 2019 12:47:08 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734549#comment-16734549
 ]


Hive QA commented on HIVE-20760:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12953790/HIVE-20760.13.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15723 tests executed
*Failed tests:*
{noformat}
TestCompactor - did not produce a TEST-*.xml file (likely timed out) (batchId=242)
org.apache.hadoop.hive.cli.TestMiniDruidKafkaCliDriver.testCliDriver[druidkafkamini_delimited] (batchId=274)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15504/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15504/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15504/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12953790 - PreCommit-HIVE-Build

> Reducing memory overhead due to multiple HiveConfs
> --------------------------------------------------
>
>                 Key: HIVE-20760
>                 URL: https://issues.apache.org/jira/browse/HIVE-20760
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Barnabas Maidics
>            Assignee: Barnabas Maidics
>            Priority: Major
>         Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.11.patch, 
> HIVE-20760.12.patch, HIVE-20760.13.patch, HIVE-20760.4.patch, 
> HIVE-20760.5.patch, HIVE-20760.6.patch, HIVE-20760.7.patch, 
> HIVE-20760.8.patch, HIVE-20760.9.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>
> The issue is that every Hive task has to load its own version of 
> {{HiveConf}}. When running with a large number of cores per executor (HoS), 
> there is a significant (~10%) amount of memory wasted due to this 
> duplication. 
> I looked into the problem and found a way to reduce the overhead caused by 
> the multiple HiveConf objects.
> I've created an implementation of Properties, somewhat similar to 
> CopyOnFirstWriteProperties. CopyOnFirstWriteProperties can't be used to solve 
> this problem, because it drops the interned Properties right after we add a 
> new property.
> So my implementation looks like this:
>  * When we create a new HiveConf from an existing one (copy constructor), we 
> change the properties object stored by HiveConf to the new Properties 
> implementation (HiveConfProperties). There are two possible ways to do this: 
> either we change the visibility of the properties field in the ancestor class 
> (Configuration, which comes from Hadoop) to protected, or, more simply, we 
> change the type using reflection.
>  * HiveConfProperties immediately interns the given properties. After this, 
> every time we add a new property to HiveConf, we add it to an additional 
> Properties object. This way, if we create multiple HiveConf objects with the 
> same base properties, they will use the same Properties object, but each 
> session/task can still add its own unique properties.
>  * Getting a property from HiveConfProperties would look like this (the 
> non-interned properties are stored in the superclass):
>                 String property = super.getProperty(key);
>                 if (property == null) property = interned.getProperty(key);
>                 return property;
> Tests showed that the interning works (with 50 connections to HiveServer2; 
> heap dumps taken after the sessions were created for queries): 
> Overall memory:
>         original: 34,599K               interned: 20,582K
> Retained memory of HiveConfs:
>         original: 16,366K               interned: 10,804K
> I have attached the JXray reports for the heap dumps.
> What are your thoughts about this solution? 
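
The lookup order described in the quoted issue (per-instance properties first, then the shared base) can be illustrated with a minimal, self-contained sketch. This is not the HiveConfProperties class from the attached patches: the class name LayeredProperties, its constructor, and the property keys are hypothetical, and the sketch does not perform interning or touch Hadoop's Configuration. It only shows how several instances might share one base Properties object while each keeps its own additions.

```java
import java.util.Properties;

// Hypothetical sketch; not the class from the attached patches.
class LayeredProperties extends Properties {
    // Shared base properties; this class only reads from them.
    private final Properties interned;

    LayeredProperties(Properties interned) {
        this.interned = interned;
    }

    // Entries set on this instance (stored by the superclass) take precedence;
    // anything else falls back to the shared base.
    @Override
    public String getProperty(String key) {
        String property = super.getProperty(key);
        if (property == null) {
            property = interned.getProperty(key);
        }
        return property;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("example.shared", "base-value");      // hypothetical key

        // Two "tasks" share the same base object instead of copying it.
        LayeredProperties task1 = new LayeredProperties(base);
        LayeredProperties task2 = new LayeredProperties(base);
        task1.setProperty("example.task", "only-in-task1");    // hypothetical key

        System.out.println(task1.getProperty("example.shared")); // base-value
        System.out.println(task1.getProperty("example.task"));   // only-in-task1
        System.out.println(task2.getProperty("example.task"));   // null
    }
}
```

Because only getProperty(String) is overridden here, other views of the object (for example size() or entrySet()) would not see the shared base; a fuller implementation would presumably need to cover those as well.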



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
