[ 
https://issues.apache.org/jira/browse/HIVE-20153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558917#comment-16558917
 ] 

Gopal V commented on HIVE-20153:
--------------------------------

LGTM - +1 tests pending.

This extra field still takes up a meaningful amount of memory in each of these 
objects on the heap. 

From JOL:

{code}
***** 64-bit VM: **********************************************************
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumEvaluator$SumAgg object internals:
 OFFSET  SIZE                TYPE DESCRIPTION                               VALUE
      0    16                     (object header)                           N/A
     16     1             boolean SumAgg.empty                              N/A
     17     7                     (alignment/padding gap)                  
     24     8    java.lang.Object SumAgg.sum                                N/A
     32     8   java.util.HashSet SumAgg.uniqueObjects                      N/A
Instance size: 40 bytes
Space losses: 7 bytes internal + 0 bytes external = 7 bytes total
...
***** 64-bit VM, compressed references enabled: ***************************
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumEvaluator$SumAgg object internals:
 OFFSET  SIZE                TYPE DESCRIPTION                               VALUE
      0    12                     (object header)                           N/A
     12     1             boolean SumAgg.empty                              N/A
     13     3                     (alignment/padding gap)                  
     16     4    java.lang.Object SumAgg.sum                                N/A
     20     4   java.util.HashSet SumAgg.uniqueObjects                      N/A
Instance size: 24 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}
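
The two instance sizes above can be reproduced with simple alignment arithmetic. The sketch below is only that: it assumes the field order shown in the JOL output (flag first, then references), references aligned to their own size, and objects aligned to 8 bytes. `LayoutMath` and its method names are hypothetical, and a real JVM is free to lay fields out differently.

```java
/** Back-of-the-envelope arithmetic for the JOL numbers above; not a replacement for JOL. */
final class LayoutMath {

    /** Rounds size up to the next multiple of alignment (alignment must be a power of two). */
    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) & -alignment;
    }

    /**
     * Estimated instance size for: header, an optional 1-byte boolean, then refCount references.
     * Assumption: references are aligned to refBytes and the whole object to 8 bytes.
     */
    static int instanceSize(int headerBytes, int refBytes, boolean hasBooleanFlag, int refCount) {
        int offset = headerBytes;
        if (hasBooleanFlag) {
            offset += 1;                          // e.g. SumAgg.empty
        }
        if (refCount > 0) {
            offset = alignUp(offset, refBytes);   // padding gap before the first reference
            offset += refCount * refBytes;        // e.g. SumAgg.sum, SumAgg.uniqueObjects
        }
        return alignUp(offset, 8);                // object alignment
    }

    public static void main(String[] args) {
        // Compressed references: 12-byte header, 4-byte references.
        System.out.println("flag + sum + uniqueObjects: " + instanceSize(12, 4, true, 2));
        System.out.println("flag + sum:                 " + instanceSize(12, 4, true, 1));
        System.out.println("sum only:                   " + instanceSize(12, 4, false, 1));
        // Uncompressed references: 16-byte header, 8-byte references.
        System.out.println("flag + sum + uniqueObjects: " + instanceSize(16, 8, true, 2));
        System.out.println("flag + sum:                 " + instanceSize(16, 8, true, 1));
        System.out.println("sum only:                   " + instanceSize(16, 8, false, 1));
    }
}
```

Under these assumptions, dropping only `uniqueObjects` leaves the compressed-reference object at 24 bytes because of alignment; the estimate falls to 16 bytes only when the `empty` flag goes as well.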

A PTF-specific subclass would remove that part, and let me think of a way of 
having a SumAggEmpty class (the "which class is it" information goes into the 
12-byte object header).
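
One possible reading of this idea, as a rough sketch only: the set lives in a windowing-only subclass, and emptiness is encoded by the class of the buffer rather than by a boolean field. Every name here (`SumBuffers`, `SumBuffer`, `EmptySum`, `PlainSum`, `DistinctSum`) is hypothetical and not taken from the patch. The sketch also assumes a `long` sum instead of the `Object` field shown by JOL, and that the set is used to skip repeated inputs.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only; hypothetical names, not the classes in HIVE-20153. */
final class SumBuffers {

    interface SumBuffer {
        boolean isEmpty();
        long sum();
        /** Returns the buffer to keep using; may differ from the receiver. */
        SumBuffer add(long value);
    }

    /** Shared "empty" state: the class itself says "empty", so no boolean field is needed. */
    static final class EmptySum implements SumBuffer {
        static final EmptySum INSTANCE = new EmptySum();
        private EmptySum() { }
        public boolean isEmpty() { return true; }
        public long sum() { return 0L; }
        public SumBuffer add(long value) { return new PlainSum(value); }
    }

    /** Non-empty buffer for plain aggregation: one field, no set reference. */
    static class PlainSum implements SumBuffer {
        protected long sum;
        PlainSum(long first) { this.sum = first; }
        public boolean isEmpty() { return false; }
        public long sum() { return sum; }
        public SumBuffer add(long value) { sum += value; return this; }
    }

    /** Windowing (PTF) variant: only this subclass pays for the set reference. */
    static final class DistinctSum extends PlainSum {
        private final Set<Long> uniqueObjects = new HashSet<>();
        DistinctSum(long first) {
            super(first);
            uniqueObjects.add(first);
        }
        @Override
        public SumBuffer add(long value) {
            if (uniqueObjects.add(value)) {   // add() returns false for a value already seen
                sum += value;
            }
            return this;
        }
    }
}
```

Because an object cannot change its class, the caller has to keep the buffer returned by `add`, for example `buf = buf.add(5)`. Code that mutates its aggregation buffer in place would need a different arrangement, so this is not a drop-in design.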

> Count and Sum UDF consume more memory in Hive 2+
> ------------------------------------------------
>
>                 Key: HIVE-20153
>                 URL: https://issues.apache.org/jira/browse/HIVE-20153
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 2.3.2
>            Reporter: Szehon Ho
>            Assignee: Aihua Xu
>            Priority: Major
>         Attachments: HIVE-20153.1.patch, Screen Shot 2018-07-12 at 6.41.28 
> PM.png
>
>
> While playing with Hive2, we noticed that queries with a lot of count() and 
> sum() aggregations run out of memory on the Hadoop side, where they worked 
> before in Hive1. 
> In many queries, we have to double the Mapper Memory settings (in our 
> particular case mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M), which 
> makes it harder to upgrade to Hive 2.
> Taking a heap dump, we see that one of the main culprits is the field 
> 'uniqueObjects' in GenericUDAFSum and GenericUDAFCount, which was added to 
> support Window functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
