[ 
https://issues.apache.org/jira/browse/HIVE-20153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-20153:
-----------------------------
    Description: 
While testing Hive 2, we noticed that queries with many count() and sum() 
aggregations run out of memory on the Hadoop side, where the same queries 
worked in Hive 1. 

For many queries we have to double the mapper memory settings (in our 
particular case, mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M), which 
makes upgrading to Hive 2 difficult.

A heap dump shows that one of the main culprits is the field 'uniqueObjects' 
in GenericUDAFSum and GenericUDAFCount, which was added to support window 
functions.

  was:
While playing with Hive2, we noticed that queries with a lot of count() and 
sum() aggregations run out of memory on Hadoop side much faster than in Hive1.  
In many queries, we have to double the memory (in our particular case 
mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M)

 

Taking heap dump, we see one of the main culprits is the field 'uniqueObjects' 
in GenericUDAFSum and GenericUDAFCount, which was added to support Window 
functions.


> Count and Sum UDF consume more memory in Hive 2+
> ------------------------------------------------
>
>                 Key: HIVE-20153
>                 URL: https://issues.apache.org/jira/browse/HIVE-20153
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 2.3.2
>            Reporter: Szehon Ho
>            Priority: Major
>         Attachments: Screen Shot 2018-07-12 at 6.41.28 PM.png
>
>
> While testing Hive 2, we noticed that queries with many count() and sum() 
> aggregations run out of memory on the Hadoop side, where the same queries 
> worked in Hive 1. 
> For many queries we have to double the mapper memory settings (in our 
> particular case, mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M), which 
> makes upgrading to Hive 2 difficult.
> A heap dump shows that one of the main culprits is the field 'uniqueObjects' 
> in GenericUDAFSum and GenericUDAFCount, which was added to support window 
> functions.
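
To illustrate how a per-buffer field like 'uniqueObjects' could add up, here 
is a minimal sketch. It assumes that each aggregation buffer carries its own 
set for distinct tracking, so a query with many groups and many count()/sum() 
columns holds one set per buffer. The class names (AggBufferSketch, 
EagerSumBuffer, LazySumBuffer) and the use of Set<Long> are hypothetical and 
are not taken from the Hive source; the sketch only contrasts allocating the 
set up front with allocating it on first use.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; these classes are NOT Hive's implementation.
public class AggBufferSketch {

    /** Buffer that allocates its distinct-tracking set up front. */
    static final class EagerSumBuffer {
        long sum;
        // Allocated for every buffer, even when no distinct tracking is needed.
        final Set<Long> uniqueObjects = new HashSet<>();

        void add(long value) {
            sum += value;
        }
    }

    /** Buffer that allocates the set only when distinct tracking is requested. */
    static final class LazySumBuffer {
        long sum;
        // Stays null for a plain sum(), so no per-buffer set is retained.
        Set<Long> uniqueObjects;

        void add(long value, boolean distinct) {
            if (distinct) {
                if (uniqueObjects == null) {
                    uniqueObjects = new HashSet<>();
                }
                if (!uniqueObjects.add(value)) {
                    return; // value already seen, skip it
                }
            }
            sum += value;
        }

        boolean hasSet() {
            return uniqueObjects != null;
        }
    }

    public static void main(String[] args) {
        LazySumBuffer plain = new LazySumBuffer();
        plain.add(5, false);
        plain.add(5, false);
        System.out.println("plain sum=" + plain.sum + " set allocated=" + plain.hasSet());

        LazySumBuffer distinct = new LazySumBuffer();
        distinct.add(5, true);
        distinct.add(5, true);
        System.out.println("distinct sum=" + distinct.sum + " set allocated=" + distinct.hasSet());
    }
}
```

Whether Hive allocates the field eagerly, and how much of the heap it 
accounts for in a given query, has to be confirmed from the heap dump and the 
source; the sketch only shows why the per-buffer cost matters when a mapper 
holds many buffers.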



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
