[ https://issues.apache.org/jira/browse/HIVE-20153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aihua Xu updated HIVE-20153:
----------------------------
    Attachment: HIVE-20153.1.patch

> Count and Sum UDF consume more memory in Hive 2+
> ------------------------------------------------
>
>                 Key: HIVE-20153
>                 URL: https://issues.apache.org/jira/browse/HIVE-20153
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 2.3.2
>            Reporter: Szehon Ho
>            Assignee: Aihua Xu
>            Priority: Major
>         Attachments: HIVE-20153.1.patch, Screen Shot 2018-07-12 at 6.41.28 PM.png
>
>
> While trying out Hive 2, we noticed that queries with many count() and
> sum() aggregations run out of memory on the Hadoop side, although they
> worked in Hive 1.
> For many queries we have to double the mapper memory setting (in our
> particular case, mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M),
> which makes upgrading to Hive 2 difficult.
> A heap dump shows that one of the main culprits is the field
> 'uniqueObjects' in GenericUDAFSum and GenericUDAFCount, which was added
> to support window functions.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)