[ 
https://issues.apache.org/jira/browse/HIVE-28428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi reassigned HIVE-28428:
------------------------------------

    Assignee: Ryu Kobayashi

>  Map hash aggregation performance degradation
> ---------------------------------------------
>
>                 Key: HIVE-28428
>                 URL: https://issues.apache.org/jira/browse/HIVE-28428
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ryu Kobayashi
>            Assignee: Ryu Kobayashi
>            Priority: Major
>         Attachments: 2024-08-02 14.35.46.png, 
> image-2024-08-02-14-37-01-824.png, image-2024-08-02-14-38-45-459.png
>
>
> The following ticket was fixed to enable map hash aggregation, but 
> performance is now worse than when it is disabled.
> https://issues.apache.org/jira/browse/HIVE-23356
> I found a few reasons for this. When there are a large number of keys, the 
> following log messages are written in large volume, which hurts performance 
> and can also cause an OOM.
> {code:java}
> 2024-08-02 05:21:53,675 [INFO] [TezChild] |exec.GroupByOperator|: Hash Tbl flush: #hash table = 171000
> 2024-08-02 05:21:53,713 [INFO] [TezChild] |exec.GroupByOperator|: Hash Table flushed: new size = 153900
> {code}
> By fixing this, we can improve performance as follows.
> Before:
> !image-2024-08-02-14-37-01-824.png!
> After:
> !2024-08-02 14.35.46.png!
> Also, the flush size is currently fixed, but performance can be improved by 
> changing it depending on the data:
> !image-2024-08-02-14-38-45-459.png!
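
The two log lines quoted above show a single flush shrinking the table from 171000 to 153900 entries, which is 10% of the entries removed. The following is a minimal sketch, not Hive's GroupByOperator, of why a small fixed flush fraction produces many flushes (and so many log lines) when the number of distinct keys is large. The class and method names (FlushSketch, entriesToFlush, countFlushes), the entry-count cap, and the workload of 1,000,000 distinct keys are all hypothetical and chosen only for illustration; the 10% figure is inferred from the log, not taken from the Hive source.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not Hive code. */
final class FlushSketch {

    /** Number of entries evicted by one flush at the given fraction. */
    public static int entriesToFlush(int tableSize, double fraction) {
        return (int) Math.round(tableSize * fraction);
    }

    /**
     * Feeds distinctKeys unique keys through a map capped at maxEntries,
     * evicting `fraction` of the entries each time the cap is reached.
     * Returns how many flushes happened. In the report above, each flush
     * writes two INFO lines, so log volume grows with this count.
     */
    public static int countFlushes(int distinctKeys, int maxEntries, double fraction) {
        Map<Integer, Long> table = new LinkedHashMap<>();
        int flushes = 0;
        for (int key = 0; key < distinctKeys; key++) {
            table.merge(key, 1L, Long::sum); // partial aggregate: count per key
            if (table.size() >= maxEntries) {
                int toRemove = entriesToFlush(table.size(), fraction);
                Iterator<Map.Entry<Integer, Long>> it = table.entrySet().iterator();
                for (int i = 0; i < toRemove && it.hasNext(); i++) {
                    it.next();   // a real operator would forward this entry downstream
                    it.remove();
                }
                flushes++;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        // Entries removed by one flush of a 171000-entry table at 10%.
        System.out.println(entriesToFlush(171_000, 0.10));
        // Flush counts for the same workload at two different fractions.
        System.out.println(countFlushes(1_000_000, 171_000, 0.10));
        System.out.println(countFlushes(1_000_000, 171_000, 0.50));
    }
}
```

Under these assumptions, a larger flush fraction frees more room per flush, so the same input triggers fewer flushes and fewer log lines. The cost is that more partial aggregates leave the map early, which is why a value chosen according to the data may do better than a fixed one.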



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
