[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow

Yongzhi Chen (JIRA) Wed, 19 Aug 2015 18:02:00 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704081#comment-14704081
 ]


Yongzhi Chen commented on HIVE-11502:
-------------------------------------

[~xuefuz], GroupBy's aggregation hash map uses ListKeyWrapper as its key, so it 
uses the ListKeyWrapper's hashCode(). The HashMap never calls DoubleWritable's 
hashCode() directly, so we can change the hashing in between. It is also safe: 
ListKeyWrapper is only used by GroupBy, so the change stays internal to Hive.
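A minimal sketch of the "play in between" idea, assuming nothing about Hive's actual ListKeyWrapper code: `ListKey` and `DoubleBox` are hypothetical names. `DoubleBox` stands in for Hadoop's `DoubleWritable`, whose `hashCode()` returns `(int) Double.doubleToLongBits(value)` and so keeps only the low 32 bits of the bit pattern. Because the map only ever calls the wrapper's `hashCode()`, the wrapper can hash a double field with `Double.hashCode` (which XORs the high and low 32 bits) without touching the field's own `hashCode()`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ListKeySketch {

    // Hypothetical stand-in for Hadoop's DoubleWritable: hashCode() keeps only
    // the low 32 bits of the IEEE 754 bit pattern.
    static final class DoubleBox {
        final double value;

        DoubleBox(double value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof DoubleBox
                    && Double.compare(((DoubleBox) o).value, value) == 0;
        }

        @Override
        public int hashCode() {
            return (int) Double.doubleToLongBits(value);
        }
    }

    // Hypothetical group-by key wrapper (not Hive's ListKeyWrapper). The hash
    // map only calls ListKey.hashCode(), so the wrapper may hash a DoubleBox
    // field with its own formula instead of DoubleBox.hashCode().
    static final class ListKey {
        final Object[] fields;

        ListKey(Object... fields) { this.fields = fields; }

        @Override
        public int hashCode() {
            int h = 1;
            for (Object f : fields) {
                int fh;
                if (f instanceof DoubleBox) {
                    // Double.hashCode XORs the high and low 32 bits, so whole
                    // numbers no longer all collapse to the same hash.
                    fh = Double.hashCode(((DoubleBox) f).value);
                } else {
                    fh = (f == null) ? 0 : f.hashCode();
                }
                h = 31 * h + fh;
            }
            return h;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ListKey && Arrays.equals(fields, ((ListKey) o).fields);
        }
    }

    // Simulates "select col1, max(col2) ... group by col1" on a double col1:
    // row i has col1 = i % groups and col2 = i.
    static Map<ListKey, Double> groupMax(int rows, int groups) {
        Map<ListKey, Double> maxByKey = new HashMap<>();
        for (int i = 0; i < rows; i++) {
            ListKey key = new ListKey(new DoubleBox(i % groups));
            maxByKey.merge(key, (double) i, Double::max);
        }
        return maxByKey;
    }

    public static void main(String[] args) {
        Map<ListKey, Double> result = groupMax(400_000, 1_000);
        System.out.println(result.size());                               // 1000
        System.out.println(result.get(new ListKey(new DoubleBox(0.0)))); // 399000.0
    }
}
```

The wrapper's hash stays consistent with its `equals()`: two `DoubleBox` fields compare equal only when their `doubleToLongBits` patterns match, and `Double.hashCode` is computed from that same pattern.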

> Map side aggregation is extremely slow
> --------------------------------------
>
>                 Key: HIVE-11502
>                 URL: https://issues.apache.org/jira/browse/HIVE-11502
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Physical Optimizer
>    Affects Versions: 1.2.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-11502.1.patch, HIVE-11502.2.patch, 
> HIVE-11502.3.patch
>
>
> For the following query:
> {noformat}
> create table tbl2 as 
> select col1, max(col2) as col2 
> from tbl1 group by col1;
> {noformat}
> If the group-by column is of type double and has many distinct values (for 
> example, 400000), map-side aggregation is very slow. The query ran for more 
> than 3 hours before I had to kill it.
> The same query finishes in 7 seconds if I turn off map-side aggregation:
> {noformat}
> set hive.map.aggr = false;
> {noformat}
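Assuming the slowdown comes from hash collisions on the double key (the comment above points at DoubleWritable's hash code), the effect can be checked in isolation. This sketch assumes the 400000 keys are the whole numbers 0.0 through 399999.0; the issue does not say what the actual values were. `truncatingHash` mirrors Hadoop's `DoubleWritable.hashCode()`, and `mixingHash` computes the same value as `java.lang.Double.hashCode()`. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class DoubleKeyHashDemo {

    // Same formula as Hadoop's DoubleWritable.hashCode(): keeps only the low
    // 32 bits of the IEEE 754 bit pattern.
    static int truncatingHash(double v) {
        return (int) Double.doubleToLongBits(v);
    }

    // Same formula as java.lang.Double.hashCode(): XORs the high and low
    // 32 bits together.
    static int mixingHash(double v) {
        long bits = Double.doubleToLongBits(v);
        return (int) (bits ^ (bits >>> 32));
    }

    // Counts distinct hash codes for the keys 0.0, 1.0, ..., (n - 1).0.
    static int distinctHashes(int n, boolean mix) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            double key = i;
            seen.add(mix ? mixingHash(key) : truncatingHash(key));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 400_000;
        System.out.println("truncating: " + distinctHashes(n, false)); // 1
        System.out.println("mixing:     " + distinctHashes(n, true));  // 400000
    }
}
```

For whole numbers below 2^21, all significant mantissa bits sit in the high 32 bits of the pattern, so the truncating hash is 0 for every key and a hash map degrades to one long collision chain; mixing in the high bits gives each key its own hash.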



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
