[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553336#comment-13553336
 ] 

Navis commented on HIVE-3852:
-----------------------------

Namit, 
I don't think I'm the right person to answer this, but IMHO it would depend on 
the reduction ratio achieved by map-side aggregation. If the group-by column is 
fairly distinctive, this optimization could be useful; if it's not, two (or 
more) MR tasks would be faster. 
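
One way to check this on a given data set is to compare the plans with the 
optimization switched on and off. The snippet below is only a sketch: it 
assumes hive.multigroupby.singlereducer is the setting that controls the 
optimization discussed here, that hive.map.aggr is enabled, and that the build 
already includes the fix for this issue. The simplified two-insert query is 
illustrative, not taken from the patch.

{code}
-- Assumption: map-side aggregation is on, so the reduction ratio matters.
SET hive.map.aggr=true;

-- Assumption: this is the switch for the multi-groupby optimization.
-- Plan with the optimization enabled.
SET hive.multigroupby.singlereducer=true;
EXPLAIN
FROM INPUT
INSERT OVERWRITE TABLE dest1
SELECT INPUT.key, count(distinct substr(INPUT.value,5)) GROUP BY INPUT.key
INSERT OVERWRITE TABLE dest2
SELECT INPUT.key, sum(distinct substr(INPUT.value,5)) GROUP BY INPUT.key;

-- Plan with the optimization disabled, for comparison.
SET hive.multigroupby.singlereducer=false;
EXPLAIN
FROM INPUT
INSERT OVERWRITE TABLE dest1
SELECT INPUT.key, count(distinct substr(INPUT.value,5)) GROUP BY INPUT.key
INSERT OVERWRITE TABLE dest2
SELECT INPUT.key, sum(distinct substr(INPUT.value,5)) GROUP BY INPUT.key;
{code}

Running both variants without EXPLAIN and comparing the elapsed time would 
show which plan is faster for that particular key distribution.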
                
> Multi-groupby optimization fails when same distinct column is used twice or more
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-3852
>                 URL: https://issues.apache.org/jira/browse/HIVE-3852
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-3852.D7737.1.patch
>
>
> {code}
> FROM INPUT
> INSERT OVERWRITE TABLE dest1
> SELECT INPUT.key,
>        sum(distinct substr(INPUT.value,5)),
>        count(distinct substr(INPUT.value,5))
> GROUP BY INPUT.key
> INSERT OVERWRITE TABLE dest2
> SELECT INPUT.key,
>        sum(distinct substr(INPUT.value,5)),
>        avg(distinct substr(INPUT.value,5))
> GROUP BY INPUT.key;
> {code}
> fails with exception FAILED: IndexOutOfBoundsException Index: 0,Size: 0
