[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

Navis (JIRA) Mon, 14 Jan 2013 16:32:19 -0800

    [ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553336#comment-13553336 ]


Navis commented on HIVE-3852:
-----------------------------

Namit,
I don't think I'm the right person to answer this, but IMHO it depends on the
reduction ratio achieved by map-side aggregation. If the group-by column has
many distinct values, this optimization could be useful; if it doesn't, two
(or more) MR tasks would be faster.
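To make "reduction ratio" concrete, here is a minimal Python sketch. It is not Hive code: the function names and the 0.5 threshold are hypothetical, chosen for illustration. The ratio is (distinct group-by keys) / (input rows) for one map split. A value near 1.0 means map-side aggregation barely shrinks the data. A value near 0.0 means it collapses many rows into a few groups. Hive's own `hive.map.aggr.hash.min.reduction` setting (default 0.5) applies a similar ratio check when deciding whether to keep map-side hash aggregation enabled. The decision helper encodes one reading of the comment above and is not a rule Hive actually applies.

```python
def map_side_reduction_ratio(rows, key_index=0):
    """Hypothetical helper: distinct group-by keys divided by input rows
    for a single map split. Lower means map-side aggregation helps more."""
    if not rows:
        return 0.0
    distinct_keys = {row[key_index] for row in rows}
    return len(distinct_keys) / len(rows)


def prefer_single_job(rows, threshold=0.5, key_index=0):
    """Hypothetical heuristic (threshold is an assumption): if map-side
    aggregation reduces little (ratio above threshold), sharing one MR job
    across the inserts is likely fine; otherwise separate MR jobs, each
    benefiting from map-side aggregation, may be faster."""
    return map_side_reduction_ratio(rows, key_index) > threshold


if __name__ == "__main__":
    # Low-cardinality key: 3 distinct keys over 100 rows, ratio 0.03.
    few_keys = [(k % 3, "val_%d" % k) for k in range(100)]
    # High-cardinality key: every row has its own key, ratio 1.0.
    many_keys = [(k, "val_%d" % k) for k in range(100)]
    print(map_side_reduction_ratio(few_keys), prefer_single_job(few_keys))
    print(map_side_reduction_ratio(many_keys), prefer_single_job(many_keys))
```

In practice the ratio varies per split and per group-by clause, so a real planner would need statistics or runtime sampling rather than the full row list used here.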
                
> Multi-groupby optimization fails when same distinct column is used twice or more
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-3852
>                 URL: https://issues.apache.org/jira/browse/HIVE-3852
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-3852.D7737.1.patch
>
>
> {code}
> FROM INPUT
> INSERT OVERWRITE TABLE dest1 
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), count(distinct substr(INPUT.value,5)) GROUP BY INPUT.key
> INSERT OVERWRITE TABLE dest2
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), avg(distinct substr(INPUT.value,5)) GROUP BY INPUT.key;
> {code}
> fails with the exception: FAILED: IndexOutOfBoundsException Index: 0,Size: 0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
