[
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555191#comment-13555191
]
Ashutosh Chauhan commented on HIVE-3852:
----------------------------------------
Namit,
bq. Should we have this optimization now ?
I am not sure which particular optimization you are referring to. I assume you
mean there is no need for reduce-side groupbys anymore, since we have map-side
aggregates. If so, I think those are still required. As Navis pointed out, if the
reduction ratio is not high enough, mappers may run out of memory, and then we
suggest that users turn off map-side aggregation.
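For illustration, a minimal sketch of that workaround, assuming the standard
hive.map.aggr session property. With it disabled, the group-by is evaluated on
the reduce side only, which is why the reduce-side path still has to work:
{code}
-- Turn off map-side (hash) aggregation for the current session.
set hive.map.aggr=false;
{code}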
> Multi-groupby optimization fails when same distinct column is used twice or
> more
> --------------------------------------------------------------------------------
>
> Key: HIVE-3852
> URL: https://issues.apache.org/jira/browse/HIVE-3852
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Trivial
> Attachments: HIVE-3852.D7737.1.patch
>
>
> {code}
> FROM INPUT
> INSERT OVERWRITE TABLE dest1
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), count(distinct
> substr(INPUT.value,5)) GROUP BY INPUT.key
> INSERT OVERWRITE TABLE dest2
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), avg(distinct
> substr(INPUT.value,5)) GROUP BY INPUT.key;
> {code}
> fails with exception FAILED: IndexOutOfBoundsException Index: 0,Size: 0