[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline updated HIVE-7405:
-------------------------------
Issue Type: Sub-task (was: Bug)
Parent: HIVE-7406
> Vectorize Reduce-Side GroupBy
> -----------------------------
>
> Key: HIVE-7405
> URL: https://issues.apache.org/jira/browse/HIVE-7405
> Project: Hive
> Issue Type: Sub-task
> Reporter: Matt McCline
> Assignee: Matt McCline
>
> Take advantage of the fact that in most plans a reduce-side GroupBy receives
> its group keys in sorted order. All rows for a key therefore arrive
> together, so each group can be finished and emitted as soon as the key
> changes. Aggregation can then be done in a "streaming" fashion, without
> large buffering of intermediate aggregation state in memory/storage.
> Push any case that requires large buffering, e.g. COUNT(DISTINCT(..)), to
> part 2 of Vectorize Reduce-Side GroupBy. In theory, if there is only one
> COUNT(DISTINCT(..)), the optimizer could make the distinct column(s) a
> subordinate sort key and then count the distinct values as a "streaming"
> operation. Only multiple COUNT(DISTINCT(..)) would then require large
> buffering.
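A minimal sketch of the streaming idea in the description, assuming rows arrive sorted by (group key, distinct column), i.e. the distinct column is already a subordinate sort key. It computes SUM and a single COUNT(DISTINCT) per group in one pass and holds only the current group's state. The class, record, and method names (StreamingGroupBy, Row, Result, aggregate) are hypothetical. This is not Hive's API, and it works row at a time rather than on vectorized batches.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a streaming reduce-side GROUP BY.
 * Assumes input is sorted by (key, distinctCol) and that fields are non-null.
 * Only the running state of the current group is kept in memory.
 */
public class StreamingGroupBy {

    /** Hypothetical input row: group key, value for SUM, column for COUNT(DISTINCT). */
    public record Row(String key, long value, String distinctCol) {}

    /** Hypothetical output row: group key, SUM(value), COUNT(DISTINCT distinctCol). */
    public record Result(String key, long sum, long distinctCount) {}

    public static List<Result> aggregate(Iterable<Row> sortedRows) {
        List<Result> out = new ArrayList<>();
        String currentKey = null;
        String lastDistinct = null;
        long sum = 0;
        long distinctCount = 0;
        boolean inGroup = false;

        for (Row row : sortedRows) {
            if (!inGroup || !row.key().equals(currentKey)) {
                // Key changed: because input is sorted, the previous group is
                // complete and can be emitted immediately, then state is reset.
                if (inGroup) {
                    out.add(new Result(currentKey, sum, distinctCount));
                }
                currentKey = row.key();
                sum = 0;
                distinctCount = 0;
                lastDistinct = null;
                inGroup = true;
            }
            sum += row.value();
            // With distinctCol as a subordinate sort key, equal values are adjacent
            // within a group, so comparing with the previous value is enough;
            // no hash set of all seen values has to be buffered.
            if (distinctCount == 0 || !row.distinctCol().equals(lastDistinct)) {
                distinctCount++;
                lastDistinct = row.distinctCol();
            }
        }
        if (inGroup) {
            out.add(new Result(currentKey, sum, distinctCount));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("a", 1, "x"),
            new Row("a", 2, "x"),
            new Row("a", 3, "y"),
            new Row("b", 4, "z"));
        // Prints Result[key=a, sum=6, distinctCount=2] then Result[key=b, sum=4, distinctCount=1]
        for (Result r : aggregate(rows)) {
            System.out.println(r);
        }
    }
}
```

The trick only covers one distinct column (or one set of columns). Rows can be sorted under a single subordinate key, so a second COUNT(DISTINCT(..)) on a different column would not see its values adjacent. That is why the description leaves multiple COUNT(DISTINCT(..)) to the buffering path.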
--
This message was sent by Atlassian JIRA
(v6.2#6252)