[ 
https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7405:
-------------------------------

    Issue Type: Sub-task  (was: Bug)
        Parent: HIVE-7406

> Vectorize Reduce-Side GroupBy
> -----------------------------
>
>                 Key: HIVE-7405
>                 URL: https://issues.apache.org/jira/browse/HIVE-7405
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>
> Take advantage of the fact that in most plans a reduce-side GroupBy receives 
> its group keys in sorted order, so aggregation can be done in a "streaming" 
> fashion without buffering large amounts of intermediate aggregation state in 
> memory or storage.
> Push any case that requires large buffering, e.g. COUNT(DISTINCT(..)), to 
> part 2 of Vectorize Reduce-Side GroupBy.  In theory, if there is only one 
> COUNT(DISTINCT(..)), the optimizer could arrange to sort on the distinct 
> column(s) as a subordinate sort key and count the distinct values of those 
> column(s) as a "streaming" operation.  Then only multiple COUNT(DISTINCT(..)) 
> aggregates would require large buffering.
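
The streaming idea in the description can be illustrated with a minimal, non-vectorized sketch. It is not Hive code: the class StreamingGroupBySketch, the Row and Result types, and the aggregate method are hypothetical names chosen for illustration. It assumes rows arrive sorted by (group key, distinct column) and that group keys are non-null, so a SUM and a single COUNT(DISTINCT ..) can be computed while holding only the current group's running state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; these names are hypothetical and not part of Hive.
final class StreamingGroupBySketch {

    /** One input row: group key, a value to SUM, and a column for COUNT(DISTINCT). */
    static final class Row {
        final String key;
        final long value;
        final String distinctCol;

        Row(String key, long value, String distinctCol) {
            this.key = key;
            this.value = value;
            this.distinctCol = distinctCol;
        }
    }

    /** One output row per group. */
    static final class Result {
        final String key;
        final long sum;
        final long distinctCount;

        Result(String key, long sum, long distinctCount) {
            this.key = key;
            this.sum = sum;
            this.distinctCount = distinctCount;
        }
    }

    /**
     * Assumes rows are sorted by (key, distinctCol) and keys are non-null.
     * Only the current group's running state is kept; no per-group hash table.
     */
    static List<Result> aggregate(Iterable<Row> sortedRows) {
        List<Result> out = new ArrayList<>();
        boolean inGroup = false;
        String currentKey = null;
        String lastDistinct = null;
        long sum = 0;
        long distinctCount = 0;

        for (Row row : sortedRows) {
            if (!inGroup || !row.key.equals(currentKey)) {
                // Key changed: the previous group is complete, so emit it.
                if (inGroup) {
                    out.add(new Result(currentKey, sum, distinctCount));
                }
                inGroup = true;
                currentKey = row.key;
                lastDistinct = null;
                sum = 0;
                distinctCount = 0;
            }
            sum += row.value;
            // Equal values are adjacent because of the subordinate sort key,
            // so a change from the previous value means a new distinct value.
            // NULLs are skipped, as SQL COUNT(DISTINCT ..) ignores them.
            if (row.distinctCol != null && !row.distinctCol.equals(lastDistinct)) {
                distinctCount++;
                lastDistinct = row.distinctCol;
            }
        }
        if (inGroup) {
            out.add(new Result(currentKey, sum, distinctCount));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("a", 1, "x"),
                new Row("a", 2, "x"),
                new Row("a", 3, "y"),
                new Row("b", 10, "z"));
        for (Result r : aggregate(rows)) {
            System.out.println(r.key + " sum=" + r.sum + " distinct=" + r.distinctCount);
        }
    }
}
```

With two or more COUNT(DISTINCT ..) aggregates on different columns, a single sort order cannot make the values of every distinct column adjacent, which is why that case would still need buffering.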



--
This message was sent by Atlassian JIRA
(v6.2#6252)
