Matt McCline created HIVE-7405:
----------------------------------

             Summary: Vectorize Reduce-Side GroupBy
                 Key: HIVE-7405
                 URL: https://issues.apache.org/jira/browse/HIVE-7405
             Project: Hive
          Issue Type: Bug
            Reporter: Matt McCline
            Assignee: Matt McCline



In most plans a reduce-side GroupBy receives its rows already sorted by the 
group keys.  Take advantage of this: aggregate in a "streaming" fashion, 
emitting each group's result as soon as the key changes, so that intermediate 
aggregation state does not need large buffering in memory/storage.
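
To illustrate the streaming idea, here is a minimal sketch in Python, not Hive 
code.  The function name streaming_group_sum and the (key, value) row shape 
are assumptions made for the example.  It assumes rows arrive grouped by key, 
as they would after the shuffle sort, and it keeps the state of only one group 
at a time.

```python
from typing import Hashable, Iterable, Iterator, Tuple


def streaming_group_sum(
    rows: Iterable[Tuple[Hashable, int]],
) -> Iterator[Tuple[Hashable, int]]:
    """Yield (key, sum) pairs for rows that arrive grouped by key.

    Hypothetical helper: only the running total of the current group is
    held in memory, never a table of all groups.
    """
    it = iter(rows)
    try:
        current_key, total = next(it)
    except StopIteration:
        return  # no input rows, so no groups
    for key, value in it:
        if key == current_key:
            total += value  # same group: keep accumulating
        else:
            yield current_key, total  # key changed: the group is complete
            current_key, total = key, value
    yield current_key, total  # flush the last group


if __name__ == "__main__":
    sorted_rows = [("a", 1), ("a", 2), ("b", 5)]
    for key, total in streaming_group_sum(sorted_rows):
        print(key, total)
```

If the input were not grouped by key, the same key would be emitted more than 
once, which is why this approach depends on the sorted order.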

Defer any case that requires large buffering, e.g. COUNT(DISTINCT(..)), to 
part 2 of Vectorize Reduce-Side GroupBy.  In theory, if there is only one 
COUNT(DISTINCT(..)), the optimizer could arrange for the distinct column(s) 
to be a subordinate sort key, so that counting the distinct values within 
each group is also a "streaming" operation.  Then only queries with multiple 
COUNT(DISTINCT(..)) would require large buffering.
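
The single COUNT(DISTINCT(..)) case can be sketched the same way.  This is an 
illustration in Python under stated assumptions, not the optimizer's or the 
operator's actual behavior: rows are assumed to be sorted by (group key, 
distinct value), the name streaming_count_distinct is hypothetical, and NULL 
handling is left out.

```python
from typing import Hashable, Iterable, Iterator, Tuple


def streaming_count_distinct(
    rows: Iterable[Tuple[Hashable, Hashable]],
) -> Iterator[Tuple[Hashable, int]]:
    """Yield (key, distinct_count) for rows sorted by (key, value).

    Hypothetical helper: because equal values are adjacent within a group,
    remembering only the previous value is enough to detect a new one.
    """
    have_group = False
    current_key = None
    last_value = None
    count = 0
    for key, value in rows:
        if not have_group or key != current_key:
            if have_group:
                yield current_key, count  # previous group is complete
            have_group = True
            current_key, last_value, count = key, value, 1
        elif value != last_value:
            last_value = value  # a value not seen before in this group
            count += 1
    if have_group:
        yield current_key, count  # flush the last group


if __name__ == "__main__":
    sorted_rows = [("a", 1), ("a", 1), ("a", 2), ("b", 7), ("b", 7)]
    for key, n in streaming_count_distinct(sorted_rows):
        print(key, n)
```

With two different DISTINCT columns, a single sort order cannot make both 
columns' equal values adjacent, so at least one of them would need a set of 
seen values per group.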



