Matt McCline created HIVE-7405:
----------------------------------
Summary: Vectorize Reduce-Side GroupBy
Key: HIVE-7405
URL: https://issues.apache.org/jira/browse/HIVE-7405
Project: Hive
Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline
In most plans a reduce-side GroupBy receives its group keys in sorted order.
Take advantage of this so that aggregation can be done "streaming", one group
at a time, without buffering large amounts of intermediate aggregation state
in memory/storage.
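As a rough illustration of the "streaming" idea (a minimal sketch in Python,
not Hive's vectorized operator code; the function name streaming_sum and the
(key, value) row shape are assumptions made for this example), a SUM grouped
by key over key-sorted input only needs the state of the current group:

```python
def streaming_sum(rows):
    """Yield (key, sum) pairs for rows that arrive sorted by key.

    Hypothetical helper for illustration only: it keeps the running total
    of a single group and emits it as soon as the key changes.
    """
    current_key = None
    total = 0
    have_group = False
    for key, value in rows:
        if have_group and key != current_key:
            # Key changed: the previous group is complete, so emit it.
            yield current_key, total
            total = 0
        current_key = key
        have_group = True
        total += value
    if have_group:
        # Emit the final group once the input is exhausted.
        yield current_key, total
```

Because the input is sorted, a key change means the previous group can never
reappear, so no hash table of partial aggregates is needed.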
Push any case requiring large buffering -- e.g. COUNT(DISTINCT(..)) -- to part
2 of Vectorize Reduce-Side GroupBy. In theory, if there is only one
COUNT(DISTINCT(..)), the optimizer could arrange for sorting on the distinct
column(s) as a subordinate sort key and count the distinct values of those
column(s) as a "streaming" operation. Then only multiple COUNT(DISTINCT(..))
expressions would require large buffering.
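The single COUNT(DISTINCT(..)) case can be sketched the same way (again a
hypothetical Python illustration, not the optimizer or operator code; the name
streaming_count_distinct is an assumption, and SQL NULL handling is left out).
It assumes rows arrive sorted by group key and then by the distinct column, so
duplicates within a group are adjacent:

```python
def streaming_count_distinct(rows):
    """Yield (key, distinct_count) for rows sorted by (key, value).

    Hypothetical helper for illustration only: a value is counted when it
    differs from the previous value seen in the same group.
    """
    current_key = None
    last_value = None
    count = 0
    have_group = False
    for key, value in rows:
        if have_group and key != current_key:
            # Key changed: emit the finished group and start a new one.
            yield current_key, count
            count = 0
            have_group = False
        if not have_group or value != last_value:
            # First row of a group, or a value not seen yet in this group.
            count += 1
        current_key, last_value, have_group = key, value, True
    if have_group:
        yield current_key, count
```

With two or more COUNT(DISTINCT(..)) expressions the input can be sorted on
only one of the distinct columns, which is why that case still needs
buffering.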