Matt McCline created HIVE-7405:
----------------------------------

             Summary: Vectorize Reduce-Side GroupBy
                 Key: HIVE-7405
                 URL: https://issues.apache.org/jira/browse/HIVE-7405
             Project: Hive
          Issue Type: Bug
            Reporter: Matt McCline
            Assignee: Matt McCline

Take advantage of the fact that in most plans a reduce-side GroupBy receives its group keys in sorted order, so aggregation can be done in a "streaming" fashion without buffering large amounts of intermediate aggregation state in memory or storage.

Push any case that requires large buffering, e.g. COUNT(DISTINCT(..)), to part 2 of Vectorize Reduce-Side GroupBy.

In theory, if there is only one COUNT(DISTINCT(..)), the optimizer could arrange for the distinct column(s) to be sorted as a subordinate sort key, so that counting the distinct values also becomes a "streaming" operation. Then only queries with multiple COUNT(DISTINCT(..)) would require large buffering.
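
To illustrate the idea, here is a minimal row-at-a-time sketch of streaming aggregation over sorted input. It is not Hive's vectorized operator code: the class `StreamingGroupBy`, its `Row` and `Result` types, and the `aggregate` method are all hypothetical names made up for this example. It assumes rows arrive sorted by group key and, within each key, by the distinct column (the "subordinate sort key" case), so both a SUM and a single COUNT(DISTINCT) can be computed while holding only one group's state at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Hive.
class StreamingGroupBy {

    /** One input row: the group key and the value being aggregated. */
    static final class Row {
        final String key;
        final long value;
        Row(String key, long value) { this.key = key; this.value = value; }
    }

    /** One output row per group. */
    static final class Result {
        final String key;
        final long sum;
        final long distinctCount;
        Result(String key, long sum, long distinctCount) {
            this.key = key;
            this.sum = sum;
            this.distinctCount = distinctCount;
        }
        @Override public String toString() {
            return key + ": sum=" + sum + ", distinct=" + distinctCount;
        }
    }

    /**
     * Assumes sortedRows is ordered by (key, value). Because equal keys are
     * adjacent, a group is complete as soon as the key changes, and because
     * equal values are adjacent within a group, a value is new exactly when
     * it differs from the previous row's value.
     */
    static List<Result> aggregate(List<Row> sortedRows) {
        List<Result> results = new ArrayList<>();
        String currentKey = null;
        boolean inGroup = false;
        long sum = 0;
        long distinctCount = 0;
        long previousValue = 0;

        for (Row row : sortedRows) {
            boolean newGroup = !inGroup || !row.key.equals(currentKey);
            if (newGroup) {
                if (inGroup) {
                    // Key changed: emit the finished group and reset state.
                    results.add(new Result(currentKey, sum, distinctCount));
                }
                currentKey = row.key;
                inGroup = true;
                sum = 0;
                distinctCount = 0;
            }
            if (newGroup || row.value != previousValue) {
                distinctCount++;
            }
            previousValue = row.value;
            sum += row.value;
        }
        if (inGroup) {
            // Emit the last group, which has no following key change.
            results.add(new Result(currentKey, sum, distinctCount));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("a", 1), new Row("a", 1), new Row("a", 2),
            new Row("b", 5), new Row("b", 7), new Row("b", 7));
        for (Result r : aggregate(rows)) {
            System.out.println(r);
        }
    }
}
```

The only state kept between rows is the current key, the running aggregates, and the previous value, which is why no per-group hash table is needed. With two or more COUNT(DISTINCT) columns, a single sort order can no longer keep every distinct column's equal values adjacent, which is the case the description defers to part 2.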