[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline updated HIVE-7405:
-------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: HIVE-7406

> Vectorize Reduce-Side GroupBy
> -----------------------------
>
>                 Key: HIVE-7405
>                 URL: https://issues.apache.org/jira/browse/HIVE-7405
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>
> Take advantage of the fact that in most plans a reduce-side GroupBy will get
> the group keys in sorted order so aggregation can be done "streaming" and not
> require large buffering of intermediate aggregation in memory/storage.
> Push any case requiring large buffering -- e.g. COUNT(DISTINCT(..)) -- to
> part 2 of Vectorize Reduce-Side GroupBy. In theory, if there is only one
> COUNT(DISTINCT(..)) the optimizer could arrange for sorting on the distinct
> column(s) as subordinate sort key and do the count of each distinct column(s)
> as a "streaming" operation. Then, only multiple COUNT(DISTINCT(..)) would
> require large buffering.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
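
The "streaming" aggregation described in the issue can be illustrated with a small sketch. This is not Hive code: it is plain Python, it processes one row at a time rather than vectorized batches, and the function names `streaming_sum` and `streaming_count_distinct` are hypothetical. It assumes the reducer input arrives already sorted by group key (and, for the distinct case, by the distinct column as a subordinate sort key), so only one group's state has to be held at a time.

```python
def streaming_sum(rows):
    """Sum values per key. `rows` must be (key, value) pairs sorted by key.

    Only the running total for the current key is kept in memory.
    """
    have_group = False
    current_key = None
    total = 0
    for key, value in rows:
        if have_group and key != current_key:
            # Key changed: the previous group is complete, so emit it.
            yield current_key, total
            total = 0
        current_key = key
        have_group = True
        total += value
    if have_group:
        yield current_key, total


def streaming_count_distinct(rows):
    """Count distinct values per key.

    `rows` must be (key, value) pairs sorted by (key, value), i.e. the
    distinct column acts as a subordinate sort key. Equal values are then
    adjacent, so comparing with the previous value is enough and no set
    of seen values needs to be buffered.
    """
    have_group = False
    current_key = None
    previous_value = None
    count = 0
    for key, value in rows:
        if have_group and key != current_key:
            yield current_key, count
            count = 0
        if not have_group or key != current_key or value != previous_value:
            count += 1
        current_key = key
        previous_value = value
        have_group = True
    if have_group:
        yield current_key, count
```

With two or more COUNT(DISTINCT(..)) expressions on different columns, the input can only be sorted on one of those columns as the subordinate key, so the adjacency trick above covers just one of them. That is the case the issue says would still require large buffering.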