liyang created CALCITE-853:
------------------------------
Summary: EnumerableAggregate should take advantage of input collation
Key: CALCITE-853
URL: https://issues.apache.org/jira/browse/CALCITE-853
Project: Calcite
Issue Type: Improvement
Reporter: liyang
Assignee: Julian Hyde
Li Yang
Aug 20 (2 days ago)
I encountered an out-of-memory exception when a huge result set was passed into
EnumerableAggregate and aggregated in memory. I think that if the input is
sorted by the group-by key, then groupBy() no longer has to hold all the data in
memory.
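A minimal sketch of the idea, assuming rows arrive sorted (or at least
clustered) by the group-by key. The class SortedGroupSum and its methods are
hypothetical names for illustration, not Calcite code: only one running
aggregate is held at a time, and each group is emitted as soon as the key
changes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical illustration; not Calcite's Enumerable.groupBy or EnumerableAggregate.
final class SortedGroupSum {
  /** Convenience constructor for a (key, value) input row. */
  static Map.Entry<String, Long> row(String key, long value) {
    return new SimpleEntry<>(key, value);
  }

  /**
   * Sums values per key. Assumes equal keys are adjacent in the input,
   * so memory use is one key and one running sum, regardless of input size.
   */
  static void sumSorted(Iterator<Map.Entry<String, Long>> rows,
      BiConsumer<String, Long> sink) {
    String currentKey = null;
    long sum = 0;
    boolean open = false; // true once at least one row has been read
    while (rows.hasNext()) {
      Map.Entry<String, Long> r = rows.next();
      if (open && !r.getKey().equals(currentKey)) {
        sink.accept(currentKey, sum); // key changed: the previous group is complete
        sum = 0;
      }
      currentKey = r.getKey();
      sum += r.getValue();
      open = true;
    }
    if (open) {
      sink.accept(currentKey, sum); // emit the final group
    }
  }

  public static void main(String[] args) {
    sumSorted(
        Arrays.asList(row("a", 1), row("a", 2), row("b", 5)).iterator(),
        (k, v) -> System.out.println(k + "=" + v));
  }
}
```

If the input is not actually clustered by key, this sketch emits the same key
more than once, which is why the operator needs to know its input's collation
before choosing this strategy.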
Julian Hyde
2:20 PM (16 hours ago)
Yes, that would be useful. Please log a jira.
Enumerable.groupBy doesn't know its input's collation so can't make that
decision, but EnumerableAggregate does. I think that EnumerableAggregate should
have a "trigger key", a subset of its group key, and if the trigger key changes
it will emit and flush its hash table.
As well as for your use case, it will be useful for streaming queries.
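The "trigger key" suggestion could look roughly like the sketch below. It
assumes a two-column group key (day, user) whose input is sorted only by day,
so day acts as the trigger key. TriggerKeyAggregate and Row are hypothetical
names for illustration; this is not how EnumerableAggregate is implemented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiConsumer;

// Hypothetical illustration of a trigger key; not Calcite code.
final class TriggerKeyAggregate {
  /** Input row: group key is (day, user); amount is the value to sum. */
  static final class Row {
    final String day;
    final String user;
    final long amount;

    Row(String day, String user, long amount) {
      this.day = day;
      this.user = user;
      this.amount = amount;
    }
  }

  /**
   * Sums amount per (day, user). Assumes input is sorted by day only.
   * The hash table holds the groups of one day at a time and is
   * emitted and cleared whenever the trigger key (day) changes.
   */
  static void sumByDayAndUser(Iterable<Row> rows,
      BiConsumer<List<String>, Long> sink) {
    Map<List<String>, Long> table = new LinkedHashMap<>();
    String currentDay = null;
    for (Row r : rows) {
      if (!table.isEmpty() && !Objects.equals(r.day, currentDay)) {
        table.forEach(sink); // trigger key changed: emit all finished groups
        table.clear();       // and flush the hash table
      }
      currentDay = r.day;
      table.merge(Arrays.asList(r.day, r.user), r.amount, Long::sum);
    }
    table.forEach(sink); // emit the groups of the last day
  }

  public static void main(String[] args) {
    List<Row> rows = Arrays.asList(
        new Row("d1", "u1", 1), new Row("d1", "u2", 2),
        new Row("d1", "u1", 3), new Row("d2", "u1", 4));
    sumByDayAndUser(rows, (k, v) -> System.out.println(k + "=" + v));
  }
}
```

With the trigger key equal to the full group key, the table never holds more
than one entry; with an empty trigger key, this degenerates to ordinary
in-memory hash aggregation.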