Aman Sinha created DRILL-690:
--------------------------------
Summary: Create 2-Phase aggregate plans for SUM, MIN, MAX
Key: DRILL-690
URL: https://issues.apache.org/jira/browse/DRILL-690
Project: Apache Drill
Issue Type: Improvement
Reporter: Aman Sinha
Currently, Drill generates 1-phase plans for aggregations with group-by: an
initial distribution (if necessary) followed by either a sort + streaming
aggregate or a hash aggregate. In many cases, we should instead be able to do a
2-phase aggregation:
Phase 1: local grouped aggregation on each node, potentially collapsing the
input to a small number of groups.
Intermediate step: hash distribution on the grouping keys.
Phase 2: final aggregation.
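The three steps above can be illustrated with a small in-process simulation. This is a minimal sketch, not Drill's implementation: the helper names (local_aggregate, hash_distribute, final_aggregate, two_phase), the representation of each node's data as a list of (key, value) pairs, and the number of receivers are all hypothetical. The key property it relies on is that hash distribution sends every partial result for a given key to the same receiver, so each receiver can finish its groups independently.

```python
# Hypothetical simulation of a 2-phase grouped aggregation (not Drill code).
# Each node's data is a list of (grouping_key, value) pairs.

def local_aggregate(rows, fn):
    """Phase 1: grouped aggregation over one node's rows using binary fn."""
    partial = {}
    for key, value in rows:
        partial[key] = value if key not in partial else fn(partial[key], value)
    return partial

def hash_distribute(partials, num_receivers):
    """Intermediate step: route each (key, partial) by hashing the key."""
    buckets = [[] for _ in range(num_receivers)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % num_receivers].append((key, value))
    return buckets

def final_aggregate(rows, fn):
    """Phase 2: the same grouped aggregation, applied to the partials."""
    return local_aggregate(rows, fn)

def two_phase(nodes, fn, num_receivers=2):
    partials = [local_aggregate(rows, fn) for rows in nodes]
    result = {}
    # Keys in different buckets are disjoint, so merging the buckets is safe.
    for bucket in hash_distribute(partials, num_receivers):
        result.update(final_aggregate(bucket, fn))
    return result

nodes = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("c", 5)]]
print(two_phase(nodes, lambda x, y: x + y))  # SUM per group
print(two_phase(nodes, min))                 # MIN per group
print(two_phase(nodes, max))                 # MAX per group
```

Passing the same binary function to both phases works here only because SUM, MIN and MAX combine partial results the same way they combine raw values.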
Because only the partially aggregated groups, rather than all input rows, are
sent across the network, the amount of data transferred can potentially be much
smaller than with the 1-phase approach.
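A rough way to see the saving is to count rows shipped in each plan. This sketch rests on simplifying assumptions: it counts every distributed row as crossing the network (in practice some rows hash back to the node they started on), and it assumes the 1-phase plan must redistribute all input rows. The function names and the sample data are hypothetical.

```python
# Hypothetical row-count comparison between the 1-phase and 2-phase plans.

def rows_shipped_one_phase(nodes):
    # 1-phase: every input row is hash-distributed on the grouping key.
    return sum(len(rows) for rows in nodes)

def rows_shipped_two_phase(nodes):
    # 2-phase: each node ships one partial row per distinct key it holds.
    return sum(len({key for key, _ in rows}) for rows in nodes)

# 3 nodes, each holding 1000 rows spread over only 5 distinct keys.
nodes = [[(i % 5, 1) for i in range(1000)] for _ in range(3)]
print(rows_shipped_one_phase(nodes), rows_shipped_two_phase(nodes))
```

The benefit depends on the number of groups per node: with few groups the 2-phase plan ships far less, while with nearly unique keys the first phase collapses almost nothing.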
For aggregates such as SUM, MIN and MAX, phases 1 and 2 apply exactly the same
aggregate function. For other aggregate functions such as COUNT, however, the
first phase computes a count and the second phase must SUM the counts. This
enhancement addresses only SUM, MIN and MAX.
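The difference between the two cases can be captured as a table mapping each aggregate to its phase-1 and phase-2 functions. This is a minimal, ungrouped sketch (a single group, for brevity); the PHASES table and two_phase_aggregate are hypothetical names, and COUNT appears only to show why its second phase differs, since it is outside the scope of this issue.

```python
# Hypothetical table: aggregate name -> (phase-1 function, phase-2 function).
# Each function folds a list of values into a single value.
PHASES = {
    "SUM": (sum, sum),
    "MIN": (min, min),
    "MAX": (max, max),
    # Not covered by this issue: phase 2 must SUM the partial counts.
    "COUNT": (len, sum),
}

def two_phase_aggregate(name, partitions):
    phase1, phase2 = PHASES[name]
    # Phase 1 runs per partition; empty partitions contribute nothing.
    partials = [phase1(part) for part in partitions if part]
    # Phase 2 combines the partial results.
    return phase2(partials)

parts = [[3, 1, 4], [1, 5], [9, 2, 6]]
for name in ("SUM", "MIN", "MAX", "COUNT"):
    print(name, two_phase_aggregate(name, parts))
```

Reusing COUNT in the second phase would instead count the partial results, i.e. return the number of non-empty partitions rather than the number of rows.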