I think it's a mistake to use the same operator for regular and moving 
aggregates. (Moving aggregates are also known as running aggregates. There are 
sub-types called sliding and paged.)

An regular aggregate would be "Compute the total sales for each product each 
month".

A moving aggregate would be "For each sales order, compute the sales of that 
product in that region over the past 20 days".

Consider their output. Regular aggregates output the grouping keys and the 
aggregates. They can't output the input rows because they have been aggregated 
into a single group. Moving aggregates output the original rows PLUS any 
aggregates they compute.

Consider how they are specified. Regular aggregates are specified by a set of 
grouping keys, and a set of aggregate functions.  Moving aggregates are 
specified by the grouping keys (called partition keys in the SQL standard, for 
what it's worth) but also specifications of ordering (for rank etc.) and window 
length (10 rows, or 2 hours).

Consider their internals. Regular aggregates typically use a hashmap. Moving 
aggregates typically use a hashmap but also make heavy use of sorted lists.

Given this, I would separate the aggregate operator into a GroupBy operator and 
a MovingAggregate operator. (The MovingAggregate operator might have sub-types 
such as sliding and paged, as I mentioned above.)

By the way, for both types of aggregates, it makes sense to have an empty set 
of grouping keys. So, Ted is right on the money with 
https://issues.apache.org/jira/browse/DRILL-22.

Julian

Reply via email to