Junxian Wu created CALCITE-1803:
-----------------------------------

             Summary: Add post aggregation support in Druid to optimize druid 
queries.
                 Key: CALCITE-1803
                 URL: https://issues.apache.org/jira/browse/CALCITE-1803
             Project: Calcite
          Issue Type: Bug
          Components: druid
    Affects Versions: 1.11.0
            Reporter: Junxian Wu
            Assignee: Julian Hyde


The Druid adapter does not currently use Druid post aggregations when 
translating SQL queries. By implementing post aggregations, we can offload some 
computation to the Druid cluster rather than aggregating on the client side.

Example usage:
{{SELECT SUM("column1") - SUM("column2") FROM "table";}}
Under the current rules, this query is translated into two separate Druid 
aggregations, and the results are then subtracted in Calcite. By using the 
{{postAggregations}} field in the Druid query, the subtraction could be done in 
the Druid cluster instead. Although this example is simple, the difference 
becomes noticeable when the number of result rows is large. (Multiple result 
rows occur when GROUP BY is used.)
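
For illustration only, here is a rough sketch of what the pushed-down native 
Druid query for the example might look like, built as a Python dict and printed 
as JSON. It is not taken from existing Calcite code. The helper name 
build_druid_query, the output names sum1, sum2 and diff, the doubleSum 
aggregator type, the timeseries query type and the placeholder interval are all 
assumptions chosen for the example.

```python
import json


def build_druid_query(data_source, left_column, right_column):
    """Build a native Druid timeseries query that computes
    SUM(left_column) - SUM(right_column) inside Druid (hypothetical helper)."""
    # Two ordinary aggregations, one per SUM(...) call in the SQL.
    aggregations = [
        {"type": "doubleSum", "name": "sum1", "fieldName": left_column},
        {"type": "doubleSum", "name": "sum2", "fieldName": right_column},
    ]
    # The post aggregation refers to the aggregations by their output names.
    post_aggregations = [
        {
            "type": "arithmetic",
            "name": "diff",
            "fn": "-",
            "fields": [
                {"type": "fieldAccess", "fieldName": "sum1"},
                {"type": "fieldAccess", "fieldName": "sum2"},
            ],
        }
    ]
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "all",
        # Placeholder interval (assumption); a real query needs the real range.
        "intervals": ["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"],
        "aggregations": aggregations,
        "postAggregations": post_aggregations,
    }


query = build_druid_query("table", "column1", "column2")
print(json.dumps(query, indent=2))
```

With a query of this shape, Druid would return the diff value directly, so 
Calcite would only need to read one column per row instead of subtracting two.
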
Questions:
After I push the post aggregation into the Druid query, what should I change in 
the project relational expression? In the example above, the 
{{BindableProject}} has an expression that represents the subtraction. If I 
push the post aggregation into the Druid query, that subtraction expression 
should be replaced by a reference to the post aggregation's result. For now, it 
seems that a project expression can only point to aggregation results. Since 
post aggregations also have to point to aggregation results, they cannot be 
placed at the same level as the aggregations. Where should I put the post 
aggregations?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
