Some projects that use Calcite (e.g. Drill) model partitioned tables & data using the Distribution trait. A trait represents a physical property of the data; for example, Collation is another trait, and represents how the data is sorted. The Exchange operator can change the distribution of data, analogous to how the Sort operator changes the Collation of the data.
As you correctly say, it is not valid to compute aggregate functions on partitioned data. You need to combine into a single partition first. (Or you can re-partition by the GROUP BY keys, and then safely combine using Union-all.) You can compute partial aggregates on the partitions, then roll them up. For example if you want to compute AVG(x) you can expand to SUM(x) / COUNT(x), and compute partition SUM(x) and COUNT(X) on each partition then sum them using SUM. AggregateUnionTransposeRule and AggregateJoinTransposeRule deal in that kind of logic, and SqlSplittableAggFunction is an SPI that could describe how an arbitrary aggregate function could be split. Julian > On Mar 7, 2016, at 10:58 AM, Anoop Johnson <[email protected]> wrote: > > Hello Everyone - > > We have an in-house distributed database with our own custom query engine. > We're considering using Calcite to replace our current query engine. > > I looked through the examples and one of the adapters. One thing I haven't > quite figured out is using Calcite when the data is partitioned and you > need to fan out the query into all the partitions and combine the partial > results. > > There are several challenges here - for instance, some operations like AVG > are not partitionable, so they have to be rewritten as COUNT and SUM and > only at the final step. > > We could always write custom code to rewrite the query and reassemble the > results. For instance, I saw a planner rule[1] in the Calcite codebase that > does something similar. > > Since this is a common problem, I was wondering if there was a standard way > of handling this use case or if there were any example plugin I could look > at. > > Thanks, > Anoop > > > [1] > https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateReduceFunctionsRule.java
