The simplest solution is to make FOO_TABLE a view: SELECT * FROM FOO_PART1 UNION ALL SELECT * FROM FOO_PART2 UNION ALL SELECT * FROM FOO_PART3
Then Calcite will give the correct results. But I guess you would like Calcite to do a partial GROUP BY at each table. For that we would need a new planner rule, say called AggregateUnionTransposeRule, to push the aggregation down. After the Union it would then need to combine the partial results, using another GROUP BY, but that GROUP BY would be working on a small number of rows and would be fairly efficient. Does that help? Julian On Thu, May 26, 2016 at 3:32 PM, Ravikumar CS <[email protected]> wrote: > Hello Everyone, > > We are working on revamping the query engine for one of our internal > database & we are planning to use Calcite. > I am new to Calcite & its concepts. Calcite is pretty cool & fits our needs > so well. > > This is a partitioned database & we need a distributed query execution > capability. I went through some pointers here[1]. > I am still not sure on how to proceed with this. > > Scenario: > 1. I would like to have a table say FOO_TABLE. This will be top level table. > 2. From our Metadatastore, We know that the partition for FOO_TABLE are > FOO_PART1, FOO_PART2 & FOO_PART3. > 3. SQL_QUERY: select C1, SUM(C2), AVG(C3) from FOO_TABLE GROUP BY C1. > > Questions: > 1. How do I convert SQL_QUERY to the individual sub queries ? > 2. Sub Queries for each partition gets executed on different nodes(we have > partition-> node affinity information) > 2. How do I combine them back ? > 3. Would you have any pointers on how to introduce the parallelism needed ? > > Appreciate your help in this regard. Thanks! > > ~Ravi > > [1] > https://mail-archives.apache.org/mod_mbox/calcite-dev/201603.mbox/%[email protected]%3E
