Re: Calcite on Partitioned Databases

Julian Hyde Fri, 27 May 2016 17:45:34 -0700

The simplest solution is to make FOO_TABLE a view:

  SELECT * FROM FOO_PART1
  UNION ALL
  SELECT * FROM FOO_PART2
  UNION ALL
  SELECT * FROM FOO_PART3


Then Calcite will give the correct results. But I guess you would like
Calcite to do a partial GROUP BY at each table. For that we would need
a new planner rule, say called AggregateUnionTransposeRule, to push
the aggregation down. After the Union it would then need to combine
the partial results, using another GROUP BY, but that GROUP BY would
be working on a small number of rows and would be fairly efficient.

Does that help?

Julian


On Thu, May 26, 2016 at 3:32 PM, Ravikumar CS <[email protected]> wrote:
> Hello Everyone,
>
>    We are working on revamping the query engine for one of our internal
> database & we are planning to use Calcite.
> I am new to Calcite & its concepts. Calcite is pretty cool & fits our needs
> so well.
>
> This is a partitioned database & we need a distributed query execution
> capability. I went through some pointers here[1].
> I am still not sure on how to proceed with this.
>
> Scenario:
> 1. I would like to have a table say FOO_TABLE. This will be top level table.
> 2. From our Metadatastore, We know that the partition for FOO_TABLE are
>     FOO_PART1, FOO_PART2 & FOO_PART3.
> 3. SQL_QUERY: select C1, SUM(C2), AVG(C3) from FOO_TABLE GROUP BY C1.
>
> Questions:
> 1. How do I convert SQL_QUERY to the individual sub queries ?
> 2. Sub Queries for each partition gets executed on different nodes(we have
> partition-> node affinity information)
> 2. How do I combine them back ?
> 3. Would you have any pointers on how to introduce the parallelism needed ?
>
> Appreciate your help in this regard. Thanks!
>
> ~Ravi
>
> [1]
> https://mail-archives.apache.org/mod_mbox/calcite-dev/201603.mbox/%[email protected]%3E

Re: Calcite on Partitioned Databases

Reply via email to