[
https://issues.apache.org/jira/browse/CALCITE-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090108#comment-18090108
]
Stamatis Zampetakis commented on CALCITE-7604:
----------------------------------------------
Thanks for the feedback Julian!
Yes the rule is not enabled by default. It can be used in a cost based setting
where VolcanoPlanner is used. It can also be incorporated in a HepPlanner phase
as preparatory step for join reordering to allowing more combinations to be
considered but in that case rules must be selected carefully to avoid cycles.
In the initial version of the rule, I added various restrictions to minimize
the chance of introducing bugs by edge cases that I may have missed. So for the
moment the rule only applies on:
# Inner equijoins
# Simple aggregations with splittable functions
# Join keys between grouping columns (from left) and unique columns (from
right)
I have the impression that the rule can also work to some extend for left/right
(and maybe semi) joins but I will leave this as a future extension. Full outer
joins are harder and probably require additional information such as foreign
key constraints to safely perform the transformation although in that case we
should probably use another rule that transforms a full outer join to
left/right outer so it would still be outside of the scope of this particular
rule.
> Add rule to pull up GROUP BY above JOIN
> ---------------------------------------
>
> Key: CALCITE-7604
> URL: https://issues.apache.org/jira/browse/CALCITE-7604
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
>
> Implement a new rule to pull up a GROUP BY above a JOIN when possible. The
> major benefit of the group-by pull up transformation is that the join may
> reduce the number of input rows to the group-by, if the join is selective.
> The idea of pulling the aggregation above a join is rather old and the
> following research papers are among the first that describe the individual
> transformations to a greater extend:
> # Yan & Larson (1995) "Interchanging the order of grouping and join"
> Technical Report
> # Yan & Larson (1995) "Eager Aggregation and Lazy Aggregation", VLDB
> Below a simple query demonstrating the group by pull up transformation using
> SQL syntax.
> +Before+
> {code:sql}
> SELECT s.sales
> FROM (SELECT ss_sold_date_sk, SUM(ss_sales_price) AS sales
> FROM store_sales
> GROUP BY ss_sold_date_sk) s
> JOIN date_dim d
> ON s.ss_sold_date_sk = d.d_date_sk
> WHERE d.d_year = 2000;
> {code}
> +After+
> {code:sql}
> SELECT SUM(ss_sales_price) AS sales
> FROM store_sales s
> JOIN date_dim d
> ON s.ss_sold_date_sk = d.d_date_sk
> WHERE d.d_year = 2000
> GROUP BY s.ss_sold_date_sk;
> {code}
> More examples can be found in the respective papers.
> The first version of the rule aims to cover the simplest form of group-by
> pull up described under "Interchanging the order of grouping and join". The
> more advanced "lazy" variants can be implemented in follow-ups.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)