Hey Ian,

I think you are referring to the problem of query minimization and at
the moment we don't have any such rules in Calcite but it would be a
valuable contribution.
Apart from the algorithm itself, it might be necessary to introduce
some new metadata provider for PK-FK relationships otherwise dropping
a join may remove duplicates and change the semantics of the plan.

You can find some previous discussion here [1].

Best,
Stamatis

[1] https://lists.apache.org/thread/vsq59yhfj4glf7mgpf9n6j255myhs0so

On Thu, Mar 30, 2023 at 1:27 AM Ian Bertolacci
<ian.bertola...@workday.com.invalid> wrote:
>
> Howdy,
> Is there a collection of rules which squash a tree of binary joins if the 
> same side of each join is mergeable?
>
> For example:
> ```
> 201:LogicalProject(P4=[$70], P5=[$72])
> └─ 199:LogicalJoin(condition=[=($0, $71)], joinType=[left])
>    ├─ 190:LogicalProject(...)
>    |  └─ 188:LogicalJoin(condition=[=($0, $70)], joinType=[left])
>    |     ├─ 164:QueryTableScan(table=[[Query, T123]])
>    |     └─ 186:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)])
>    |        └─ 184:LogicalProject(C5633_586=[$85], C5633_170=[$12])
>    |           └─ 182:LogicalFilter(condition=[IS_NOT_NULL($85)])
>    |              └─ 165:QueryTableScan(table=[[QUERY, T893]])
>    └─ 197:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)])
>       └─ 195:LogicalProject(C5633_586=[$85], C5633_203=[$45])
>          └─ 193:LogicalFilter(condition=[IS_NOT_NULL($85)])
>             └─ 172:QueryTableScan(table=[[QUERY, T893]])
> ```
>
> Can be simplified to one join as something like:
> ```
> 201:LogicalProject(P4=[$lhs+1)], P5=[$lhs+2])
> └─ 188:LogicalJoin(condition=[=($0, $70)], joinType=[left])
>    ├─ 164:QueryTableScan(table=[[Query, T123]])
>    └─ 186:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1), 
> EXPR$0=[ARRAY_AGG($2)])
>       └─ 184:LogicalProject(C5633_586=[$85], C5633_170=[$12], C5633_203=[$45])
>          └─ 182:LogicalFilter(condition=[IS_NOT_NULL($85)])
>             └─ 165:QueryTableScan(table=[[QUERY, T893]])
> ```
>
>
> I spent some time trying applying various CoreRules, but didn’t immediately 
> see anything I wanted.
> I figure that there exists some set of existing rules which when applied 
> together would accomplish what we want here.
>
> The joins arise from correlated subqueries, but are also synthetically 
> generated, so it is not as simple as asking the user to manually do a single 
> join.
>
> Thanks!
> -Ian Bertolacci

Reply via email to