Hey Ian, I think you are referring to the problem of query minimization and at the moment we don't have any such rules in Calcite but it would be a valuable contribution. Apart from the algorithm itself, it might be necessary to introduce some new metadata provider for PK-FK relationships otherwise dropping a join may remove duplicates and change the semantics of the plan.
You can find some previous discussion here [1]. Best, Stamatis [1] https://lists.apache.org/thread/vsq59yhfj4glf7mgpf9n6j255myhs0so On Thu, Mar 30, 2023 at 1:27 AM Ian Bertolacci <ian.bertola...@workday.com.invalid> wrote: > > Howdy, > Is there a collection of rules which squash a tree of binary joins if the > same side of each join is mergeable? > > For example: > ``` > 201:LogicalProject(P4=[$70], P5=[$72]) > └─ 199:LogicalJoin(condition=[=($0, $71)], joinType=[left]) > ├─ 190:LogicalProject(...) > | └─ 188:LogicalJoin(condition=[=($0, $70)], joinType=[left]) > | ├─ 164:QueryTableScan(table=[[Query, T123]]) > | └─ 186:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)]) > | └─ 184:LogicalProject(C5633_586=[$85], C5633_170=[$12]) > | └─ 182:LogicalFilter(condition=[IS_NOT_NULL($85)]) > | └─ 165:QueryTableScan(table=[[QUERY, T893]]) > └─ 197:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)]) > └─ 195:LogicalProject(C5633_586=[$85], C5633_203=[$45]) > └─ 193:LogicalFilter(condition=[IS_NOT_NULL($85)]) > └─ 172:QueryTableScan(table=[[QUERY, T893]]) > ``` > > Can be simplified to one join as something like: > ``` > 201:LogicalProject(P4=[$lhs+1)], P5=[$lhs+2]) > └─ 188:LogicalJoin(condition=[=($0, $70)], joinType=[left]) > ├─ 164:QueryTableScan(table=[[Query, T123]]) > └─ 186:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1), > EXPR$0=[ARRAY_AGG($2)]) > └─ 184:LogicalProject(C5633_586=[$85], C5633_170=[$12], C5633_203=[$45]) > └─ 182:LogicalFilter(condition=[IS_NOT_NULL($85)]) > └─ 165:QueryTableScan(table=[[QUERY, T893]]) > ``` > > > I spent some time trying applying various CoreRules, but didn’t immediately > see anything I wanted. > I figure that there exists some set of existing rules which when applied > together would accomplish what we want here. > > The joins arise from correlated subqueries, but are also synthetically > generated, so it is not as simple as asking the user to manually do a single > join. > > Thanks! > -Ian Bertolacci