Since these are inner joins, you could convert
Join(i0, i1, c0 and c1 and c2)
into
Filter(c2)
Filter(c1)
Filter(c0)
CrossJoin(i0, i1)
without changing semantics.
I believe that’s what LoptOptimizeMultiJoinRule does - it shreds the joins into
a join graph and reassembles them using a greedy algorithm.
> On Mar 14, 2025, at 8:29 AM, Steven Phillips <[email protected]>
> wrote:
>
> You might try looking into the MultiJoinOptimizeBushy rule. I think it also
> is not doing exactly what you want, but it might be a good starting point
> for a rule that does.
>
>
> On Fri, Mar 14, 2025 at 1:35 AM Mads Sejer Pedersen <[email protected]>
> wrote:
>
>> Hi people,
>>
>> I am doing some benchmarking with Calcite for the sql-api in Apache Wayang
>> that requires typically multiconditional joins to be split into "binary"
>> joins ala:
>> LogicalJoin(condition=[AND(=($0, $27), =($10, $28), =($34, $2))],
>> joinType=[inner]): rowcount = 118.65234375, cumulative cost = 1038.96484375
>> LogicalJoin(condition=[=($0, $11)], joinType=[inner]):
>> rowcount = 351.5625, cumulative cost = 820.3125
>> LogicalJoin(condition=[=($0, $3)], joinType=[inner]):
>> rowcount = 93.75, cumulative cost = 343.75
>> LogicalFilter(condition=[SEARCH($1,
>> Sarg['cs':CHAR(11), 'gaming':CHAR(11), 'mathematica']:CHAR(11))]): rowcount
>> = 25.0, cumulative cost = 125.0
>> LogicalTableScan(table=[[postgres, site]]): rowcount
>> = 100.0, cumulative cost = 100.0
>> LogicalFilter(condition=[SEARCH($6,
>> Sarg[[10..100000]])]): rowcount = 25.0, cumulative cost = 125.0
>> LogicalTableScan(table=[[postgres, so_user]]):
>> rowcount = 100.0, cumulative cost = 100.0
>> LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])]):
>> rowcount = 25.0, cumulative cost = 125.0
>> LogicalTableScan(table=[[postgres, question]]):
>> rowcount = 100.0, cumulative cost = 100.0
>> LogicalTableScan(table=[[postgres, answer]]): rowcount =
>> 100.0, cumulative cost = 100.0
>>
>>
>> BinaryJoin(condition=[=($60, $2)], joinType=[inner])
>> BinaryJoin(condition=[=($10, $41)], joinType=[inner])
>> BinaryJoin(condition=[=($0, $27)], joinType=[inner])
>> LogicalJoin(condition=[=($0, $11)], joinType=[inner])
>> LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>> LogicalFilter(condition=[SEARCH($1, Sarg['cs':CHAR(11),
>> 'gaming':CHAR(11), 'mathematica']:CHAR(11))])
>> LogicalTableScan(table=[[postgres, site]])
>> LogicalFilter(condition=[SEARCH($6, Sarg[[10..100000]])])
>> LogicalTableScan(table=[[postgres, so_user]])
>> LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])])
>> LogicalTableScan(table=[[postgres, question]])
>> LogicalTableScan(table=[[postgres, answer]])
>> LogicalTableScan(table=[[postgres, answer]])
>> LogicalTableScan(table=[[postgres, answer]])
>>
>> Is this something that is already supported in Calcite? I have looked at
>> current Calcite rules; JoinToMultiJoinRule, LoptOptimizeJoinRule, but they
>> don't quite fit my use case.
>> Furthermore, if it is not supported, how would one go about implementing
>> such a split? I have looked at a rules-based implementation using the
>> hep-planner. But I am having issues with how to translate the RexInputRef's
>> indexes to the "right" place, as I need the indexes to always point to the
>> new joining table rows.
>>
>>