The original filter has 5 join conditions, but 3 of them are mutually
dependent: $11 = $18, $18 = $16, $11 = $16

So there are really only 4 join conditions, which is what you end up with.

On Mon, Jun 9, 2025 at 10:17 PM Juri Petersen <[email protected]> wrote:

> Hello,
> I am in need of some help or guidance.
> We are currently trying to parse Join-Order-Benchmark queries into plans
> with Apache Calcite.
> The following query is our input:
>
> SELECT MIN(t.title) AS movie_title
> FROM postgres.company_name AS cn,
>      postgres.keyword AS k,
>      postgres.movie_companies AS mc,
>      postgres.movie_keyword AS mk,
>      postgres.title AS t
> WHERE cn.country_code ='[de]'
>   AND k.keyword ='character-name-in-title'
>   AND cn.id = mc.company_id
>   AND mc.movie_id = t.id
>   AND t.id = mk.movie_id
>   AND mk.keyword_id = k.id
>   AND mc.movie_id = mk.movie_id;
>
> This is being parsed into the following RelNode:
>
> LogicalAggregate(group=[{}], movie_title=[MIN($0)])
>   LogicalProject(title=[$19])
>     LogicalFilter(condition=[AND(=($2, '[de]'), =($8,
> 'character-name-in-title'), =($0, $12), =($11, $18), =($18, $16), =($17,
> $7), =($11, $16))])
>       LogicalJoin(condition=[true], joinType=[inner])
>         LogicalJoin(condition=[true], joinType=[inner])
>           LogicalJoin(condition=[true], joinType=[inner])
>             LogicalJoin(condition=[true], joinType=[inner])
>               LogicalTableScan(table=[[postgres, company_name]])
>               LogicalTableScan(table=[[postgres, keyword]])
>             LogicalTableScan(table=[[postgres, movie_companies]])
>           LogicalTableScan(table=[[postgres, movie_keyword]])
>         LogicalTableScan(table=[[postgres, title]])
>
> I can observe that there are only 4 joins at this point in time. After
> this, we optimize the plan with the following RuleSet:
>
> (
>             CoreRules.JOIN_PUSH_TRANSITIVE_PREDICATES,
>             CoreRules.JOIN_CONDITION_PUSH,
>             CoreRules.FILTER_INTO_JOIN,
>             CoreRules.JOIN_ASSOCIATE
> )
>
> However, the resuliting plan looks like this:
>
> WayangAggregate(group=[{}], movie_title=[MIN($0)])
>   WayangProject(title=[$19])
>     WayangJoin(condition=[=($11, $18)], joinType=[inner])
>       WayangJoin(condition=[=($0, $12)], joinType=[inner])
>         WayangFilter(condition=[=($2, '[de]')])
>           WayangTableScan(table=[[postgres, company_name]])
>         WayangJoin(condition=[=($10, $0)], joinType=[inner])
>           WayangFilter(condition=[=($1, 'character-name-in-title')])
>             WayangTableScan(table=[[postgres, keyword]])
>           WayangJoin(condition=[=($1, $6)], joinType=[inner])
>             WayangTableScan(table=[[postgres, movie_companies]])
>             WayangTableScan(table=[[postgres, movie_keyword]])
>       WayangTableScan(table=[[postgres, title]])
>
> My problem with this is that the original Filter has more than 4 join
> conditions, but only 4 joins are being created and some of the Join
> conditions are never pushed down or applied.
> I am either just misunderstanding why this happens or am missing a rule to
> enforce the pushdown/usage of all conditions or instantiation of all joins.
> Can anybody give me some pointers?
>
> Thanks in advance for your help!
>
> Best,
> Juri
>

Reply via email to