The original filter has 5 join conditions, but 3 of them are mutually dependent: $11 = $18, $18 = $16, $11 = $16
So there are really only 4 join conditions, which is what you end up with. On Mon, Jun 9, 2025 at 10:17 PM Juri Petersen <[email protected]> wrote: > Hello, > I am in need of some help or guidance. > We are currently trying to parse Join-Order-Benchmark queries into plans > with Apache Calcite. > The following query is our input: > > SELECT MIN(t.title) AS movie_title > FROM postgres.company_name AS cn, > postgres.keyword AS k, > postgres.movie_companies AS mc, > postgres.movie_keyword AS mk, > postgres.title AS t > WHERE cn.country_code ='[de]' > AND k.keyword ='character-name-in-title' > AND cn.id = mc.company_id > AND mc.movie_id = t.id > AND t.id = mk.movie_id > AND mk.keyword_id = k.id > AND mc.movie_id = mk.movie_id; > > This is being parsed into the following RelNode: > > LogicalAggregate(group=[{}], movie_title=[MIN($0)]) > LogicalProject(title=[$19]) > LogicalFilter(condition=[AND(=($2, '[de]'), =($8, > 'character-name-in-title'), =($0, $12), =($11, $18), =($18, $16), =($17, > $7), =($11, $16))]) > LogicalJoin(condition=[true], joinType=[inner]) > LogicalJoin(condition=[true], joinType=[inner]) > LogicalJoin(condition=[true], joinType=[inner]) > LogicalJoin(condition=[true], joinType=[inner]) > LogicalTableScan(table=[[postgres, company_name]]) > LogicalTableScan(table=[[postgres, keyword]]) > LogicalTableScan(table=[[postgres, movie_companies]]) > LogicalTableScan(table=[[postgres, movie_keyword]]) > LogicalTableScan(table=[[postgres, title]]) > > I can observe that there are only 4 joins at this point in time. After > this, we optimize the plan with the following RuleSet: > > ( > CoreRules.JOIN_PUSH_TRANSITIVE_PREDICATES, > CoreRules.JOIN_CONDITION_PUSH, > CoreRules.FILTER_INTO_JOIN, > CoreRules.JOIN_ASSOCIATE > ) > > However, the resuliting plan looks like this: > > WayangAggregate(group=[{}], movie_title=[MIN($0)]) > WayangProject(title=[$19]) > WayangJoin(condition=[=($11, $18)], joinType=[inner]) > WayangJoin(condition=[=($0, $12)], joinType=[inner]) > WayangFilter(condition=[=($2, '[de]')]) > WayangTableScan(table=[[postgres, company_name]]) > WayangJoin(condition=[=($10, $0)], joinType=[inner]) > WayangFilter(condition=[=($1, 'character-name-in-title')]) > WayangTableScan(table=[[postgres, keyword]]) > WayangJoin(condition=[=($1, $6)], joinType=[inner]) > WayangTableScan(table=[[postgres, movie_companies]]) > WayangTableScan(table=[[postgres, movie_keyword]]) > WayangTableScan(table=[[postgres, title]]) > > My problem with this is that the original Filter has more than 4 join > conditions, but only 4 joins are being created and some of the Join > conditions are never pushed down or applied. > I am either just misunderstanding why this happens or am missing a rule to > enforce the pushdown/usage of all conditions or instantiation of all joins. > Can anybody give me some pointers? > > Thanks in advance for your help! > > Best, > Juri >
