Hello,
I am in need of some help or guidance.
We are currently trying to parse Join-Order-Benchmark queries into plans with
Apache Calcite.
The following query is our input:
SELECT MIN(t.title) AS movie_title
FROM postgres.company_name AS cn,
postgres.keyword AS k,
postgres.movie_companies AS mc,
postgres.movie_keyword AS mk,
postgres.title AS t
WHERE cn.country_code ='[de]'
AND k.keyword ='character-name-in-title'
AND cn.id = mc.company_id
AND mc.movie_id = t.id
AND t.id = mk.movie_id
AND mk.keyword_id = k.id
AND mc.movie_id = mk.movie_id;
This is being parsed into the following RelNode:
LogicalAggregate(group=[{}], movie_title=[MIN($0)])
LogicalProject(title=[$19])
LogicalFilter(condition=[AND(=($2, '[de]'), =($8,
'character-name-in-title'), =($0, $12), =($11, $18), =($18, $16), =($17, $7),
=($11, $16))])
LogicalJoin(condition=[true], joinType=[inner])
LogicalJoin(condition=[true], joinType=[inner])
LogicalJoin(condition=[true], joinType=[inner])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[postgres, company_name]])
LogicalTableScan(table=[[postgres, keyword]])
LogicalTableScan(table=[[postgres, movie_companies]])
LogicalTableScan(table=[[postgres, movie_keyword]])
LogicalTableScan(table=[[postgres, title]])
I can observe that there are only 4 joins at this point in time. After this, we
optimize the plan with the following RuleSet:
(
CoreRules.JOIN_PUSH_TRANSITIVE_PREDICATES,
CoreRules.JOIN_CONDITION_PUSH,
CoreRules.FILTER_INTO_JOIN,
CoreRules.JOIN_ASSOCIATE
)
However, the resuliting plan looks like this:
WayangAggregate(group=[{}], movie_title=[MIN($0)])
WayangProject(title=[$19])
WayangJoin(condition=[=($11, $18)], joinType=[inner])
WayangJoin(condition=[=($0, $12)], joinType=[inner])
WayangFilter(condition=[=($2, '[de]')])
WayangTableScan(table=[[postgres, company_name]])
WayangJoin(condition=[=($10, $0)], joinType=[inner])
WayangFilter(condition=[=($1, 'character-name-in-title')])
WayangTableScan(table=[[postgres, keyword]])
WayangJoin(condition=[=($1, $6)], joinType=[inner])
WayangTableScan(table=[[postgres, movie_companies]])
WayangTableScan(table=[[postgres, movie_keyword]])
WayangTableScan(table=[[postgres, title]])
My problem with this is that the original Filter has more than 4 join
conditions, but only 4 joins are being created and some of the Join conditions
are never pushed down or applied.
I am either just misunderstanding why this happens or am missing a rule to
enforce the pushdown/usage of all conditions or instantiation of all joins.
Can anybody give me some pointers?
Thanks in advance for your help!
Best,
Juri