Konstantin Orlov created CALCITE-4542: -----------------------------------------
Summary: Suboptimal plan is chosen when TopDownRuleDriver is enabled Key: CALCITE-4542 URL: https://issues.apache.org/jira/browse/CALCITE-4542 Project: Calcite Issue Type: Bug Affects Versions: 1.26.0 Reporter: Konstantin Orlov Attachments: dump.txt When TopDownRuleDriver is enabled, suboptimal plan is chosen for query with join. We have our own convention and implementation of all necessary relations. A distributed join is considered less expensive by our cost system than a single-distributed join, and the merge join is considered less expensive than nested loop if index over join condition is present. Nevertheless the merge join with a single distribution is chosen by optimizer. The query is: {code:java} select e1."empid", e1."deptno" from "emps" e1 join "emps" e2 on e1."empid" = e2."empid" {code} Actual plan: {code:java} MyProject(subset=[rel#16:RelSubset#2.MY.single.[]], empid=[$0], deptno=[$1]): rowcount = 1500.0, cumulative cost = {1500.0 rows, 3000.0 cpu, 0.0 io}, id = 21 MyMergeJoin(subset=[rel#20:RelSubset#1.MY.single.[]], condition=[=($0, $5)], joinType=[inner]): rowcount = 1500.0, cumulative cost = {150.0 rows, 0.0 cpu, 0.0 io}, id = 50 MyExchange(subset=[rel#25:RelSubset#0.MY.single.[]], distribution=[single]): rowcount = 100.0, cumulative cost = {9210.340371976183 rows, 100.0 cpu, 0.0 io}, id = 30 MyTableScan(subset=[rel#29:RelSubset#0.MY.any.[0]], table=[[hr, emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 27 MyExchange(subset=[rel#25:RelSubset#0.MY.single.[]], distribution=[single]): rowcount = 100.0, cumulative cost = {9210.340371976183 rows, 100.0 cpu, 0.0 io}, id = 30 MyTableScan(subset=[rel#29:RelSubset#0.MY.any.[0]], table=[[hr, emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 27 {code} Expected plan is {code:java} MyProject(subset=[rel#16:RelSubset#2.MY.single.[]], empid=[$0], deptno=[$1]): rowcount = 1500.0, cumulative cost = {1500.0 rows, 3000.0 cpu, 0.0 io}, id = 21 MyExchange(subset=[rel#20:RelSubset#1.MY.single.[]], distribution=[single]): rowcount = 1500.0, cumulative cost = {18420.680743952365 rows, 100.0 cpu, 0.0 io}, id = 24 MyMergeJoin(subset=[rel#23:RelSubset#1.MY.any.[]], condition=[=($0, $5)], joinType=[inner]): rowcount = 1500.0, cumulative cost = {0.15 rows, 0.0 cpu, 0.0 io}, id = 50 MyTableScan(subset=[rel#26:RelSubset#0.MY.hash[0].[0]], table=[[hr, emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 25 MyTableScan(subset=[rel#26:RelSubset#0.MY.hash[0].[0]], table=[[hr, emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 25 {code} Planner dump doesn't contain the join with proper distribution. Reproducer could be found [here|https://github.com/korlov42/calcite/tree/derive-not-being-called-repoducer]. Please run {{org.apache.calcite.tools.PlannerTest#test}} -- This message was sent by Atlassian Jira (v8.3.4#803005)