I've been doing some experimenting with the Calcite Planner and am curious
about a specific plan that's being generated. I've set up two JDBC schemas,
one with "users" and "items" tables, and one with "orders" and "purchases".
The Planner is set up using the same rules that are used in
CalcitePrepareImpl with EnumerableConvention.
When I optimize a simple query that joins two tables within the same
database, I get the expected plan, e.g.:
SELECT u.id AS user_id, i.id AS item_id, i.name, i.description FROM users u
JOIN items i ON u.id = i.user_id
JdbcToEnumerableConverter: rowcount = 1500.0, cumulative cost = {3050.0
rows, 5152.0 cpu, 0.0 io}, id = 43
JdbcProject(user_id=[$0], item_id=[$5], name=[$7], description=[$8]):
rowcount = 1500.0, cumulative cost = {2900.0 rows, 5002.0 cpu, 0.0 io}, id
= 42
JdbcJoin(condition=[=($0, $6)], joinType=[inner]): rowcount = 1500.0,
cumulative cost = {1700.0 rows, 202.0 cpu, 0.0 io}, id = 41
JdbcTableScan(table=[[USERS]]): rowcount = 100.0, cumulative cost =
{100.0 rows, 101.0 cpu, 0.0 io}, id = 5
JdbcTableScan(table=[[ITEMS]]): rowcount = 100.0, cumulative cost =
{100.0 rows, 101.0 cpu, 0.0 io}, id = 6
However, when I optimize a query that joins two tables in one database with
a table in another database, I don't get the plan I expect. None of the
joins or filters are done in JDBC, only table scans e.g.:
SELECT u.id, o.id AS order_id FROM users u JOIN orders o ON u.id =
o.user_id JOIN purchases p ON o.id = p.order_id WHERE p.order_id < 50
EnumerableProject(id=[$6], order_id=[$0]): rowcount = 11250.0, cumulative
cost = {29206.571923496576 rows, 22933.0 cpu, 0.0 io}, id = 141
EnumerableJoin(condition=[=($1, $6)], joinType=[inner]): rowcount =
11250.0, cumulative cost = {17956.571923496576 rows, 433.0 cpu, 0.0 io}, id
= 140
EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount =
750.0, cumulative cost = {1531.517018598809 rows, 322.0 cpu, 0.0 io}, id =
138
JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0
rows, 111.0 cpu, 0.0 io}, id = 135
JdbcTableScan(table=[[ORDERS]]): rowcount = 100.0, cumulative cost
= {100.0 rows, 101.0 cpu, 0.0 io}, id = 10
EnumerableFilter(condition=[<($1, 50)]): rowcount = 50.0, cumulative
cost = {160.0 rows, 211.0 cpu, 0.0 io}, id = 137
JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost =
{110.0 rows, 111.0 cpu, 0.0 io}, id = 136
JdbcTableScan(table=[[PURCHASES]]): rowcount = 100.0, cumulative
cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 12
JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0
rows, 111.0 cpu, 0.0 io}, id = 139
JdbcTableScan(table=[[USERS]]): rowcount = 100.0, cumulative cost =
{100.0 rows, 101.0 cpu, 0.0 io}, id = 9
It seems to me that the EnumerableFilter and EnumerableJoin of ORDERS and
PURCHASES (which each have the same JdbcConvention) could be done in JDBC,
i.e. something more like this:
EnumerableProject
EnumerableJoin
JdbcToEnumerableConverter
JdbcJoin
JdbcTableScan(table=[[ORDERS]]: ...)
JdbcFilter
JdbcTableScan(table=[[PURCHASES]]: ...)
JdbcToEnumerableConverter
JdbcTableScan(table=[[USERS]]: ...)
So, my question is, why is the join and even the filter not pushed down to
JDBC? When I log the trace, I see that the JdbcJoin and JdbcFilter have an
{inf} cost, but I'm ignorant as to why that is. Here's the trace:
Set#0, type: RecordType(INTEGER ID, VARCHAR(1) NAME, VARCHAR(1) EMAIL,
VARCHAR(1) CREATED_AT, VARCHAR(1) UPDATED_AT)
rel#38:Subset#0.JDBC.site.[], best=rel#9, importance=0.7290000000000001
rel#9:JdbcTableScan.JDBC.site.[](table=[USERS]), rowcount=100.0, cumulative
cost={100.0 rows, 101.0 cpu, 0.0 io}
rel#97:Subset#0.ENUMERABLE.[0], best=null, importance=0.7290000000000001
rel#100:Subset#0.ENUMERABLE.[], best=rel#120, importance=0.7290000000000001
rel#120:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#38:Subset#0.JDBC.site.[]),
rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
Set#1, type: RecordType(INTEGER ID, INTEGER USER_ID)
rel#39:Subset#1.JDBC.cart.[], best=rel#10, importance=0.7290000000000001
rel#10:JdbcTableScan.JDBC.cart.[](table=[ORDERS]), rowcount=100.0,
cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
rel#124:Subset#1.ENUMERABLE.[], best=rel#123, importance=0.6561
rel#123:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#39:Subset#1.JDBC.cart.[]),
rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
rel#127:Subset#1.JDBC.site.[], best=null, importance=0.5904900000000001
rel#131:Subset#1.ENUMERABLE.[0], best=null, importance=0.6561
Set#2, type: RecordType(INTEGER ID, INTEGER ORDER_ID, INTEGER ITEM_ID,
VARCHAR(1) CREATED_AT)
rel#40:Subset#2.JDBC.cart.[], best=rel#12, importance=0.6561
rel#12:JdbcTableScan.JDBC.cart.[](table=[PURCHASES]), rowcount=100.0,
cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
rel#91:Subset#2.JDBC.site.[], best=null, importance=0.531441
rel#102:Subset#2.ENUMERABLE.[], best=rel#114, importance=0.5904900000000001
rel#114:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#40:Subset#2.JDBC.cart.[]),
rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
Set#3, type: RecordType(INTEGER ID, INTEGER ORDER_ID, INTEGER ITEM_ID,
VARCHAR(1) CREATED_AT)
rel#42:Subset#3.NONE.[], best=null, importance=0.7290000000000001
rel#41:LogicalFilter.NONE.[](input=rel#40:Subset#2.JDBC.cart.[],condition=<($1,
50)), rowcount=50.0, cumulative cost={inf}
rel#93:Subset#3.JDBC.site.[], best=null, importance=0.5904900000000001
rel#92:JdbcFilter.JDBC.site.[](input=rel#91:Subset#2.JDBC.site.[],condition=<($1,
50)), rowcount=50.0, cumulative cost={inf}
rel#104:Subset#3.ENUMERABLE.[], best=rel#103, importance=0.6561
rel#103:EnumerableFilter.ENUMERABLE.[](input=rel#102:Subset#2.ENUMERABLE.[],condition=<($1,
50)), rowcount=50.0, cumulative cost={160.0 rows, 211.0 cpu, 0.0 io}
rel#115:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#93:Subset#3.JDBC.site.[]),
rowcount=50.0, cumulative cost={inf}
rel#132:Subset#3.ENUMERABLE.[1], best=null, importance=0.6561
Set#4, type: RecordType(INTEGER ID, VARCHAR(1) NAME, VARCHAR(1) EMAIL,
VARCHAR(1) CREATED_AT, VARCHAR(1) UPDATED_AT, INTEGER ID0, INTEGER USER_ID,
INTEGER ID1, INTEGER ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT0)
rel#44:Subset#4.NONE.[], best=null, importance=0.81
rel#43:MultiJoin.NONE.[](input#0=rel#38:Subset#0.JDBC.site.[],input#1=rel#39:Subset#1.JDBC.cart.[],input#2=rel#42:Subset#3.NONE.[],joinFilter=AND(=($5,
$8), =($0, $6)),isFullOuterJoin=false,joinTypes=[INNER, INNER,
INNER],outerJoinConditions=[NULL, NULL, NULL],projFields=[ALL, ALL, ALL]),
rowcount=1.0, cumulative cost={inf}
rel#76:LogicalProject.NONE.[](input=rel#75:Subset#7.NONE.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
rowcount=11250.0, cumulative cost={inf}
rel#49:Subset#4.JDBC.site.[], best=null, importance=0.81
rel#88:JdbcProject.JDBC.site.[](input=rel#82:Subset#7.JDBC.site.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
rowcount=11250.0, cumulative cost={inf}
rel#56:Subset#4.ENUMERABLE.[], best=rel#94, importance=0.9
rel#94:EnumerableProject.ENUMERABLE.[](input=rel#86:Subset#7.ENUMERABLE.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
rowcount=11250.0, cumulative cost={29206.571923496576 rows, 124183.0 cpu,
0.0 io}
rel#113:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#49:Subset#4.JDBC.site.[]),
rowcount=1.0, cumulative cost={inf}
Set#5, type: RecordType(INTEGER id, INTEGER order_id)
rel#46:Subset#5.NONE.[], best=null, importance=0.9
rel#45:LogicalProject.NONE.[](input=rel#44:Subset#4.NONE.[],id=$0,order_id=$5),
rowcount=1.0, cumulative cost={inf}
rel#81:LogicalProject.NONE.[](input=rel#75:Subset#7.NONE.[],id=$6,order_id=$0),
rowcount=11250.0, cumulative cost={inf}
rel#95:LogicalProject.NONE.[](input=rel#86:Subset#7.ENUMERABLE.[],id=$6,order_id=$0),
rowcount=11250.0, cumulative cost={inf}
rel#107:LogicalProject.NONE.[](input=rel#82:Subset#7.JDBC.site.[],id=$6,order_id=$0),
rowcount=11250.0, cumulative cost={inf}
rel#47:Subset#5.ENUMERABLE.[], best=rel#87, importance=1.0
rel#48:AbstractConverter.ENUMERABLE.[](input=rel#46:Subset#5.NONE.[],convention=ENUMERABLE,sort=[]),
rowcount=1.0, cumulative cost={inf}
rel#52:AbstractConverter.ENUMERABLE.[](input=rel#51:Subset#5.JDBC.site.[],convention=ENUMERABLE,sort=[]),
rowcount=1.0, cumulative cost={inf}
rel#53:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#51:Subset#5.JDBC.site.[]),
rowcount=1.0, cumulative cost={inf}
rel#57:EnumerableProject.ENUMERABLE.[](input=rel#56:Subset#4.ENUMERABLE.[],id=$0,order_id=$5),
rowcount=11250.0, cumulative cost={40456.57192349658 rows, 146683.0 cpu,
0.0 io}
rel#87:EnumerableProject.ENUMERABLE.[](input=rel#86:Subset#7.ENUMERABLE.[],id=$6,order_id=$0),
rowcount=11250.0, cumulative cost={29206.571923496576 rows, 22933.0 cpu,
0.0 io}
rel#51:Subset#5.JDBC.site.[], best=null, importance=0.9
rel#50:JdbcProject.JDBC.site.[](input=rel#49:Subset#4.JDBC.site.[],id=$0,order_id=$5),
rowcount=1.0, cumulative cost={inf}
rel#83:JdbcProject.JDBC.site.[](input=rel#82:Subset#7.JDBC.site.[],id=$6,order_id=$0),
rowcount=11250.0, cumulative cost={inf}
Set#6, type: RecordType(INTEGER ID, INTEGER USER_ID, INTEGER ID0, INTEGER
ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT)
rel#73:Subset#6.NONE.[], best=null, importance=0.6561
rel#60:LogicalJoin.NONE.[](left=rel#39:Subset#1.JDBC.cart.[],right=rel#42:Subset#3.NONE.[],condition==($0,
$3),joinType=inner), rowcount=750.0, cumulative cost={inf}
rel#89:Subset#6.JDBC.site.[], best=null, importance=0.6561
rel#128:JdbcJoin.JDBC.site.[](left=rel#127:Subset#1.JDBC.site.[],right=rel#93:Subset#3.JDBC.site.[],condition==($0,
$3),joinType=inner), rowcount=750.0, cumulative cost={inf}
rel#96:Subset#6.ENUMERABLE.[1], best=null, importance=0.7290000000000001
rel#99:Subset#6.ENUMERABLE.[], best=rel#134, importance=0.7290000000000001
rel#130:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#89:Subset#6.JDBC.site.[]),
rowcount=750.0, cumulative cost={inf}
rel#133:EnumerableMergeJoin.ENUMERABLE.[[0],
[3]](left=rel#131:Subset#1.ENUMERABLE.[0],right=rel#132:Subset#3.ENUMERABLE.[1],condition==($0,
$3),joinType=inner), rowcount=750.0, cumulative cost={inf}
rel#134:EnumerableJoin.ENUMERABLE.[](left=rel#124:Subset#1.ENUMERABLE.[],right=rel#104:Subset#3.ENUMERABLE.[],condition==($0,
$3),joinType=inner), rowcount=750.0, cumulative cost={1531.517018598809
rows, 322.0 cpu, 0.0 io}
Set#7, type: RecordType(INTEGER ID, INTEGER USER_ID, INTEGER ID0, INTEGER
ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT, INTEGER ID1, VARCHAR(1)
NAME, VARCHAR(1) EMAIL, VARCHAR(1) CREATED_AT0, VARCHAR(1) UPDATED_AT)
rel#75:Subset#7.NONE.[], best=null, importance=0.7290000000000001
rel#74:LogicalJoin.NONE.[](left=rel#73:Subset#6.NONE.[],right=rel#38:Subset#0.JDBC.site.[],condition==($6,
$1),joinType=inner), rowcount=11250.0, cumulative cost={inf}
rel#82:Subset#7.JDBC.site.[], best=null, importance=0.7290000000000001
rel#90:JdbcJoin.JDBC.site.[](left=rel#89:Subset#6.JDBC.site.[],right=rel#38:Subset#0.JDBC.site.[],condition==($6,
$1),joinType=inner), rowcount=11250.0, cumulative cost={inf}
rel#86:Subset#7.ENUMERABLE.[], best=rel#101, importance=0.81
rel#98:EnumerableMergeJoin.ENUMERABLE.[[1],
[6]](left=rel#96:Subset#6.ENUMERABLE.[1],right=rel#97:Subset#0.ENUMERABLE.[0],condition==($1,
$6),joinType=inner), rowcount=11250.0, cumulative cost={inf}
rel#101:EnumerableJoin.ENUMERABLE.[](left=rel#99:Subset#6.ENUMERABLE.[],right=rel#100:Subset#0.ENUMERABLE.[],condition==($1,
$6),joinType=inner), rowcount=11250.0, cumulative cost={17956.571923496576
rows, 433.0 cpu, 0.0 io}
rel#109:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#82:Subset#7.JDBC.site.[]),
rowcount=11250.0, cumulative cost={inf}