Re: Join not being pushed to JDBC database when also joining across databases

Jordan Halterman Mon, 30 May 2016 22:16:23 -0700

I can try to submit a PR for this some time this week. It's an easy fix, I
just need some time to find my way around the tests.


On Mon, May 30, 2016 at 9:31 PM, Julian Hyde <[email protected]> wrote:

> The JIRA case is https://issues.apache.org/jira/browse/CALCITE-1262 <
> https://issues.apache.org/jira/browse/CALCITE-1262>.
>
> > On May 28, 2016, at 7:46 PM, Jacques Nadeau <[email protected]> wrote:
> >
> > Agreed. We have solved thus in other systems by presenting a name to each
> > convention and use that in the rule name.
> > On May 28, 2016 7:14 PM, "Jordan Halterman" <[email protected]>
> > wrote:
> >
> >> I actually got a chance to do some more digging in to this over the
> >> weekend. When I step through the VolcanoPlanner in my debugger, I
> realize
> >> that while a JdbcFilter rule is being created for each unique
> >> JdbcConvention,
> >> only one of those two rules is actually being fired. This seems to be a
> >> result of the JDBC rule names. JdbcToEnumerableConverterRule is the only
> >> JDBC rule using the convention in its description. All other JDBC rules
> >> just have static descriptions, e.g. JdbcFilterRule or JdbcProjectRule.
> This
> >> results in the rules being fired only for *one* of the JdbcConventions.
> I
> >> verified that the expected plan is output when I change all the JDBC
> rules
> >> to use a unique description-per-convention.
> >>
> >> Does this make sense?
> >>
> >> I will open a Jira ticket for this
> >>
> >> On Thu, May 26, 2016 at 1:21 PM, Jordan Halterman <
> >> [email protected]> wrote:
> >>
> >>> Glad we're seeing the same thing. I can take a shot at reproducing it
> in
> >> a
> >>> test.
> >>>
> >>> On Thu, May 26, 2016 at 1:17 PM, Julian Hyde <[email protected]> wrote:
> >>>
> >>>> I agree with everything you said. The best plan is to join inside the
> >>>> JDBC source, and we’d hope that Calcite could find that plan.
> >>>>
> >>>> It is curious that the JdbcFilter has infinite cost. I can’t tell by
> >>>> reading the trace output exactly why. I wonder whether it has failed
> to
> >>>> make a JdbcTableScan of the correct table within the correct JDBC
> >> database
> >>>> (which, as you note, is a particular instance of JdbcConvention).
> >>>>
> >>>> There’s clearly a Calcite bug here, likely in the JDBC adapter. Can
> you
> >>>> log it?
> >>>>
> >>>> You may be able to convert this into a test case on the standard data
> >>>> sources, e.g. by combining the scott and foodmart hsqldb databases.
> >>>>
> >>>> Julian
> >>>>
> >>>>> On May 26, 2016, at 1:46 AM, Jordan Halterman <
> >>>> [email protected]> wrote:
> >>>>>
> >>>>> I've been doing some experimenting with the Calcite Planner and am
> >>>> curious
> >>>>> about a specific plan that's being generated. I've set up two JDBC
> >>>> schemas,
> >>>>> one with "users" and "items" tables, and one with "orders" and
> >>>> "purchases".
> >>>>> The Planner is set up using the same rules that are used in
> >>>>> CalcitePrepareImpl with EnumerableConvention.
> >>>>>
> >>>>> When I optimize a simple query that joins two tables within the same
> >>>>> database, I get the expected plan, e.g.:
> >>>>>
> >>>>> SELECT u.id AS user_id, i.id AS item_id, i.name, i.description FROM
> >>>> users u
> >>>>> JOIN items i ON u.id = i.user_id
> >>>>>
> >>>>> JdbcToEnumerableConverter: rowcount = 1500.0, cumulative cost =
> >> {3050.0
> >>>>> rows, 5152.0 cpu, 0.0 io}, id = 43
> >>>>> JdbcProject(user_id=[$0], item_id=[$5], name=[$7], description=[$8]):
> >>>>> rowcount = 1500.0, cumulative cost = {2900.0 rows, 5002.0 cpu, 0.0
> >> io},
> >>>> id
> >>>>> = 42
> >>>>>   JdbcJoin(condition=[=($0, $6)], joinType=[inner]): rowcount =
> >> 1500.0,
> >>>>> cumulative cost = {1700.0 rows, 202.0 cpu, 0.0 io}, id = 41
> >>>>>     JdbcTableScan(table=[[USERS]]): rowcount = 100.0, cumulative
> >> cost =
> >>>>> {100.0 rows, 101.0 cpu, 0.0 io}, id = 5
> >>>>>     JdbcTableScan(table=[[ITEMS]]): rowcount = 100.0, cumulative
> >> cost =
> >>>>> {100.0 rows, 101.0 cpu, 0.0 io}, id = 6
> >>>>>
> >>>>> However, when I optimize a query that joins two tables in one
> database
> >>>> with
> >>>>> a table in another database, I don't get the plan I expect. None of
> >> the
> >>>>> joins or filters are done in JDBC, only table scans e.g.:
> >>>>>
> >>>>> SELECT u.id, o.id AS order_id FROM users u JOIN orders o ON u.id =
> >>>>> o.user_id JOIN purchases p ON o.id = p.order_id WHERE p.order_id <
> 50
> >>>>>
> >>>>> EnumerableProject(id=[$6], order_id=[$0]): rowcount = 11250.0,
> >>>> cumulative
> >>>>> cost = {29206.571923496576 rows, 22933.0 cpu, 0.0 io}, id = 141
> >>>>> EnumerableJoin(condition=[=($1, $6)], joinType=[inner]): rowcount =
> >>>>> 11250.0, cumulative cost = {17956.571923496576 rows, 433.0 cpu, 0.0
> >>>> io}, id
> >>>>> = 140
> >>>>>   EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount =
> >>>>> 750.0, cumulative cost = {1531.517018598809 rows, 322.0 cpu, 0.0 io},
> >>>> id =
> >>>>> 138
> >>>>>     JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost =
> >>>> {110.0
> >>>>> rows, 111.0 cpu, 0.0 io}, id = 135
> >>>>>       JdbcTableScan(table=[[ORDERS]]): rowcount = 100.0, cumulative
> >>>> cost
> >>>>> = {100.0 rows, 101.0 cpu, 0.0 io}, id = 10
> >>>>>     EnumerableFilter(condition=[<($1, 50)]): rowcount = 50.0,
> >>>> cumulative
> >>>>> cost = {160.0 rows, 211.0 cpu, 0.0 io}, id = 137
> >>>>>       JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost =
> >>>>> {110.0 rows, 111.0 cpu, 0.0 io}, id = 136
> >>>>>         JdbcTableScan(table=[[PURCHASES]]): rowcount = 100.0,
> >>>> cumulative
> >>>>> cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 12
> >>>>>   JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost =
> >> {110.0
> >>>>> rows, 111.0 cpu, 0.0 io}, id = 139
> >>>>>     JdbcTableScan(table=[[USERS]]): rowcount = 100.0, cumulative
> >> cost =
> >>>>> {100.0 rows, 101.0 cpu, 0.0 io}, id = 9
> >>>>>
> >>>>> It seems to me that the EnumerableFilter and EnumerableJoin of ORDERS
> >>>> and
> >>>>> PURCHASES (which each have the same JdbcConvention) could be done in
> >>>> JDBC,
> >>>>> i.e. something more like this:
> >>>>>
> >>>>> EnumerableProject
> >>>>> EnumerableJoin
> >>>>>   JdbcToEnumerableConverter
> >>>>>     JdbcJoin
> >>>>>       JdbcTableScan(table=[[ORDERS]]: ...)
> >>>>>       JdbcFilter
> >>>>>         JdbcTableScan(table=[[PURCHASES]]: ...)
> >>>>>   JdbcToEnumerableConverter
> >>>>>     JdbcTableScan(table=[[USERS]]: ...)
> >>>>>
> >>>>> So, my question is, why is the join and even the filter not pushed
> >> down
> >>>> to
> >>>>> JDBC? When I log the trace, I see that the JdbcJoin and JdbcFilter
> >> have
> >>>> an
> >>>>> {inf} cost, but I'm ignorant as to why that is. Here's the trace:
> >>>>>
> >>>>> Set#0, type: RecordType(INTEGER ID, VARCHAR(1) NAME, VARCHAR(1)
> EMAIL,
> >>>>> VARCHAR(1) CREATED_AT, VARCHAR(1) UPDATED_AT)
> >>>>> rel#38:Subset#0.JDBC.site.[], best=rel#9,
> >> importance=0.7290000000000001
> >>>>> rel#9:JdbcTableScan.JDBC.site.[](table=[USERS]), rowcount=100.0,
> >>>> cumulative
> >>>>> cost={100.0 rows, 101.0 cpu, 0.0 io}
> >>>>> rel#97:Subset#0.ENUMERABLE.[0], best=null,
> >> importance=0.7290000000000001
> >>>>> rel#100:Subset#0.ENUMERABLE.[], best=rel#120,
> >>>> importance=0.7290000000000001
> >>>>>
> >>>>
> >>
> rel#120:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#38:Subset#0.JDBC.site.[]),
> >>>>> rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
> >>>>> Set#1, type: RecordType(INTEGER ID, INTEGER USER_ID)
> >>>>> rel#39:Subset#1.JDBC.cart.[], best=rel#10,
> >> importance=0.7290000000000001
> >>>>> rel#10:JdbcTableScan.JDBC.cart.[](table=[ORDERS]), rowcount=100.0,
> >>>>> cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> >>>>> rel#124:Subset#1.ENUMERABLE.[], best=rel#123, importance=0.6561
> >>>>>
> >>>>
> >>
> rel#123:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#39:Subset#1.JDBC.cart.[]),
> >>>>> rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
> >>>>> rel#127:Subset#1.JDBC.site.[], best=null,
> >> importance=0.5904900000000001
> >>>>> rel#131:Subset#1.ENUMERABLE.[0], best=null, importance=0.6561
> >>>>> Set#2, type: RecordType(INTEGER ID, INTEGER ORDER_ID, INTEGER
> ITEM_ID,
> >>>>> VARCHAR(1) CREATED_AT)
> >>>>> rel#40:Subset#2.JDBC.cart.[], best=rel#12, importance=0.6561
> >>>>> rel#12:JdbcTableScan.JDBC.cart.[](table=[PURCHASES]), rowcount=100.0,
> >>>>> cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> >>>>> rel#91:Subset#2.JDBC.site.[], best=null, importance=0.531441
> >>>>> rel#102:Subset#2.ENUMERABLE.[], best=rel#114,
> >>>> importance=0.5904900000000001
> >>>>>
> >>>>
> >>
> rel#114:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#40:Subset#2.JDBC.cart.[]),
> >>>>> rowcount=100.0, cumulative cost={110.0 rows, 111.0 cpu, 0.0 io}
> >>>>> Set#3, type: RecordType(INTEGER ID, INTEGER ORDER_ID, INTEGER
> ITEM_ID,
> >>>>> VARCHAR(1) CREATED_AT)
> >>>>> rel#42:Subset#3.NONE.[], best=null, importance=0.7290000000000001
> >>>>>
> >>>>
> >>
> rel#41:LogicalFilter.NONE.[](input=rel#40:Subset#2.JDBC.cart.[],condition=<($1,
> >>>>> 50)), rowcount=50.0, cumulative cost={inf}
> >>>>> rel#93:Subset#3.JDBC.site.[], best=null,
> importance=0.5904900000000001
> >>>>>
> >>>>
> >>
> rel#92:JdbcFilter.JDBC.site.[](input=rel#91:Subset#2.JDBC.site.[],condition=<($1,
> >>>>> 50)), rowcount=50.0, cumulative cost={inf}
> >>>>> rel#104:Subset#3.ENUMERABLE.[], best=rel#103, importance=0.6561
> >>>>>
> >>>>
> >>
> rel#103:EnumerableFilter.ENUMERABLE.[](input=rel#102:Subset#2.ENUMERABLE.[],condition=<($1,
> >>>>> 50)), rowcount=50.0, cumulative cost={160.0 rows, 211.0 cpu, 0.0 io}
> >>>>>
> >>>>
> >>
> rel#115:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#93:Subset#3.JDBC.site.[]),
> >>>>> rowcount=50.0, cumulative cost={inf}
> >>>>> rel#132:Subset#3.ENUMERABLE.[1], best=null, importance=0.6561
> >>>>> Set#4, type: RecordType(INTEGER ID, VARCHAR(1) NAME, VARCHAR(1)
> EMAIL,
> >>>>> VARCHAR(1) CREATED_AT, VARCHAR(1) UPDATED_AT, INTEGER ID0, INTEGER
> >>>> USER_ID,
> >>>>> INTEGER ID1, INTEGER ORDER_ID, INTEGER ITEM_ID, VARCHAR(1)
> >> CREATED_AT0)
> >>>>> rel#44:Subset#4.NONE.[], best=null, importance=0.81
> >>>>>
> >>>>
> >>
> rel#43:MultiJoin.NONE.[](input#0=rel#38:Subset#0.JDBC.site.[],input#1=rel#39:Subset#1.JDBC.cart.[],input#2=rel#42:Subset#3.NONE.[],joinFilter=AND(=($5,
> >>>>> $8), =($0, $6)),isFullOuterJoin=false,joinTypes=[INNER, INNER,
> >>>>> INNER],outerJoinConditions=[NULL, NULL, NULL],projFields=[ALL, ALL,
> >>>> ALL]),
> >>>>> rowcount=1.0, cumulative cost={inf}
> >>>>>
> >>>>
> >>
> rel#76:LogicalProject.NONE.[](input=rel#75:Subset#7.NONE.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
> >>>>> rowcount=11250.0, cumulative cost={inf}
> >>>>> rel#49:Subset#4.JDBC.site.[], best=null, importance=0.81
> >>>>>
> >>>>
> >>
> rel#88:JdbcProject.JDBC.site.[](input=rel#82:Subset#7.JDBC.site.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
> >>>>> rowcount=11250.0, cumulative cost={inf}
> >>>>> rel#56:Subset#4.ENUMERABLE.[], best=rel#94, importance=0.9
> >>>>>
> >>>>
> >>
> rel#94:EnumerableProject.ENUMERABLE.[](input=rel#86:Subset#7.ENUMERABLE.[],ID=$6,NAME=$7,EMAIL=$8,CREATED_AT=$9,UPDATED_AT=$10,ID0=$0,USER_ID=$1,ID1=$2,ORDER_ID=$3,ITEM_ID=$4,CREATED_AT0=$5),
> >>>>> rowcount=11250.0, cumulative cost={29206.571923496576 rows, 124183.0
> >>>> cpu,
> >>>>> 0.0 io}
> >>>>>
> >>>>
> >>
> rel#113:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#49:Subset#4.JDBC.site.[]),
> >>>>> rowcount=1.0, cumulative cost={inf}
> >>>>> Set#5, type: RecordType(INTEGER id, INTEGER order_id)
> >>>>> rel#46:Subset#5.NONE.[], best=null, importance=0.9
> >>>>>
> >>>>
> >>
> rel#45:LogicalProject.NONE.[](input=rel#44:Subset#4.NONE.[],id=$0,order_id=$5),
> >>>>> rowcount=1.0, cumulative cost={inf}
> >>>>>
> >>>>
> >>
> rel#81:LogicalProject.NONE.[](input=rel#75:Subset#7.NONE.[],id=$6,order_id=$0),
> >>>>> rowcount=11250.0, cumulative cost={inf}
> >>>>>
> >>>>
> >>
> rel#95:LogicalProject.NONE.[](input=rel#86:Subset#7.ENUMERABLE.[],id=$6,order_id=$0),
> >>>>> rowcount=11250.0, cumulative cost={inf}
> >>>>>
> >>>>
> >>
> rel#107:LogicalProject.NONE.[](input=rel#82:Subset#7.JDBC.site.[],id=$6,order_id=$0),
> >>>>> rowcount=11250.0, cumulative cost={inf}
> >>>>> rel#47:Subset#5.ENUMERABLE.[], best=rel#87, importance=1.0
> >>>>>
> >>>>
> >>
> rel#48:AbstractConverter.ENUMERABLE.[](input=rel#46:Subset#5.NONE.[],convention=ENUMERABLE,sort=[]),
> >>>>> rowcount=1.0, cumulative cost={inf}
> >>>>>
> >>>>
> >>
> rel#52:AbstractConverter.ENUMERABLE.[](input=rel#51:Subset#5.JDBC.site.[],convention=ENUMERABLE,sort=[]),
> >>>>> rowcount=1.0, cumulative cost={inf}
> >>>>>
> >>>>
> >>
> rel#53:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#51:Subset#5.JDBC.site.[]),
> >>>>> rowcount=1.0, cumulative cost={inf}
> >>>>>
> >>>>
> >>
> rel#57:EnumerableProject.ENUMERABLE.[](input=rel#56:Subset#4.ENUMERABLE.[],id=$0,order_id=$5),
> >>>>> rowcount=11250.0, cumulative cost={40456.57192349658 rows, 146683.0
> >> cpu,
> >>>>> 0.0 io}
> >>>>>
> >>>>
> >>
> rel#87:EnumerableProject.ENUMERABLE.[](input=rel#86:Subset#7.ENUMERABLE.[],id=$6,order_id=$0),
> >>>>> rowcount=11250.0, cumulative cost={29206.571923496576 rows, 22933.0
> >> cpu,
> >>>>> 0.0 io}
> >>>>> rel#51:Subset#5.JDBC.site.[], best=null, importance=0.9
> >>>>>
> >>>>
> >>
> rel#50:JdbcProject.JDBC.site.[](input=rel#49:Subset#4.JDBC.site.[],id=$0,order_id=$5),
> >>>>> rowcount=1.0, cumulative cost={inf}
> >>>>>
> >>>>
> >>
> rel#83:JdbcProject.JDBC.site.[](input=rel#82:Subset#7.JDBC.site.[],id=$6,order_id=$0),
> >>>>> rowcount=11250.0, cumulative cost={inf}
> >>>>> Set#6, type: RecordType(INTEGER ID, INTEGER USER_ID, INTEGER ID0,
> >>>> INTEGER
> >>>>> ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT)
> >>>>> rel#73:Subset#6.NONE.[], best=null, importance=0.6561
> >>>>>
> >>>>
> >>
> rel#60:LogicalJoin.NONE.[](left=rel#39:Subset#1.JDBC.cart.[],right=rel#42:Subset#3.NONE.[],condition==($0,
> >>>>> $3),joinType=inner), rowcount=750.0, cumulative cost={inf}
> >>>>> rel#89:Subset#6.JDBC.site.[], best=null, importance=0.6561
> >>>>>
> >>>>
> >>
> rel#128:JdbcJoin.JDBC.site.[](left=rel#127:Subset#1.JDBC.site.[],right=rel#93:Subset#3.JDBC.site.[],condition==($0,
> >>>>> $3),joinType=inner), rowcount=750.0, cumulative cost={inf}
> >>>>> rel#96:Subset#6.ENUMERABLE.[1], best=null,
> >> importance=0.7290000000000001
> >>>>> rel#99:Subset#6.ENUMERABLE.[], best=rel#134,
> >>>> importance=0.7290000000000001
> >>>>>
> >>>>
> >>
> rel#130:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#89:Subset#6.JDBC.site.[]),
> >>>>> rowcount=750.0, cumulative cost={inf}
> >>>>> rel#133:EnumerableMergeJoin.ENUMERABLE.[[0],
> >>>>>
> >>>>
> >>
> [3]](left=rel#131:Subset#1.ENUMERABLE.[0],right=rel#132:Subset#3.ENUMERABLE.[1],condition==($0,
> >>>>> $3),joinType=inner), rowcount=750.0, cumulative cost={inf}
> >>>>>
> >>>>
> >>
> rel#134:EnumerableJoin.ENUMERABLE.[](left=rel#124:Subset#1.ENUMERABLE.[],right=rel#104:Subset#3.ENUMERABLE.[],condition==($0,
> >>>>> $3),joinType=inner), rowcount=750.0, cumulative
> >> cost={1531.517018598809
> >>>>> rows, 322.0 cpu, 0.0 io}
> >>>>> Set#7, type: RecordType(INTEGER ID, INTEGER USER_ID, INTEGER ID0,
> >>>> INTEGER
> >>>>> ORDER_ID, INTEGER ITEM_ID, VARCHAR(1) CREATED_AT, INTEGER ID1,
> >>>> VARCHAR(1)
> >>>>> NAME, VARCHAR(1) EMAIL, VARCHAR(1) CREATED_AT0, VARCHAR(1)
> UPDATED_AT)
> >>>>> rel#75:Subset#7.NONE.[], best=null, importance=0.7290000000000001
> >>>>>
> >>>>
> >>
> rel#74:LogicalJoin.NONE.[](left=rel#73:Subset#6.NONE.[],right=rel#38:Subset#0.JDBC.site.[],condition==($6,
> >>>>> $1),joinType=inner), rowcount=11250.0, cumulative cost={inf}
> >>>>> rel#82:Subset#7.JDBC.site.[], best=null,
> importance=0.7290000000000001
> >>>>>
> >>>>
> >>
> rel#90:JdbcJoin.JDBC.site.[](left=rel#89:Subset#6.JDBC.site.[],right=rel#38:Subset#0.JDBC.site.[],condition==($6,
> >>>>> $1),joinType=inner), rowcount=11250.0, cumulative cost={inf}
> >>>>> rel#86:Subset#7.ENUMERABLE.[], best=rel#101, importance=0.81
> >>>>> rel#98:EnumerableMergeJoin.ENUMERABLE.[[1],
> >>>>>
> >>>>
> >>
> [6]](left=rel#96:Subset#6.ENUMERABLE.[1],right=rel#97:Subset#0.ENUMERABLE.[0],condition==($1,
> >>>>> $6),joinType=inner), rowcount=11250.0, cumulative cost={inf}
> >>>>>
> >>>>
> >>
> rel#101:EnumerableJoin.ENUMERABLE.[](left=rel#99:Subset#6.ENUMERABLE.[],right=rel#100:Subset#0.ENUMERABLE.[],condition==($1,
> >>>>> $6),joinType=inner), rowcount=11250.0, cumulative
> >>>> cost={17956.571923496576
> >>>>> rows, 433.0 cpu, 0.0 io}
> >>>>>
> >>>>
> >>
> rel#109:JdbcToEnumerableConverter.ENUMERABLE.[](input=rel#82:Subset#7.JDBC.site.[]),
> >>>>> rowcount=11250.0, cumulative cost={inf}
> >>>>
> >>>>
> >>>
> >>
>
>

Re: Join not being pushed to JDBC database when also joining across databases

Reply via email to