[ https://issues.apache.org/jira/browse/DRILL-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Hsuan-Yi Chu updated DRILL-1325: ------------------------------------- Fix Version/s: (was: 0.9.0) 0.8.0 > Better error messages when no equality join is specified > -------------------------------------------------------- > > Key: DRILL-1325 > URL: https://issues.apache.org/jira/browse/DRILL-1325 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Affects Versions: 0.4.0 > Environment: MapR 3.1.1 > Drill 0.4.0 > CentOS 6.4 > Reporter: Hao Zhu > Assignee: Sean Hsuan-Yi Chu > Priority: Minor > Fix For: 0.8.0 > > Attachments: DRILL-1325.1.patch > > > Equal hash join works fine, for example: > {code} > > explain plan for select a.*,b.user_port > . . . . . . . . . . . . . . . . . . . . . . .> from cp.`employee.json` a, > sys.drillbits b > . . . . . . . . . . . . . . . . . . . . . . .> where > a.position_id=b.user_port ; > +------------+------------+ > | text | json | > +------------+------------+ > | 00-00 Screen > 00-01 Project(*=[$0], user_port=[$1]) > 00-02 Project(T1¦¦*=[$0], user_port=[$2]) > 00-03 HashJoin(condition=[=($1, $2)], joinType=[inner]) > 00-05 | > +------------+------------+ > 1 row selected (0.428 seconds) > {code} > However if we change "=" to "<>" , it failed: > {code} > > explain plan for select a.*,b.user_port > . . . . . . . . . . . . . . . . . . . . . . .> from cp.`employee.json` a, > sys.drillbits b > . . . . . . . . . . . . . . . . . . . . . . .> where > a.position_id<>b.user_port ; > Query failed: Failure while parsing sql. Node > [rel#2056:Subset#4.LOGICAL.ANY([]).[]] could not be implemented; planner > state: > Root: rel#2056:Subset#4.LOGICAL.ANY([]).[] > Original rel: > AbstractConverter(subset=[rel#2056:Subset#4.LOGICAL.ANY([]).[]], > convention=[LOGICAL], DrillDistributionTraitDef=[ANY([])], sort=[[]]): > rowcount = 1.7976931348623157E308, cumulative cost = {inf}, id = 2057 > ProjectRel(subset=[rel#2055:Subset#4.NONE.ANY([]).[]], *=[$0], > user_port=[$3]): rowcount = 1.7976931348623157E308, cumulative cost = > {1.7976931348623157E308 rows, Infinity cpu, 0.0 io, 0.0 network}, id = 2054 > FilterRel(subset=[rel#2053:Subset#3.NONE.ANY([]).[]], > condition=[<>(CAST($1):INTEGER, $3)]): rowcount = 8.988465674311579E307, > cumulative cost = {8.988465674311579E307 rows, 1.7976931348623157E308 cpu, > 0.0 io, 0.0 network}, id = 2052 > JoinRel(subset=[rel#2051:Subset#2.NONE.ANY([]).[]], condition=[true], > joinType=[inner]): rowcount = 10000.0, cumulative cost = {10000.0 rows, 0.0 > cpu, 0.0 io, 0.0 network}, id = 2050 > > EnumerableTableAccessRel(subset=[rel#2048:Subset#0.ENUMERABLE.ANY([]).[]], > table=[[cp, employee.json]]): rowcount = 100.0, cumulative cost = {100.0 > rows, 101.0 cpu, 0.0 io, 0.0 network}, id = 2043 > > EnumerableTableAccessRel(subset=[rel#2049:Subset#1.ENUMERABLE.ANY([]).[]], > table=[[sys, drillbits]]): rowcount = 100.0, cumulative cost = {100.0 rows, > 101.0 cpu, 0.0 io, 0.0 network}, id = 2044 > Sets: > Set#0, type: (DrillRecordRow[*, position_id]) > rel#2048:Subset#0.ENUMERABLE.ANY([]).[], best=rel#2043, > importance=0.6561 > > rel#2043:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[cp, > employee.json]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 > io, 0.0 network} > > rel#2093:AbstractConverter.ENUMERABLE.ANY([]).[](child=rel#2092:Subset#0.LOGICAL.ANY([]).[],convention=ENUMERABLE,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=463.0, cumulative cost={inf} > > rel#2120:AbstractConverter.ENUMERABLE.ANY([]).[](child=rel#2119:Subset#0.NONE.ANY([]).[],convention=ENUMERABLE,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > rel#2092:Subset#0.LOGICAL.ANY([]).[], best=rel#2126, > importance=0.5904900000000001 > > rel#2094:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2048:Subset#0.ENUMERABLE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=100.0, cumulative cost={inf} > > rel#2121:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2119:Subset#0.NONE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2102:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2119:Subset#0.NONE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2118:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#2092:Subset#0.LOGICAL.ANY([]).[],*=$0,position_id=$1), > rowcount=463.0, cumulative cost={926.0 rows, 934.0 cpu, 0.0 io, 0.0 network} > rel#2126:DrillScanRel.LOGICAL.ANY([]).[](table=[cp, > employee.json],groupscan=EasyGroupScan [selectionRoot=/employee.json, columns > = null]), rowcount=463.0, cumulative cost={463.0 rows, 926.0 cpu, 0.0 io, 0.0 > network} > rel#2119:Subset#0.NONE.ANY([]).[], best=null, importance=0.531441 > > rel#2122:AbstractConverter.NONE.ANY([]).[](child=rel#2048:Subset#0.ENUMERABLE.ANY([]).[],convention=NONE,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=100.0, cumulative cost={inf} > > rel#2123:AbstractConverter.NONE.ANY([]).[](child=rel#2092:Subset#0.LOGICAL.ANY([]).[],convention=NONE,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=463.0, cumulative cost={inf} > > rel#2080:ProjectRel.NONE.ANY([]).[](child=rel#2048:Subset#0.ENUMERABLE.ANY([]).[],*=$0,position_id=$1), > rowcount=100.0, cumulative cost={inf} > Set#1, type: RecordType(VARCHAR(1) host, INTEGER user_port, INTEGER > control_port, INTEGER data_port) > rel#2049:Subset#1.ENUMERABLE.ANY([]).[], best=rel#2044, > importance=0.6561 > > rel#2044:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[sys, > drillbits]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io, > 0.0 network} > > rel#2096:AbstractConverter.ENUMERABLE.ANY([]).[](child=rel#2095:Subset#1.LOGICAL.ANY([]).[],convention=ENUMERABLE,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=20.0, cumulative cost={inf} > rel#2095:Subset#1.LOGICAL.ANY([]).[], best=rel#2125, > importance=0.5904900000000001 > > rel#2097:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2049:Subset#1.ENUMERABLE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=100.0, cumulative cost={inf} > rel#2125:DrillScanRel.LOGICAL.ANY([]).[](table=[sys, > drillbits],groupscan=SystemTableScan: DRILLBITS), rowcount=20.0, cumulative > cost={20.0 rows, 80.0 cpu, 0.0 io, 0.0 network} > Set#2, type: RecordType(ANY *, ANY position_id, VARCHAR(1) host, INTEGER > user_port, INTEGER control_port, INTEGER data_port) > rel#2051:Subset#2.NONE.ANY([]).[], best=null, > importance=0.7290000000000001 > > rel#2050:JoinRel.NONE.ANY([]).[](left=rel#2048:Subset#0.ENUMERABLE.ANY([]).[],right=rel#2049:Subset#1.ENUMERABLE.ANY([]).[],condition=true,joinType=inner), > rowcount=10000.0, cumulative cost={inf} > rel#2099:Subset#2.LOGICAL.ANY([]).[], best=null, > importance=0.7290000000000001 > > rel#2100:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2051:Subset#2.NONE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2098:DrillJoinRel.LOGICAL.ANY([]).[](left=rel#2092:Subset#0.LOGICAL.ANY([]).[],right=rel#2095:Subset#1.LOGICAL.ANY([]).[],condition=true,joinType=inner), > rowcount=463.0, cumulative cost={inf} > Set#3, type: RecordType(ANY *, ANY position_id, VARCHAR(1) host, INTEGER > user_port, INTEGER control_port, INTEGER data_port) > rel#2053:Subset#3.NONE.ANY([]).[], best=null, importance=0.81 > > rel#2052:FilterRel.NONE.ANY([]).[](child=rel#2051:Subset#2.NONE.ANY([]).[],condition=<>(CAST($1):INTEGER, > $3)), rowcount=8.988465674311579E307, cumulative cost={inf} > > rel#2079:JoinRel.NONE.ANY([]).[](left=rel#2048:Subset#0.ENUMERABLE.ANY([]).[],right=rel#2049:Subset#1.ENUMERABLE.ANY([]).[],condition=<>(CAST($1):INTEGER, > $3),joinType=inner), rowcount=5000.0, cumulative cost={inf} > rel#2068:Subset#3.LOGICAL.ANY([]).[], best=null, importance=0.81 > > rel#2069:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2053:Subset#3.NONE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2109:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#2099:Subset#2.LOGICAL.ANY([]).[],condition=<>(CAST($1):INTEGER, > $3)), rowcount=8.988465674311579E307, cumulative cost={inf} > Set#4, type: RecordType(ANY *, INTEGER user_port) > rel#2055:Subset#4.NONE.ANY([]).[], best=null, importance=0.9 > > rel#2054:ProjectRel.NONE.ANY([]).[](child=rel#2053:Subset#3.NONE.ANY([]).[],*=$0,user_port=$3), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2064:ProjectRel.NONE.ANY([]).[](child=rel#2063:Subset#6.NONE.ANY([]).[],*=$0,user_port=$2), > rowcount=1.7976931348623157E308, cumulative cost={inf} > rel#2056:Subset#4.LOGICAL.ANY([]).[], best=null, importance=1.0 > > rel#2057:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2055:Subset#4.NONE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2067:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#2065:Subset#6.LOGICAL.ANY([]).[],*=$0,user_port=$2), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2070:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#2068:Subset#3.LOGICAL.ANY([]).[],*=$0,user_port=$3), > rowcount=1.7976931348623157E308, cumulative cost={inf} > Set#5, type: RecordType(ANY *, ANY position_id, INTEGER user_port) > rel#2061:Subset#5.NONE.ANY([]).[], best=null, > importance=0.7290000000000001 > > rel#2058:ProjectRel.NONE.ANY([]).[](child=rel#2051:Subset#2.NONE.ANY([]).[],*=$0,position_id=$1,user_port=$3), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2113:JoinRel.NONE.ANY([]).[](left=rel#2119:Subset#0.NONE.ANY([]).[],right=rel#2084:Subset#8.NONE.ANY([]).[],condition=true,joinType=inner)Error: > exception while executing query: Failure while trying to get next result > batch. (state=,code=0), rowcount=1.7976931348623157E308, cumulative cost={inf} > rel#2106:Subset#5.LOGICAL.ANY([]).[], best=null, importance=0.81 > > rel#2107:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2061:Subset#5.NONE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2116:DrillJoinRel.LOGICAL.ANY([]).[](left=rel#2092:Subset#0.LOGICAL.ANY([]).[],right=rel#2103:Subset#8.LOGICAL.ANY([]).[],condition=true,joinType=inner), > rowcount=463.0, cumulative cost={inf} > > rel#2124:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#2099:Subset#2.LOGICAL.ANY([]).[],*=$0,position_id=$1,user_port=$3), > rowcount=1.7976931348623157E308, cumulative cost={inf} > Set#6, type: RecordType(ANY *, ANY position_id, INTEGER user_port) > rel#2063:Subset#6.NONE.ANY([]).[], best=null, importance=0.81 > > rel#2062:FilterRel.NONE.ANY([]).[](child=rel#2061:Subset#5.NONE.ANY([]).[],condition=<>(CAST($1):INTEGER, > $2)), rowcount=8.988465674311579E307, cumulative cost={inf} > > rel#2073:ProjectRel.NONE.ANY([]).[](child=rel#2053:Subset#3.NONE.ANY([]).[],*=$0,position_id=$1,user_port=$3), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2085:JoinRel.NONE.ANY([]).[](left=rel#2119:Subset#0.NONE.ANY([]).[],right=rel#2084:Subset#8.NONE.ANY([]).[],condition=<>(CAST($1):INTEGER, > $2),joinType=inner), rowcount=1.7976931348623157E308, cumulative cost={inf} > rel#2065:Subset#6.LOGICAL.ANY([]).[], best=null, importance=0.9 > > rel#2066:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2063:Subset#6.NONE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2105:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#2068:Subset#3.LOGICAL.ANY([]).[],*=$0,position_id=$1,user_port=$3), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2108:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#2106:Subset#5.LOGICAL.ANY([]).[],condition=<>(CAST($1):INTEGER, > $2)), rowcount=8.988465674311579E307, cumulative cost={inf} > Set#8, type: RecordType(INTEGER user_port) > rel#2084:Subset#8.NONE.ANY([]).[], best=null, importance=0.6561 > > rel#2081:ProjectRel.NONE.ANY([]).[](child=rel#2049:Subset#1.ENUMERABLE.ANY([]).[],user_port=$1), > rowcount=100.0, cumulative cost={inf} > rel#2103:Subset#8.LOGICAL.ANY([]).[], best=rel#2117, > importance=0.7290000000000001 > > rel#2104:AbstractConverter.LOGICAL.ANY([]).[](child=rel#2084:Subset#8.NONE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), > rowcount=1.7976931348623157E308, cumulative cost={inf} > > rel#2117:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#2095:Subset#1.LOGICAL.ANY([]).[],user_port=$1), > rowcount=20.0, cumulative cost={40.0 rows, 84.0 cpu, 0.0 io, 0.0 network} > [05104eb3-8320-47c0-b9d2-710840e44a29] > {code} > Here are 2 suggestions/asks: > 1. Do we support non-equal join? if yes , we should fix this issue. > 2. If we do not support, the error message should be more clear and simpler > to tell something like "non-equal join is not supported in current > version...." -- This message was sent by Atlassian JIRA (v6.3.4#6332)