[ https://issues.apache.org/jira/browse/DRILL-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pritesh Maker updated DRILL-6631: --------------------------------- Reviewer: Sorabh Hamirwasia > Wrong result from LateralUnnest query with aggregation and order by > ------------------------------------------------------------------- > > Key: DRILL-6631 > URL: https://issues.apache.org/jira/browse/DRILL-6631 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.14.0 > Reporter: Parth Chandra > Assignee: Parth Chandra > Priority: Major > Fix For: 1.15.0 > > > Reported by Chun: > The following query gives correct result: > {noformat} > 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, > customer.c_name, orders.totalprice from customer, lateral (select > sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE > t.o.o_totalprice in > (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders > where customer.c_custkey = 101276; > +------------+---------------------+-------------+ > | c_custkey | c_name | totalprice | > +------------+---------------------+-------------+ > | 101276 | Customer#000101276 | 82657.72 | > +------------+---------------------+-------------+ > 1 row selected (6.184 seconds) > {noformat} > But if I remove the where clause and replace it with order by and limit, I > got the following empty result set. This is wrong. > {noformat} > 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, > customer.c_name, orders.totalprice from customer, lateral (select > sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE > t.o.o_totalprice in > (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders > order by customer.c_custkey limit 50; > +------------+---------+-------------+ > | c_custkey | c_name | totalprice | > +------------+---------+-------------+ > +------------+---------+-------------+ > No rows selected (2.753 seconds) > {noformat} > Here is the plan for the query giving the correct result: > {noformat} > 00-00 Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY > totalprice): rowcount = 472783.35, cumulative cost = {8242193.7349999985 > rows, 4.102185433499999E7 cpu, 0.0 io, 5.809561804799999E9 network, 0.0 > memory}, id = 14410 > 00-01 Project(c_custkey=[$0], c_name=[$1], totalprice=[$2]) : rowType = > RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 472783.35, > cumulative cost = {8194915.3999999985 rows, 4.097457599999999E7 cpu, 0.0 io, > 5.809561804799999E9 network, 0.0 memory}, id = 14409 > 00-02 UnionExchange : rowType = RecordType(ANY c_custkey, ANY c_name, > ANY totalprice): rowcount = 472783.35, cumulative cost = {7722132.049999999 > rows, 3.9556225949999996E7 cpu, 0.0 io, 5.809561804799999E9 network, 0.0 > memory}, id = 14408 > 01-01 LateralJoin(correlation=[$cor1], joinType=[inner], > requiredColumns=[{0}], column excluded from output: =[`c_orders`]) : rowType > = RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = > 472783.35, cumulative cost = {7249348.699999999 rows, 3.577395915E7 cpu, 0.0 > io, 0.0 network, 0.0 memory}, id = 14407 > 01-03 SelectionVectorRemover : rowType = RecordType(ANY c_orders, > ANY c_custkey, ANY c_name): rowcount = 472783.35, cumulative cost = > {6776561.35 rows, 2.442713975E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = > 14403 > 01-05 Filter(condition=[=($1, 101276)]) : rowType = > RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 472783.35, > cumulative cost = {6303778.0 rows, 2.39543564E7 cpu, 0.0 io, 0.0 network, 0.0 > memory}, id = 14402 > 01-07 Scan(groupscan=[EasyGroupScan > [selectionRoot=maprfs:/drill/testdata/lateral/tpchsf1/json/customer, > numFiles=10, columns=[`c_orders`, `c_custkey`, `c_name`], > files=[maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_6.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_4.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_3.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_7.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_5.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_2.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_0.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_8.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_1.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_9.json]]]) : > rowType = RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = > 3151889.0, cumulative cost = {3151889.0 rows, 9455667.0 cpu, 0.0 io, 0.0 > network, 0.0 memory}, id = 14401 > 01-02 StreamAgg(group=[{}], totalprice=[SUM($0)]) : rowType = > RecordType(ANY totalprice): rowcount = 1.0, cumulative cost = {4.0 rows, 19.0 > cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14406 > 01-04 Filter(condition=[OR(=($0, 89230.03), =($0, 270087.44), > =($0, 246408.53), =($0, 82657.72), =($0, 153941.38), =($0, 65277.06), =($0, > 180309.76))]) : rowType = RecordType(ANY ITEM): rowcount = 1.0, cumulative > cost = {3.0 rows, 7.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14405 > 01-06 Project(ITEM=[ITEM($0, 'o_totalprice')]) : rowType = > RecordType(ANY ITEM): rowcount = 1.0, cumulative cost = {2.0 rows, 2.0 cpu, > 0.0 io, 0.0 network, 0.0 memory}, id = 14404 > 01-08 Unnest [srcOp=01-01] : rowType = RecordType(ANY > c_orders): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 > network, 0.0 memory}, id = 14240 > {noformat} > And here is the plan for the query giving wrong result: > {noformat} > 00-00 Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY > totalprice): rowcount = 50.0, cumulative cost = {2.2063382E7 rows, > 2.6342088596853146E8 cpu, 0.0 io, 7.7460824064E10 network, 0.0 memory}, id = > 15076 > 00-01 Project(c_custkey=[$0], c_name=[$1], totalprice=[$2]) : rowType = > RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 50.0, > cumulative cost = {2.2063377E7 rows, 2.6342088096853146E8 cpu, 0.0 io, > 7.7460824064E10 network, 0.0 memory}, id = 15075 > 00-02 SelectionVectorRemover : rowType = RecordType(ANY c_custkey, ANY > c_name, ANY totalprice): rowcount = 50.0, cumulative cost = {2.2063327E7 > rows, 2.6342073096853146E8 cpu, 0.0 io, 7.7460824064E10 network, 0.0 memory}, > id = 15074 > 00-03 Limit(fetch=[50]) : rowType = RecordType(ANY c_custkey, ANY > c_name, ANY totalprice): rowcount = 50.0, cumulative cost = {2.2063277E7 > rows, 2.6342068096853146E8 cpu, 0.0 io, 7.7460824064E10 network, 0.0 memory}, > id = 15073 > 00-04 SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY > c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost > = {2.2063227E7 rows, 2.6342048096853146E8 cpu, 0.0 io, 7.7460824064E10 > network, 0.0 memory}, id = 15072 > 01-01 OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY > c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost > = {1.8911338E7 rows, 2.1299025696853146E8 cpu, 0.0 io, 3.8730412032E10 > network, 0.0 memory}, id = 15071 > 02-01 SelectionVectorRemover : rowType = RecordType(ANY > c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost > = {1.5759449E7 rows, 2.0983836796853146E8 cpu, 0.0 io, 3.8730412032E10 > network, 0.0 memory}, id = 15070 > 02-02 TopN(limit=[50]) : rowType = RecordType(ANY c_custkey, > ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost = > {1.260756E7 rows, 2.0668647896853146E8 cpu, 0.0 io, 3.8730412032E10 network, > 0.0 memory}, id = 15069 > 02-03 HashToRandomExchange(dist0=[[$0]]) : rowType = > RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, > cumulative cost = {9455671.0 rows, 1.35531246E8 cpu, 0.0 io, 3.8730412032E10 > network, 0.0 memory}, id = 15068 > 03-01 LateralJoin(correlation=[$cor1], joinType=[inner], > requiredColumns=[{0}], column excluded from output: =[`c_orders`]) : rowType > = RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = > 3151889.0, cumulative cost = {6303782.0 rows, 8.5101022E7 cpu, 0.0 io, 0.0 > network, 0.0 memory}, id = 15067 > 03-03 Scan(groupscan=[EasyGroupScan > [selectionRoot=maprfs:/drill/testdata/lateral/tpchsf1/json/customer, > numFiles=10, columns=[`c_orders`, `c_custkey`, `c_name`], > files=[maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_6.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_4.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_3.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_7.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_5.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_2.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_0.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_8.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_1.json, > maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_9.json]]]) : > rowType = RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = > 3151889.0, cumulative cost = {3151889.0 rows, 9455667.0 cpu, 0.0 io, 0.0 > network, 0.0 memory}, id = 15063 > 03-02 StreamAgg(group=[{}], totalprice=[SUM($0)]) : > rowType = RecordType(ANY totalprice): rowcount = 1.0, cumulative cost = {4.0 > rows, 19.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15066 > 03-04 Filter(condition=[OR(=($0, 89230.03), =($0, > 270087.44), =($0, 246408.53), =($0, 82657.72), =($0, 153941.38), =($0, > 65277.06), =($0, 180309.76))]) : rowType = RecordType(ANY ITEM): rowcount = > 1.0, cumulative cost = {3.0 rows, 7.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, > id = 15065 > 03-05 Project(ITEM=[ITEM($0, 'o_totalprice')]) : > rowType = RecordType(ANY ITEM): rowcount = 1.0, cumulative cost = {2.0 rows, > 2.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15064 > 03-06 Unnest [srcOp=03-01] : rowType = > RecordType(ANY c_orders): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 > cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14860{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)