[ 
https://issues.apache.org/jira/browse/DRILL-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6631:
------------------------------------

    Assignee: Parth Chandra

> Wrong result from LateralUnnest query with aggregation and order by
> -------------------------------------------------------------------
>
>                 Key: DRILL-6631
>                 URL: https://issues.apache.org/jira/browse/DRILL-6631
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Parth Chandra
>            Assignee: Parth Chandra
>            Priority: Major
>             Fix For: 1.15.0
>
>
> Reported by Chun:
> The following query gives correct result:
> {noformat}
> 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, 
> customer.c_name, orders.totalprice from customer, lateral (select 
> sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE 
> t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> where customer.c_custkey = 101276;
> +------------+---------------------+-------------+
> | c_custkey  |       c_name        | totalprice  |
> +------------+---------------------+-------------+
> | 101276     | Customer#000101276  | 82657.72    |
> +------------+---------------------+-------------+
> 1 row selected (6.184 seconds)
> {noformat}
> But if I remove the where clause and replace it with order by and limit, I 
> got the following empty result set. This is wrong.
> {noformat}
> 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, 
> customer.c_name, orders.totalprice from customer, lateral (select 
> sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE 
> t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> order by customer.c_custkey limit 50;
> +------------+---------+-------------+
> | c_custkey  | c_name  | totalprice  |
> +------------+---------+-------------+
> +------------+---------+-------------+
> No rows selected (2.753 seconds)
> {noformat}
> Here is the plan for the query giving the correct result:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY 
> totalprice): rowcount = 472783.35, cumulative cost = {8242193.7349999985 
> rows, 4.102185433499999E7 cpu, 0.0 io, 5.809561804799999E9 network, 0.0 
> memory}, id = 14410
> 00-01      Project(c_custkey=[$0], c_name=[$1], totalprice=[$2]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 472783.35, 
> cumulative cost = {8194915.3999999985 rows, 4.097457599999999E7 cpu, 0.0 io, 
> 5.809561804799999E9 network, 0.0 memory}, id = 14409
> 00-02        UnionExchange : rowType = RecordType(ANY c_custkey, ANY c_name, 
> ANY totalprice): rowcount = 472783.35, cumulative cost = {7722132.049999999 
> rows, 3.9556225949999996E7 cpu, 0.0 io, 5.809561804799999E9 network, 0.0 
> memory}, id = 14408
> 01-01          LateralJoin(correlation=[$cor1], joinType=[inner], 
> requiredColumns=[{0}], column excluded from output: =[`c_orders`]) : rowType 
> = RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 
> 472783.35, cumulative cost = {7249348.699999999 rows, 3.577395915E7 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 14407
> 01-03            SelectionVectorRemover : rowType = RecordType(ANY c_orders, 
> ANY c_custkey, ANY c_name): rowcount = 472783.35, cumulative cost = 
> {6776561.35 rows, 2.442713975E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
> 14403
> 01-05              Filter(condition=[=($1, 101276)]) : rowType = 
> RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 472783.35, 
> cumulative cost = {6303778.0 rows, 2.39543564E7 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 14402
> 01-07                Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/lateral/tpchsf1/json/customer, 
> numFiles=10, columns=[`c_orders`, `c_custkey`, `c_name`], 
> files=[maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_6.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_4.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_3.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_7.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_5.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_2.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_0.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_8.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_1.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_9.json]]]) : 
> rowType = RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 
> 3151889.0, cumulative cost = {3151889.0 rows, 9455667.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 14401
> 01-02            StreamAgg(group=[{}], totalprice=[SUM($0)]) : rowType = 
> RecordType(ANY totalprice): rowcount = 1.0, cumulative cost = {4.0 rows, 19.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14406
> 01-04              Filter(condition=[OR(=($0, 89230.03), =($0, 270087.44), 
> =($0, 246408.53), =($0, 82657.72), =($0, 153941.38), =($0, 65277.06), =($0, 
> 180309.76))]) : rowType = RecordType(ANY ITEM): rowcount = 1.0, cumulative 
> cost = {3.0 rows, 7.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14405
> 01-06                Project(ITEM=[ITEM($0, 'o_totalprice')]) : rowType = 
> RecordType(ANY ITEM): rowcount = 1.0, cumulative cost = {2.0 rows, 2.0 cpu, 
> 0.0 io, 0.0 network, 0.0 memory}, id = 14404
> 01-08                  Unnest [srcOp=01-01]  : rowType = RecordType(ANY 
> c_orders): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 14240
> {noformat}
> And here is the plan for the query giving wrong result:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY 
> totalprice): rowcount = 50.0, cumulative cost = {2.2063382E7 rows, 
> 2.6342088596853146E8 cpu, 0.0 io, 7.7460824064E10 network, 0.0 memory}, id = 
> 15076
> 00-01      Project(c_custkey=[$0], c_name=[$1], totalprice=[$2]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 50.0, 
> cumulative cost = {2.2063377E7 rows, 2.6342088096853146E8 cpu, 0.0 io, 
> 7.7460824064E10 network, 0.0 memory}, id = 15075
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY c_custkey, ANY 
> c_name, ANY totalprice): rowcount = 50.0, cumulative cost = {2.2063327E7 
> rows, 2.6342073096853146E8 cpu, 0.0 io, 7.7460824064E10 network, 0.0 memory}, 
> id = 15074
> 00-03          Limit(fetch=[50]) : rowType = RecordType(ANY c_custkey, ANY 
> c_name, ANY totalprice): rowcount = 50.0, cumulative cost = {2.2063277E7 
> rows, 2.6342068096853146E8 cpu, 0.0 io, 7.7460824064E10 network, 0.0 memory}, 
> id = 15073
> 00-04            SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost 
> = {2.2063227E7 rows, 2.6342048096853146E8 cpu, 0.0 io, 7.7460824064E10 
> network, 0.0 memory}, id = 15072
> 01-01              OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost 
> = {1.8911338E7 rows, 2.1299025696853146E8 cpu, 0.0 io, 3.8730412032E10 
> network, 0.0 memory}, id = 15071
> 02-01                SelectionVectorRemover : rowType = RecordType(ANY 
> c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost 
> = {1.5759449E7 rows, 2.0983836796853146E8 cpu, 0.0 io, 3.8730412032E10 
> network, 0.0 memory}, id = 15070
> 02-02                  TopN(limit=[50]) : rowType = RecordType(ANY c_custkey, 
> ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost = 
> {1.260756E7 rows, 2.0668647896853146E8 cpu, 0.0 io, 3.8730412032E10 network, 
> 0.0 memory}, id = 15069
> 02-03                    HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, 
> cumulative cost = {9455671.0 rows, 1.35531246E8 cpu, 0.0 io, 3.8730412032E10 
> network, 0.0 memory}, id = 15068
> 03-01                      LateralJoin(correlation=[$cor1], joinType=[inner], 
> requiredColumns=[{0}], column excluded from output: =[`c_orders`]) : rowType 
> = RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 
> 3151889.0, cumulative cost = {6303782.0 rows, 8.5101022E7 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 15067
> 03-03                        Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/lateral/tpchsf1/json/customer, 
> numFiles=10, columns=[`c_orders`, `c_custkey`, `c_name`], 
> files=[maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_6.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_4.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_3.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_7.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_5.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_2.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_0.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_8.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_1.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_9.json]]]) : 
> rowType = RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 
> 3151889.0, cumulative cost = {3151889.0 rows, 9455667.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 15063
> 03-02                        StreamAgg(group=[{}], totalprice=[SUM($0)]) : 
> rowType = RecordType(ANY totalprice): rowcount = 1.0, cumulative cost = {4.0 
> rows, 19.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15066
> 03-04                          Filter(condition=[OR(=($0, 89230.03), =($0, 
> 270087.44), =($0, 246408.53), =($0, 82657.72), =($0, 153941.38), =($0, 
> 65277.06), =($0, 180309.76))]) : rowType = RecordType(ANY ITEM): rowcount = 
> 1.0, cumulative cost = {3.0 rows, 7.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, 
> id = 15065
> 03-05                            Project(ITEM=[ITEM($0, 'o_totalprice')]) : 
> rowType = RecordType(ANY ITEM): rowcount = 1.0, cumulative cost = {2.0 rows, 
> 2.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15064
> 03-06                              Unnest [srcOp=03-01]  : rowType = 
> RecordType(ANY c_orders): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14860{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to