[ 
https://issues.apache.org/jira/browse/DRILL-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560482#comment-16560482
 ] 

ASF GitHub Bot commented on DRILL-6631:
---------------------------------------

Ben-Zvi commented on a change in pull request #1399: DRILL-6631: Streaming agg 
causes queries with Lateral and Unnest to r…
URL: https://github.com/apache/drill/pull/1399#discussion_r205921415
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/StreamingAggTemplate.java
 ##########
 @@ -40,7 +40,7 @@
 
   // First batch after build schema phase
   private boolean first = true;
 
 Review comment:
   Suggestion - rename *first* to something more meaningful - e.g.,  
*emitNotSeenYet* .
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Wrong result from LateralUnnest query with aggregation and order by
> -------------------------------------------------------------------
>
>                 Key: DRILL-6631
>                 URL: https://issues.apache.org/jira/browse/DRILL-6631
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Parth Chandra
>            Assignee: Parth Chandra
>            Priority: Major
>             Fix For: 1.15.0
>
>
> Reported by Chun:
> The following query gives correct result:
> {noformat}
> 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, 
> customer.c_name, orders.totalprice from customer, lateral (select 
> sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE 
> t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> where customer.c_custkey = 101276;
> +------------+---------------------+-------------+
> | c_custkey  |       c_name        | totalprice  |
> +------------+---------------------+-------------+
> | 101276     | Customer#000101276  | 82657.72    |
> +------------+---------------------+-------------+
> 1 row selected (6.184 seconds)
> {noformat}
> But if I remove the where clause and replace it with order by and limit, I 
> got the following empty result set. This is wrong.
> {noformat}
> 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, 
> customer.c_name, orders.totalprice from customer, lateral (select 
> sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE 
> t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> order by customer.c_custkey limit 50;
> +------------+---------+-------------+
> | c_custkey  | c_name  | totalprice  |
> +------------+---------+-------------+
> +------------+---------+-------------+
> No rows selected (2.753 seconds)
> {noformat}
> Here is the plan for the query giving the correct result:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY 
> totalprice): rowcount = 472783.35, cumulative cost = {8242193.7349999985 
> rows, 4.102185433499999E7 cpu, 0.0 io, 5.809561804799999E9 network, 0.0 
> memory}, id = 14410
> 00-01      Project(c_custkey=[$0], c_name=[$1], totalprice=[$2]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 472783.35, 
> cumulative cost = {8194915.3999999985 rows, 4.097457599999999E7 cpu, 0.0 io, 
> 5.809561804799999E9 network, 0.0 memory}, id = 14409
> 00-02        UnionExchange : rowType = RecordType(ANY c_custkey, ANY c_name, 
> ANY totalprice): rowcount = 472783.35, cumulative cost = {7722132.049999999 
> rows, 3.9556225949999996E7 cpu, 0.0 io, 5.809561804799999E9 network, 0.0 
> memory}, id = 14408
> 01-01          LateralJoin(correlation=[$cor1], joinType=[inner], 
> requiredColumns=[{0}], column excluded from output: =[`c_orders`]) : rowType 
> = RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 
> 472783.35, cumulative cost = {7249348.699999999 rows, 3.577395915E7 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 14407
> 01-03            SelectionVectorRemover : rowType = RecordType(ANY c_orders, 
> ANY c_custkey, ANY c_name): rowcount = 472783.35, cumulative cost = 
> {6776561.35 rows, 2.442713975E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
> 14403
> 01-05              Filter(condition=[=($1, 101276)]) : rowType = 
> RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 472783.35, 
> cumulative cost = {6303778.0 rows, 2.39543564E7 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 14402
> 01-07                Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/lateral/tpchsf1/json/customer, 
> numFiles=10, columns=[`c_orders`, `c_custkey`, `c_name`], 
> files=[maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_6.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_4.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_3.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_7.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_5.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_2.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_0.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_8.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_1.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_9.json]]]) : 
> rowType = RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 
> 3151889.0, cumulative cost = {3151889.0 rows, 9455667.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 14401
> 01-02            StreamAgg(group=[{}], totalprice=[SUM($0)]) : rowType = 
> RecordType(ANY totalprice): rowcount = 1.0, cumulative cost = {4.0 rows, 19.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14406
> 01-04              Filter(condition=[OR(=($0, 89230.03), =($0, 270087.44), 
> =($0, 246408.53), =($0, 82657.72), =($0, 153941.38), =($0, 65277.06), =($0, 
> 180309.76))]) : rowType = RecordType(ANY ITEM): rowcount = 1.0, cumulative 
> cost = {3.0 rows, 7.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14405
> 01-06                Project(ITEM=[ITEM($0, 'o_totalprice')]) : rowType = 
> RecordType(ANY ITEM): rowcount = 1.0, cumulative cost = {2.0 rows, 2.0 cpu, 
> 0.0 io, 0.0 network, 0.0 memory}, id = 14404
> 01-08                  Unnest [srcOp=01-01]  : rowType = RecordType(ANY 
> c_orders): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 14240
> {noformat}
> And here is the plan for the query giving wrong result:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY 
> totalprice): rowcount = 50.0, cumulative cost = {2.2063382E7 rows, 
> 2.6342088596853146E8 cpu, 0.0 io, 7.7460824064E10 network, 0.0 memory}, id = 
> 15076
> 00-01      Project(c_custkey=[$0], c_name=[$1], totalprice=[$2]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 50.0, 
> cumulative cost = {2.2063377E7 rows, 2.6342088096853146E8 cpu, 0.0 io, 
> 7.7460824064E10 network, 0.0 memory}, id = 15075
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY c_custkey, ANY 
> c_name, ANY totalprice): rowcount = 50.0, cumulative cost = {2.2063327E7 
> rows, 2.6342073096853146E8 cpu, 0.0 io, 7.7460824064E10 network, 0.0 memory}, 
> id = 15074
> 00-03          Limit(fetch=[50]) : rowType = RecordType(ANY c_custkey, ANY 
> c_name, ANY totalprice): rowcount = 50.0, cumulative cost = {2.2063277E7 
> rows, 2.6342068096853146E8 cpu, 0.0 io, 7.7460824064E10 network, 0.0 memory}, 
> id = 15073
> 00-04            SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
> c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost 
> = {2.2063227E7 rows, 2.6342048096853146E8 cpu, 0.0 io, 7.7460824064E10 
> network, 0.0 memory}, id = 15072
> 01-01              OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
> c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost 
> = {1.8911338E7 rows, 2.1299025696853146E8 cpu, 0.0 io, 3.8730412032E10 
> network, 0.0 memory}, id = 15071
> 02-01                SelectionVectorRemover : rowType = RecordType(ANY 
> c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost 
> = {1.5759449E7 rows, 2.0983836796853146E8 cpu, 0.0 io, 3.8730412032E10 
> network, 0.0 memory}, id = 15070
> 02-02                  TopN(limit=[50]) : rowType = RecordType(ANY c_custkey, 
> ANY c_name, ANY totalprice): rowcount = 3151889.0, cumulative cost = 
> {1.260756E7 rows, 2.0668647896853146E8 cpu, 0.0 io, 3.8730412032E10 network, 
> 0.0 memory}, id = 15069
> 02-03                    HashToRandomExchange(dist0=[[$0]]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 3151889.0, 
> cumulative cost = {9455671.0 rows, 1.35531246E8 cpu, 0.0 io, 3.8730412032E10 
> network, 0.0 memory}, id = 15068
> 03-01                      LateralJoin(correlation=[$cor1], joinType=[inner], 
> requiredColumns=[{0}], column excluded from output: =[`c_orders`]) : rowType 
> = RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 
> 3151889.0, cumulative cost = {6303782.0 rows, 8.5101022E7 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 15067
> 03-03                        Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/lateral/tpchsf1/json/customer, 
> numFiles=10, columns=[`c_orders`, `c_custkey`, `c_name`], 
> files=[maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_6.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_4.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_3.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_7.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_5.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_2.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_0.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_8.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_1.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_9.json]]]) : 
> rowType = RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 
> 3151889.0, cumulative cost = {3151889.0 rows, 9455667.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 15063
> 03-02                        StreamAgg(group=[{}], totalprice=[SUM($0)]) : 
> rowType = RecordType(ANY totalprice): rowcount = 1.0, cumulative cost = {4.0 
> rows, 19.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15066
> 03-04                          Filter(condition=[OR(=($0, 89230.03), =($0, 
> 270087.44), =($0, 246408.53), =($0, 82657.72), =($0, 153941.38), =($0, 
> 65277.06), =($0, 180309.76))]) : rowType = RecordType(ANY ITEM): rowcount = 
> 1.0, cumulative cost = {3.0 rows, 7.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, 
> id = 15065
> 03-05                            Project(ITEM=[ITEM($0, 'o_totalprice')]) : 
> rowType = RecordType(ANY ITEM): rowcount = 1.0, cumulative cost = {2.0 rows, 
> 2.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15064
> 03-06                              Unnest [srcOp=03-01]  : rowType = 
> RecordType(ANY c_orders): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14860{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to