[ https://issues.apache.org/jira/browse/DRILL-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552792#comment-16552792 ]
Arina Ielchiieva commented on DRILL-6574: ----------------------------------------- Merged into master with commits id: aa6a898a596ce9a8e2677ebc4987947bffc56dcb 5f4eeb8943e0e009136fcb5e61b51931280dac3f > Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization) > ---------------------------------------------------------------------- > > Key: DRILL-6574 > URL: https://issues.apache.org/jira/browse/DRILL-6574 > Project: Apache Drill > Issue Type: Improvement > Affects Versions: 1.13.0 > Reporter: Bohdan Kazydub > Assignee: Bohdan Kazydub > Priority: Major > Labels: doc-impacting, ready-to-commit > Fix For: 1.14.0 > > > Currently we have early limit 0 optimization > (planner.enable_limit0_optimization) which determines query data types before > actual scan. Since we not always able to determine data type during planning, > we need to add one more option to enable late limit 0 optimization > (planner.enable_limit0_on_scan, exit query right after scan. LIMIT(0) on SCAN > for UNION and complex functions will be disabled i.e. UNION and complex > functions need data to produce result schema. This would not work for the > following list of functions: KVGEN, MAPPIFY, FLATTEN, CONVERT_FROMJSON, > CONVERT_TOJSON, CONVERT_TOSIMPLEJSON, CONVERT_TOEXTENDEDJSON. > Query plan examples: > For query > {code:java} > SELECT * FROM ( > SELECT l.l_quantity, l.l_shipdate, o.o_custkey > FROM cp.`tpch/lineitem.parquet` l > JOIN cp.`tpch/orders.parquet` o ON l.l_orderkey = o.o_orderkey > LIMIT 2) > LIMIT 0 > {code} > {color:#6a8759}plan after changes looks like{color} > {noformat} > 00-00 Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY > o_custkey): rowcount = 1.0, cumulative cost = \{75183.1 rows, 210559.1 cpu, > 0.0 io, 0.0 network, 17.6 memory}, id = 527 > 00-01 Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : > rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount > = 1.0, cumulative cost = \{75183.0 rows, 210559.0 cpu, 0.0 io, 0.0 network, > 17.6 memory}, id = 526 > 00-02 SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, > ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = > 1.0, cumulative cost = \{75182.0 rows, 210556.0 cpu, 0.0 io, 0.0 network, > 17.6 memory}, id = 525 > 00-03 Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY > l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, > cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 > memory}, id = 524 > 00-04 Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY > l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, > cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 > memory}, id = 523 > 00-05 HashJoin(condition=[=($0, $3)], joinType=[inner]) : > rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY > o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75179.0 rows, > 210547.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 522 > 00-07 SelectionVectorRemover : rowType = RecordType(ANY > l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost > = \{60176.0 rows, 180526.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 518 > 00-09 Limit(offset=[0], fetch=[0]) : rowType = > RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, > cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 > memory}, id = 517 > 00-11 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], > selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) > : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): > rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, > 0.0 network, 0.0 memory}, id = 516 > 00-06 SelectionVectorRemover : rowType = RecordType(ANY > o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15001.0 rows, > 30001.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 521 > 00-08 Limit(offset=[0], fetch=[0]) : rowType = > RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = > \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 520 > 00-10 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], > selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = > RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative > cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 519 > {noformat} > and before changes: > {noformat} > 00-00 Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY > o_custkey): rowcount = 1.0, cumulative cost = \{150354.1 rows, 1052637.1 cpu, > 0.0 io, 0.0 network, 264000.0 memory}, id = 452 > 00-01 Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : > rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount > = 1.0, cumulative cost = \{150354.0 rows, 1052637.0 cpu, 0.0 io, 0.0 network, > 264000.0 memory}, id = 451 > 00-02 SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, > ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = > 1.0, cumulative cost = \{150353.0 rows, 1052634.0 cpu, 0.0 io, 0.0 network, > 264000.0 memory}, id = 450 > 00-03 Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY > l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, > cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, > 264000.0 memory}, id = 449 > 00-04 Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY > l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, > cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, > 264000.0 memory}, id = 448 > 00-05 HashJoin(condition=[=($0, $3)], joinType=[inner]) : > rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY > o_orderkey, ANY o_custkey): rowcount = 60175.0, cumulative cost = \{150350.0 > rows, 1052625.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 447 > 00-07 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], > selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) > : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): > rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, > 0.0 network, 0.0 memory}, id = 445 > 00-06 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], > selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = > RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative > cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446 > {noformat} > Also both early and late limit 0 optimizations will be enabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)