[ 
https://issues.apache.org/jira/browse/DRILL-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6574:
----------------------------------
    Labels: doc-complete ready-to-commit  (was: doc-impacting ready-to-commit)

> Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6574
>                 URL: https://issues.apache.org/jira/browse/DRILL-6574
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Bohdan Kazydub
>            Assignee: Bohdan Kazydub
>            Priority: Major
>              Labels: doc-complete, ready-to-commit
>             Fix For: 1.14.0
>
>
> Currently we have early limit 0 optimization 
> (planner.enable_limit0_optimization) which determines query data types before 
> actual scan. Since we not always able to determine data type during planning, 
> we need to add one more option to enable late limit 0 optimization 
> (planner.enable_limit0_on_scan, exit query right after scan. LIMIT(0) on SCAN 
> for UNION and complex functions will be disabled i.e. UNION and complex 
> functions need data to produce result schema. This would not work for the 
> following list of functions: KVGEN, MAPPIFY, FLATTEN, CONVERT_FROMJSON, 
> CONVERT_TOJSON, CONVERT_TOSIMPLEJSON, CONVERT_TOEXTENDEDJSON.
> Query plan examples:
> For query
> {code:java}
> SELECT * FROM (
>   SELECT l.l_quantity, l.l_shipdate, o.o_custkey 
>   FROM cp.`tpch/lineitem.parquet` l 
>   JOIN cp.`tpch/orders.parquet` o ON l.l_orderkey = o.o_orderkey 
>   LIMIT 2) 
> LIMIT 0
> {code}
> {color:#6a8759}plan after changes looks like{color}
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY 
> o_custkey): rowcount = 1.0, cumulative cost = \{75183.1 rows, 210559.1 cpu, 
> 0.0 io, 0.0 network, 17.6 memory}, id = 527
> 00-01      Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : 
> rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount 
> = 1.0, cumulative cost = \{75183.0 rows, 210559.0 cpu, 0.0 io, 0.0 network, 
> 17.6 memory}, id = 526
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, 
> ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 
> 1.0, cumulative cost = \{75182.0 rows, 210556.0 cpu, 0.0 io, 0.0 network, 
> 17.6 memory}, id = 525
> 00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, 
> cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 
> memory}, id = 524
> 00-04            Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, 
> cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 
> memory}, id = 523
> 00-05              HashJoin(condition=[=($0, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY 
> o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75179.0 rows, 
> 210547.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 522
> 00-07                SelectionVectorRemover : rowType = RecordType(ANY 
> l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost 
> = \{60176.0 rows, 180526.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 518
> 00-09                  Limit(offset=[0], fetch=[0]) : rowType = 
> RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, 
> cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 517
> 00-11                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) 
> : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): 
> rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 
> 0.0 network, 0.0 memory}, id = 516
> 00-06                SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15001.0 rows, 
> 30001.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 521
> 00-08                  Limit(offset=[0], fetch=[0]) : rowType = 
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = 
> \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 520
> 00-10                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = 
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative 
> cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 519
> {noformat}
> and before changes:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY 
> o_custkey): rowcount = 1.0, cumulative cost = \{150354.1 rows, 1052637.1 cpu, 
> 0.0 io, 0.0 network, 264000.0 memory}, id = 452
> 00-01      Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : 
> rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount 
> = 1.0, cumulative cost = \{150354.0 rows, 1052637.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 451
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, 
> ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 
> 1.0, cumulative cost = \{150353.0 rows, 1052634.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 450
> 00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, 
> cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 449
> 00-04            Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, 
> cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 448
> 00-05              HashJoin(condition=[=($0, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY 
> o_orderkey, ANY o_custkey): rowcount = 60175.0, cumulative cost = \{150350.0 
> rows, 1052625.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 447
> 00-07                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) 
> : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): 
> rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 
> 0.0 network, 0.0 memory}, id = 445
> 00-06                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = 
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative 
> cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
> {noformat}
> Also both early and late limit 0 optimizations will be enabled by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to