[jira] [Commented] (DRILL-6574) Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

ASF GitHub Bot (JIRA) Thu, 19 Jul 2018 01:08:07 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548964#comment-16548964
 ]


ASF GitHub Bot commented on DRILL-6574:
---------------------------------------

KazydubB commented on a change in pull request #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#discussion_r203440521
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ##########
 @@ -126,9 +161,84 @@ public static boolean containsLimit0(final RelNode rel) {
     return visitor.isContains();
   }
 
-  private boolean contains = false;
+  /**
+   * TODO(DRILL-3993): Use RelBuilder to create a limit node to allow for applying this optimization in potentially
+   * any of the transformations, but currently this can be applied after Drill logical transformation, and before
+   * Drill physical transformation.
+   */
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+    final Pointer<Boolean> isUnsupported = new Pointer<>(false);
 
 Review comment:
   A Pointer is used because the value is reassigned inside anonymous classes 
(the unsupportedFunctionsVisitor and unsupportedOperationsVisitor variables), 
and a local variable captured by an anonymous class must be (effectively) final.
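
The constraint behind this comment can be illustrated with a minimal, self-contained sketch. It does not use Drill's classes: `Holder` is a hypothetical stand-in for a mutable wrapper such as `Pointer`, and `NameVisitor` and `containsUnsupported` are names invented for this example only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class EffectivelyFinalDemo {

  // Hypothetical stand-in for a mutable wrapper such as Pointer<T>.
  static final class Holder<T> {
    T value;

    Holder(T value) {
      this.value = value;
    }
  }

  // Hypothetical visitor interface, used only to get an anonymous class.
  interface NameVisitor {
    void visit(String functionName);
  }

  static boolean containsUnsupported(List<String> functionNames, List<String> unsupported) {
    // A plain local `boolean found = false;` could not be assigned inside the
    // anonymous class below: captured locals must be (effectively) final.
    // The holder reference never changes; only the value inside it does.
    final Holder<Boolean> found = new Holder<>(false);

    NameVisitor visitor = new NameVisitor() {
      @Override
      public void visit(String functionName) {
        if (unsupported.contains(functionName.toUpperCase(Locale.ROOT))) {
          found.value = true;
        }
      }
    };

    for (String name : functionNames) {
      visitor.visit(name);
    }
    return found.value;
  }

  public static void main(String[] args) {
    List<String> unsupported = Arrays.asList("FLATTEN", "KVGEN");
    System.out.println(containsUnsupported(Arrays.asList("upper", "flatten"), unsupported));
    System.out.println(containsUnsupported(Arrays.asList("upper", "lower"), unsupported));
  }
}
```

Replacing the holder with a plain `boolean` local and assigning to it inside `visit` would be rejected by the compiler, which is the reason a wrapper object is needed in this situation.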



> Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6574
>                 URL: https://issues.apache.org/jira/browse/DRILL-6574
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Bohdan Kazydub
>            Assignee: Bohdan Kazydub
>            Priority: Major
>              Labels: doc-impacting
>             Fix For: 1.14.0
>
>
> Currently we have the early limit 0 optimization 
> (planner.enable_limit0_optimization), which determines query data types before 
> the actual scan. Since we are not always able to determine data types during 
> planning, we need to add one more option to enable the late limit 0 optimization 
> (planner.enable_limit0_on_scan), which exits the query right after the scan. 
> LIMIT(0) on SCAN will be disabled for UNION and complex functions, because 
> they need data to produce the result schema. This means it will not work for 
> the following functions: KVGEN, MAPPIFY, FLATTEN, CONVERT_FROMJSON, 
> CONVERT_TOJSON, CONVERT_TOSIMPLEJSON, CONVERT_TOEXTENDEDJSON.
> Query plan examples:
> For query
> {code:java}
> SELECT * FROM (
>   SELECT l.l_quantity, l.l_shipdate, o.o_custkey 
>   FROM cp.`tpch/lineitem.parquet` l 
>   JOIN cp.`tpch/orders.parquet` o ON l.l_orderkey = o.o_orderkey 
>   LIMIT 2) 
> LIMIT 0
> {code}
> the plan after the changes looks like:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY 
> o_custkey): rowcount = 1.0, cumulative cost = \{75183.1 rows, 210559.1 cpu, 
> 0.0 io, 0.0 network, 17.6 memory}, id = 527
> 00-01      Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : 
> rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount 
> = 1.0, cumulative cost = \{75183.0 rows, 210559.0 cpu, 0.0 io, 0.0 network, 
> 17.6 memory}, id = 526
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, 
> ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 
> 1.0, cumulative cost = \{75182.0 rows, 210556.0 cpu, 0.0 io, 0.0 network, 
> 17.6 memory}, id = 525
> 00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, 
> cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 
> memory}, id = 524
> 00-04            Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, 
> cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 
> memory}, id = 523
> 00-05              HashJoin(condition=[=($0, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY 
> o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75179.0 rows, 
> 210547.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 522
> 00-07                SelectionVectorRemover : rowType = RecordType(ANY 
> l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost 
> = \{60176.0 rows, 180526.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 518
> 00-09                  Limit(offset=[0], fetch=[0]) : rowType = 
> RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, 
> cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 517
> 00-11                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) 
> : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): 
> rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 
> 0.0 network, 0.0 memory}, id = 516
> 00-06                SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15001.0 rows, 
> 30001.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 521
> 00-08                  Limit(offset=[0], fetch=[0]) : rowType = 
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = 
> \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 520
> 00-10                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = 
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative 
> cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 519
> {noformat}
> and before changes:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY 
> o_custkey): rowcount = 1.0, cumulative cost = \{150354.1 rows, 1052637.1 cpu, 
> 0.0 io, 0.0 network, 264000.0 memory}, id = 452
> 00-01      Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : 
> rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount 
> = 1.0, cumulative cost = \{150354.0 rows, 1052637.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 451
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, 
> ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 
> 1.0, cumulative cost = \{150353.0 rows, 1052634.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 450
> 00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, 
> cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 449
> 00-04            Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, 
> cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 448
> 00-05              HashJoin(condition=[=($0, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY 
> o_orderkey, ANY o_custkey): rowcount = 60175.0, cumulative cost = \{150350.0 
> rows, 1052625.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 447
> 00-07                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) 
> : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): 
> rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 
> 0.0 network, 0.0 memory}, id = 445
> 00-06                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = 
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative 
> cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
> {noformat}
> Also, both the early and late limit 0 optimizations will be enabled by default.
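
The difference between the two plans above can be sketched with a toy plan tree. This is only an illustration under stated assumptions, not the code from the pull request: `Node`, `needsData`, `pushLimit0`, and `rewrite` are hypothetical names, the operator labels are simplified strings, and the SelectionVectorRemover and Project operators from the real plans are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LateLimit0Sketch {

  // Hypothetical minimal plan node: an operator label plus its inputs.
  static final class Node {
    final String op;
    final List<Node> inputs;

    Node(String op, Node... inputs) {
      this.op = op;
      this.inputs = Arrays.asList(inputs);
    }

    @Override
    public String toString() {
      if (inputs.isEmpty()) {
        return op;
      }
      List<String> parts = new ArrayList<>();
      for (Node input : inputs) {
        parts.add(input.toString());
      }
      return op + "(" + String.join(", ", parts) + ")";
    }
  }

  // Assumption for this sketch: "Union" and "Flatten" stand for the operators
  // and functions that need real data to produce their result schema.
  static boolean needsData(Node node) {
    if (node.op.equals("Union") || node.op.equals("Flatten")) {
      return true;
    }
    for (Node input : node.inputs) {
      if (needsData(input)) {
        return true;
      }
    }
    return false;
  }

  // Wraps every leaf Scan in a Limit0 node and rebuilds the rest of the tree.
  static Node pushLimit0(Node node) {
    if (node.op.equals("Scan")) {
      return new Node("Limit0", node);
    }
    Node[] rewritten = new Node[node.inputs.size()];
    for (int i = 0; i < rewritten.length; i++) {
      rewritten[i] = pushLimit0(node.inputs.get(i));
    }
    return new Node(node.op, rewritten);
  }

  // Leaves the plan untouched when any operator in it needs data.
  static Node rewrite(Node root) {
    return needsData(root) ? root : pushLimit0(root);
  }

  public static void main(String[] args) {
    Node join = new Node("Limit0",
        new Node("Limit2", new Node("HashJoin", new Node("Scan"), new Node("Scan"))));
    System.out.println(rewrite(join));

    Node union = new Node("Limit0", new Node("Union", new Node("Scan"), new Node("Scan")));
    System.out.println(rewrite(union));
  }
}
```

In the join case each scan ends up directly under a zero-row limit, which mirrors why the HashJoin row count drops from 60175.0 to 1.0 in the plan after the changes. In the union case the tree is returned unchanged, matching the restriction described for UNION and complex functions.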



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
