[ https://issues.apache.org/jira/browse/DRILL-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908214#comment-14908214 ]

ASF GitHub Bot commented on DRILL-1457:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/169#discussion_r40444588
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillLimitRelBase.java ---
    @@ -36,11 +36,17 @@
     public abstract class DrillLimitRelBase extends SingleRel implements DrillRelNode {
       protected RexNode offset;
       protected RexNode fetch;
    +  private boolean pushDown;  // whether limit has been push past its child.
    --- End diff --
    
    LIMIT is a somewhat unique operator in that, even when it is pushed past its
child, the original LIMIT still remains. Can you add that to the comments? The
pushDown flag will be TRUE for the original LIMIT and FALSE for the pushed-down
copy.
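A minimal Java sketch of the semantics described in this comment. It assumes a hypothetical, simplified plan node (plain long offset/fetch instead of RexNode, string operator kinds) and is not Drill's or Calcite's real API. The original LIMIT stays above the exchange with pushDown set to true, and the copy placed below the exchange has pushDown false. The sketch also assumes the pushed copy must return offset + fetch rows with offset 0, because any single sender may supply all of the rows the top LIMIT skips.

```java
/** Hypothetical, simplified plan; not Drill's or Calcite's real classes. */
public class LimitPushDownSketch {

    static final class Node {
        final String kind;      // "Limit", "UnionExchange", "Scan", ...
        final long offset;      // only meaningful for Limit
        final long fetch;       // only meaningful for Limit
        final boolean pushDown; // Limit only: true once this LIMIT has been pushed past its child
        final Node input;

        Node(String kind, long offset, long fetch, boolean pushDown, Node input) {
            this.kind = kind;
            this.offset = offset;
            this.fetch = fetch;
            this.pushDown = pushDown;
            this.input = input;
        }

        static Node leaf(String kind) { return new Node(kind, 0, 0, false, null); }

        static Node unary(String kind, Node input) { return new Node(kind, 0, 0, false, input); }

        static Node limit(long offset, long fetch, boolean pushDown, Node input) {
            return new Node("Limit", offset, fetch, pushDown, input);
        }

        @Override
        public String toString() {
            String self = kind.equals("Limit")
                ? "Limit(offset=" + offset + ", fetch=" + fetch + ", pushDown=" + pushDown + ")"
                : kind;
            return input == null ? self : self + " -> " + input;
        }
    }

    /**
     * Pushes a LIMIT past a UnionExchange. The original LIMIT stays on top (the
     * final cut is still needed after the union) and is flagged pushDown=true so
     * the rule does not fire on it again. The copy below the exchange has
     * pushDown=false, offset 0 and fetch = offset + fetch.
     */
    static Node pushLimitPastExchange(Node limit) {
        if (!limit.kind.equals("Limit") || limit.pushDown
                || limit.input == null || !limit.input.kind.equals("UnionExchange")) {
            return limit; // rule does not apply
        }
        Node exchange = limit.input;
        Node pushed = Node.limit(0, limit.offset + limit.fetch, false, exchange.input);
        Node newExchange = Node.unary("UnionExchange", pushed);
        return Node.limit(limit.offset, limit.fetch, true, newExchange);
    }

    public static void main(String[] args) {
        Node plan = Node.limit(0, 1, false, Node.unary("UnionExchange", Node.leaf("Scan")));
        Node once = pushLimitPastExchange(plan);
        Node twice = pushLimitPastExchange(once); // no-op: the original LIMIT is already flagged
        System.out.println(once);
        System.out.println(once == twice);
    }
}
```

In this sketch the flag is what stops the rule from firing repeatedly on the original LIMIT; the second call in main returns the same node unchanged.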


> Limit operator optimization : push limit operator past exchange operator; 
> disable parallel plan if no order is required.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-1457
>                 URL: https://issues.apache.org/jira/browse/DRILL-1457
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>            Priority: Critical
>             Fix For: 1.2.0
>
>         Attachments: 0001-DRILL-1457-Push-Limit-past-through-UnionExchange.patch
>
>
> When a query has a LIMIT clause, we want to push the LIMIT operator down as
> far as possible, so that upstream operators stop executing once the desired
> number of rows has been fetched.
> Within a single execution fragment, Drill uses a pull model. In many cases,
> not pushing the LIMIT operator down has no performance impact there, because
> LIMIT simply tells its upstream operators to stop. Across fragments, however,
> Drill uses a push model. If LIMIT is not pushed past the exchange operator, the
> upstream fragment continues executing until it receives a notice from the
> downstream fragment, even after the LIMIT operator has already received the
> required number of rows.
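The difference between the two models can be illustrated with a toy, deterministic Java model. It is not Drill code: it assumes the scan emits fixed-size batches and that, in the push model, the sender pushes notifyDelayBatches (a hypothetical parameter) further batches before the downstream fragment's stop notice reaches it.

```java
/** Toy model of pull vs. push execution; all names and numbers are hypothetical. */
public class PullVsPushSketch {

    /** Pull model: LIMIT requests one batch at a time and simply stops asking. */
    static long rowsScannedPull(long limit, long batchSize) {
        long scanned = 0;
        while (scanned < limit) {
            scanned += batchSize; // upstream produces a batch only when asked
        }
        return scanned;
    }

    /**
     * Push model: the upstream fragment sends batches without being asked and
     * stops only once the downstream "stop" notice reaches it, which here takes
     * notifyDelayBatches further batches.
     */
    static long rowsScannedPush(long limit, long batchSize, int notifyDelayBatches) {
        long scanned = 0;
        long received = 0;
        int batchesAfterLimit = -1; // -1: downstream LIMIT not yet satisfied
        while (batchesAfterLimit < notifyDelayBatches) {
            scanned += batchSize; // sender pushes another batch
            if (batchesAfterLimit >= 0) {
                batchesAfterLimit++; // notice still in flight: this batch is wasted
            } else {
                received += batchSize;
                if (received >= limit) {
                    batchesAfterLimit = 0; // LIMIT satisfied; stop notice sent upstream
                }
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        long batchSize = 5461; // hypothetical batch size, for illustration only
        System.out.println("pull: " + rowsScannedPull(1, batchSize));
        System.out.println("push: " + rowsScannedPush(1, batchSize, 7));
    }
}
```

With a notice delay of zero both models scan a single batch; every batch of delay adds one wasted batch in the push model.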
> For instance:
> explain plan for select * from dfs.`/Users/jni/work/tpch-data/tpch-sf10/lineitem` limit 1;
> 00-00    Screen
> 00-01      SelectionVectorRemover
> 00-02        Limit(fetch=[1])
> 00-03          UnionExchange
> 01-01            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/tpch-data/tpch-sf10/lineitem]], selectionRoot=/Users/jni/work/tpch-data/tpch-sf10/lineitem, columns=[SchemaPath [`*`]]]])
> The query profile shows that the Scan operator fetches far more records than desired:
> Minor Fragment   Start   End     Total Time   Max Records   Max Batches
> 01-00-xx         0.507   1.059   0.552        43688         8
> 01-01-xx         0.570   1.054   0.484        27305         5
> 01-02-xx         0.617   1.038   0.421        16383         3
> 01-03-xx         0.668   1.056   0.388        10922         2
> 01-04-xx         0.740   1.055   0.315        10922         2
> 01-05-xx         0.813   1.057   0.244        5461          1
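A quick check of the profile numbers, assuming the Max Records column approximates the rows each minor fragment's Scan produced. Under that assumption, every fragment emitted whole batches of 5461 rows, and together the fragments read 114681 rows to return a single row.

```java
/** Arithmetic over the Max Records / Max Batches columns copied from the profile. */
public class ProfileArithmetic {
    // One entry per minor fragment, 01-00 through 01-05.
    static final long[] MAX_RECORDS = {43688, 27305, 16383, 10922, 10922, 5461};
    static final long[] MAX_BATCHES = {8, 5, 3, 2, 2, 1};

    static long totalRecords() {
        long sum = 0;
        for (long r : MAX_RECORDS) {
            sum += r;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int i = 0; i < MAX_RECORDS.length; i++) {
            System.out.printf("01-%02d: %d records / %d batches = %d records per batch%n",
                i, MAX_RECORDS[i], MAX_BATCHES[i], MAX_RECORDS[i] / MAX_BATCHES[i]);
        }
        System.out.println("total: " + totalRecords() + " records for LIMIT 1");
    }
}
```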
> In the plan above, there are two options for improving performance:
> 1) Push the LIMIT operator past the EXCHANGE operator, ideally into the SCAN
> operator.
> 2) Disable the parallel plan by removing the EXCHANGE operator.
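Option 2 can be sketched as a simple plan rewrite. This is a hypothetical illustration over operator names listed top-down, not Drill's planner API: when the plan has a LIMIT and no Sort (no ordering requirement, as the issue title puts it), the UnionExchange is dropped so the Scan runs in the LIMIT's fragment, where the pull model can stop it early.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical plan rewrite for option 2; operators are plain strings, top-down. */
public class DisableParallelismSketch {

    static List<String> maybeSerialize(List<String> plan) {
        boolean hasLimit = plan.contains("Limit");
        boolean hasSort = plan.contains("Sort");
        if (!hasLimit || hasSort) {
            return plan; // no LIMIT, or order is required: keep the parallel plan
        }
        List<String> serial = new ArrayList<>(plan);
        serial.remove("UnionExchange"); // remove(Object): drops the first UnionExchange
        return serial;
    }

    public static void main(String[] args) {
        List<String> plan = Arrays.asList(
            "Screen", "SelectionVectorRemover", "Limit", "UnionExchange", "Scan");
        System.out.println(maybeSerialize(plan));
    }
}
```

A real planner would also weigh the cost of giving up parallelism (for example, a large LIMIT or a selective filter below it); this sketch ignores that.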
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
