[ 
https://issues.apache.org/jira/browse/DRILL-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943540#comment-14943540
 ] 

Victoria Markman commented on DRILL-1457:
-----------------------------------------

I'm removing "disable parallel plan if no order is required" from the title: 
this is not what was implemented and is a bit misleading.

> Limit operator optimization : push limit operator past exchange operator; 
> disable parallel plan if no order is required.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-1457
>                 URL: https://issues.apache.org/jira/browse/DRILL-1457
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>            Priority: Critical
>              Labels: no_verified_test
>             Fix For: 1.2.0
>
>         Attachments: 
> 0001-DRILL-1457-Push-Limit-past-through-UnionExchange.patch
>
>
> When there is LIMIT clause in a query, we would want to push down the LIMIT 
> operator as much as possible, so that the upstream operator will stop 
> execution once the desired number of rows are fetched.
> Within one execution fragment, Drill applies a pull model. In many cases, 
> there would be no performance impact if LIMIT operator is not pushed down, 
> since LIMIT would inform the upstream operators to stop. However, in multiple 
> fragments, Drill use a push model.  if LIMIT is not pushed past the exchange 
> operator, and the upstream fragment would continue the execution, until it 
> receives a notice from downstream fragment, even if LIMIT operator has 
> already got the required # of rows.
> For instance:
> explain plan for select * from 
> dfs.`/Users/jni/work/tpch-data/tpch-sf10/lineitem` limit 1;
> +------------+------------+
> | 00-00    Screen
> 00-01      SelectionVectorRemover
> 00-02        Limit(fetch=[1])
> 00-03          UnionExchange
> 01-01            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=file:/Users/jni/work/tpch-data/tpch-sf10/lineitem]], 
> selectionRoot=/Users/jni/work/tpch-data/tpch-sf10/lineitem, 
> columns=[SchemaPath [`*`]]]])
> The query profile shows Scan operator fetches much more records than desired:
> Minor Fragment        Start   End     Total Time      Max Records     Max 
> Batches
> 01-00-xx      0.507   1.059   0.552   43688   8
> 01-01-xx      0.570   1.054   0.484   27305   5
> 01-02-xx      0.617   1.038   0.421   16383   3
> 01-03-xx      0.668   1.056   0.388   10922   2
> 01-04-xx      0.740   1.055   0.315   10922   2
> 01-05-xx      0.813   1.057   0.244   5461    1
> In the above plan,  there would be two choices for performance optimization:
> 1) push the LIMIT operator past through EXCHANGE operator, ideally into SCAN 
> operator. 
> 2) Disable the parallel plan by removing EXCHANGE operator.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to