[ https://issues.apache.org/jira/browse/DRILL-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victoria Markman closed DRILL-1457. ----------------------------------- > Limit operator optimization : push limit operator past exchange operator > ------------------------------------------------------------------------ > > Key: DRILL-1457 > URL: https://issues.apache.org/jira/browse/DRILL-1457 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Reporter: Jinfeng Ni > Assignee: Jinfeng Ni > Priority: Critical > Labels: no_verified_test > Fix For: 1.2.0 > > Attachments: > 0001-DRILL-1457-Push-Limit-past-through-UnionExchange.patch > > > When there is LIMIT clause in a query, we would want to push down the LIMIT > operator as much as possible, so that the upstream operator will stop > execution once the desired number of rows are fetched. > Within one execution fragment, Drill applies a pull model. In many cases, > there would be no performance impact if LIMIT operator is not pushed down, > since LIMIT would inform the upstream operators to stop. However, in multiple > fragments, Drill use a push model. if LIMIT is not pushed past the exchange > operator, and the upstream fragment would continue the execution, until it > receives a notice from downstream fragment, even if LIMIT operator has > already got the required # of rows. > For instance: > explain plan for select * from > dfs.`/Users/jni/work/tpch-data/tpch-sf10/lineitem` limit 1; > +------------+------------+ > | 00-00 Screen > 00-01 SelectionVectorRemover > 00-02 Limit(fetch=[1]) > 00-03 UnionExchange > 01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=file:/Users/jni/work/tpch-data/tpch-sf10/lineitem]], > selectionRoot=/Users/jni/work/tpch-data/tpch-sf10/lineitem, > columns=[SchemaPath [`*`]]]]) > The query profile shows Scan operator fetches much more records than desired: > Minor Fragment Start End Total Time Max Records Max > Batches > 01-00-xx 0.507 1.059 0.552 43688 8 > 01-01-xx 0.570 1.054 0.484 27305 5 > 01-02-xx 0.617 1.038 0.421 16383 3 > 01-03-xx 0.668 1.056 0.388 10922 2 > 01-04-xx 0.740 1.055 0.315 10922 2 > 01-05-xx 0.813 1.057 0.244 5461 1 > In the above plan, there would be two choices for performance optimization: > 1) push the LIMIT operator past through EXCHANGE operator, ideally into SCAN > operator. > 2) Disable the parallel plan by removing EXCHANGE operator. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)