Jinfeng Ni created DRILL-1457:
---------------------------------
Summary: Limit operator optimization : push limit operator past
exchange operator; disable parallel plan if no order is required.
Key: DRILL-1457
URL: https://issues.apache.org/jira/browse/DRILL-1457
Project: Apache Drill
Issue Type: Bug
Reporter: Jinfeng Ni
When there is LIMIT clause in a query, we would want to push down the LIMIT
operator as much as possible, so that the upstream operator will stop execution
once the desired number of rows are fetched.
Within one execution fragment, Drill applies a pull model. In many cases, there
would be no performance impact if LIMIT operator is not pushed down, since
LIMIT would inform the upstream operators to stop. However, in multiple
fragments, Drill use a push model. if LIMIT is not pushed past the exchange
operator, and the upstream fragment would continue the execution, until it
receives a notice from downstream fragment, even if LIMIT operator has already
got the required # of rows.
For instance:
explain plan for select * from
dfs.`/Users/jni/work/tpch-data/tpch-sf10/lineitem` limit 1;
+------------+------------+
| 00-00 Screen
00-01 SelectionVectorRemover
00-02 Limit(fetch=[1])
00-03 UnionExchange
01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=file:/Users/jni/work/tpch-data/tpch-sf10/lineitem]],
selectionRoot=/Users/jni/work/tpch-data/tpch-sf10/lineitem, columns=[SchemaPath
[`*`]]]])
The query profile shows Scan operator fetches much more records than desired:
Minor Fragment Start End Total Time Max Records Max Batches
01-00-xx 0.507 1.059 0.552 43688 8
01-01-xx 0.570 1.054 0.484 27305 5
01-02-xx 0.617 1.038 0.421 16383 3
01-03-xx 0.668 1.056 0.388 10922 2
01-04-xx 0.740 1.055 0.315 10922 2
01-05-xx 0.813 1.057 0.244 5461 1
In the above plan, there would be two choices for performance optimization:
1) push the LIMIT operator past through EXCHANGE operator, ideally into SCAN
operator.
2) Disable the parallel plan by removing EXCHANGE operator.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)