Hello Shant Hovsepian, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16723

to look at the new patch set (#3).

Change subject: IMPALA-10314: Optimize planning time for simple limits
......................................................................

IMPALA-10314: Optimize planning time for simple limits

This patch optimizes the planning time for simple limit
queries by only considering a minimal set of partitions
whose file descriptors add up to N (the specified limit).
Each file is conservatively estimated to contain 1 row.

This reduces the number of partitions processed by
HdfsScanNode.computeScanRangeLocations() which, according
to query profiling, has been the main contributor to
planning time, especially for a large number of partitions.
Further, within each partition, we consider only as many
files as are needed to bring the total to N.
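
The selection described above can be sketched in Java as
follows. This is a minimal, hypothetical illustration, not
the code in this patch: Partition and selectForLimit() are
made-up names standing in for Impala's partition and file
descriptor classes, and skipping partitions that have no
files is an assumption that follows from keeping the set
minimal. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Partition is an illustrative stand-in, not Impala's
// HdfsPartition / FileDescriptor classes, and this is not the patch's code.
public class SimpleLimitSketch {
  /** A partition and the names of its data files (hypothetical type). */
  record Partition(String name, List<String> files) {}

  /**
   * Walks partitions in order, keeping them until the selected file count
   * reaches limit. Each file is assumed to hold at least one row, so N files
   * are enough for LIMIT N. Only the needed files are kept from the last
   * partition, and partitions without files are skipped.
   */
  static List<Partition> selectForLimit(List<Partition> partitions, long limit) {
    List<Partition> selected = new ArrayList<>();
    long filesSoFar = 0;
    for (Partition p : partitions) {
      if (filesSoFar >= limit) break;     // enough files already selected
      if (p.files().isEmpty()) continue;  // adds nothing to the file count
      int take = (int) Math.min(p.files().size(), limit - filesSoFar);
      selected.add(new Partition(p.name(), List.copyOf(p.files().subList(0, take))));
      filesSoFar += take;
    }
    return selected;
  }

  public static void main(String[] args) {
    List<Partition> parts = List.of(
        new Partition("p1", List.of("a.parq", "b.parq")),
        new Partition("p2", List.of()),
        new Partition("p3", List.of("c.parq", "d.parq", "e.parq")),
        new Partition("p4", List.of("f.parq")));
    // LIMIT 3: all of p1 (2 files) plus one file of p3; p4 is never kept.
    for (Partition p : selectForLimit(parts, 3)) {
      System.out.println(p.name() + " " + p.files());
    }
    // prints:
    // p1 [a.parq, b.parq]
    // p3 [c.parq]
  }
}
```

Only partitions p1 and p3 would then be handed to scan range
computation, which is where the planning time is saved.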

This is an opt-in optimization, enabled by the new planner
option OPTIMIZE_SIMPLE_LIMIT. When it is enabled, a query
may in certain cases return fewer rows than the limit (due
to filter conditions or the presence of empty files);
however, every row that is returned is a valid row that
would also be a valid result with the optimization disabled.
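
The row shortfall mentioned above can be illustrated with a
small, hypothetical Java sketch (not Impala code): each file
picked under the 1-row-per-file estimate is modeled as the
list of int values it actually contains, and rowsReturned()
is a made-up helper. An empty file, or a filter that rejects
a file's rows, lets the scan finish before the limit is
reached, even if unpicked files hold more qualifying rows.

```java
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration, not Impala code: each file picked by the
// planner is modeled as the list of int values it actually contains.
public class ShortLimitDemo {
  /** Counts rows a LIMIT query returns when scanning only the picked files. */
  static long rowsReturned(List<List<Integer>> pickedFiles, IntPredicate filter, long limit) {
    long rows = 0;
    for (List<Integer> file : pickedFiles) {
      for (int v : file) {
        if (rows >= limit) return rows;  // limit reached, stop scanning
        if (filter.test(v)) rows++;
      }
    }
    return rows;  // picked files exhausted before the limit was reached
  }

  public static void main(String[] args) {
    // For LIMIT 3 the planner keeps 3 files (1 row assumed per file),
    // but the second file is empty.
    List<List<Integer>> picked = List.of(List.of(5), List.of(), List.of(1));
    System.out.println(rowsReturned(picked, v -> true, 3));   // prints 2
    // A filter (v > 4) also rejects the row in the third file.
    System.out.println(rowsReturned(picked, v -> v > 4, 3));  // prints 1
  }
}
```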

Testing:
 - Added planner tests for the optimization
 - Ran query_test.py tests with optimize_simple_limit enabled
 - Added e2e tests. Since result rows are non-deterministic,
   only simple count(*) queries on top of subqueries with a
   LIMIT were added.

Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test
13 files changed, 365 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16723/3
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
