Hello Shant Hovsepian, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16723 to look at the new patch set (#3).

Change subject: IMPALA-10314: Optimize planning time for simple limits
......................................................................

IMPALA-10314: Optimize planning time for simple limits

This patch optimizes the planning time for simple limit queries by only considering a minimal set of partitions whose file descriptors add up to N (the specified limit). Each file is conservatively estimated to contain 1 row. This reduces the number of partitions processed by HdfsScanNode.computeScanRangeLocations() which, according to query profiling, has been the main contributor to the planning time, especially for a large number of partitions. Further, within each partition, we only consider as many files as are needed to bring the total to N.

This is an opt-in optimization. A new planner option, OPTIMIZE_SIMPLE_LIMIT, enables it. If enabled, in certain cases the query may produce fewer rows (due to filter conditions or the presence of empty files), although the rows it does return are valid rows that would have been present if the optimization were disabled.

Testing:
 - Added planner tests for the optimization
 - Ran query_test.py tests with optimize_simple_limit enabled
 - Added e2e tests. Since result rows are non-deterministic, only simple count(*) queries on top of subqueries with limit were added.
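For readers of the commit message, the partition and file selection it describes can be sketched as below. This is a minimal illustration under stated assumptions, not the patch's code: the `SimpleLimitSketch` class, the `Partition` record (a partition name plus a list of file names standing in for file descriptors) and `selectForLimit` are all hypothetical, and partitions are assumed to be visited in the order given. It shows the core idea: with each file counted as 1 row, stop adding partitions once the selected files add up to N, and keep only the needed prefix of files in the last partition.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleLimitSketch {
  // Hypothetical stand-in for a partition and its file descriptors (here just file names).
  public record Partition(String name, List<String> files) {}

  /**
   * Hypothetical helper: picks partitions in the given order, stopping once the
   * selected files add up to the limit. Each file is counted as exactly one row
   * (the conservative estimate), so `limit` files are assumed to be enough for
   * `limit` rows. In the last partition, only as many files as are still needed
   * are kept.
   */
  public static List<Partition> selectForLimit(List<Partition> partitions, long limit) {
    List<Partition> selected = new ArrayList<>();
    long fileCount = 0;
    for (Partition p : partitions) {
      if (fileCount >= limit) break;            // limit already covered
      long remaining = limit - fileCount;
      List<String> files = p.files();
      if (files.size() > remaining) {
        // remaining < files.size() <= Integer.MAX_VALUE, so the cast is safe.
        files = files.subList(0, (int) remaining);
      }
      if (files.isEmpty()) continue;            // partition has no files to scan
      selected.add(new Partition(p.name(), List.copyOf(files)));
      fileCount += files.size();
    }
    return selected;
  }

  public static void main(String[] args) {
    List<Partition> parts = List.of(
        new Partition("p=1", List.of("f1", "f2")),
        new Partition("p=2", List.of("f3", "f4", "f5")),
        new Partition("p=3", List.of("f6")));
    // With limit 3: all of p=1 (2 files) plus the first file of p=2; p=3 is skipped.
    for (Partition p : selectForLimit(parts, 3)) {
      System.out.println(p.name() + " -> " + p.files());
    }
  }
}
```

Running `main` prints `p=1 -> [f1, f2]` and `p=2 -> [f3]`. The sketch also shows why the option is opt-in: if `f3` were empty, or a filter rejected rows in the selected files, the scan would return fewer than 3 rows even though the skipped files could have supplied them.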
Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test
13 files changed, 365 insertions(+), 7 deletions(-)


git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16723/3
--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>