Steve Carlin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/23868 )


Change subject: IMPALA-14102: [part 4] Account for Runtime Filter application 
in cost model
......................................................................

IMPALA-14102: [part 4] Account for Runtime Filter application in cost model

This commit adds a first pass at modeling runtime filters in the cost model.
It may need additional work, but it has been tested with a 3TB TPC-DS database
and improved TPC-DS query performance by roughly 5%.

The main code can be found in the ImpalaLoptOptimizeJoinHooks.getReduction()
method.

The code attempts to estimate the reduction percentage of rows that a runtime
filter will produce. It only calculates the percentage when the join has an
equality condition comparing an input on the probe side with an input on the
build side. On the probe side, it uses Calcite's RelMetadataQuery
getDistinctCount to determine the number of distinct values (NDV) of the given
input at the join input level. It divides this by the NDV found for the
RelColumnOrigin, which contains the table and column information. This
reduction percentage then gets applied on the build side.
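
As a rough illustration of the ratio described above, here is a minimal Java
sketch. It is not the code in this change: the class name, method name, and the
fallback to 1.0 when statistics are missing are assumptions made for the
example, and the sketch reads the "reduction percentage" as the fraction of
rows that survive the filter.

```java
/** Hypothetical sketch of the NDV-ratio estimate; not the actual Impala code. */
final class RuntimeFilterReductionSketch {
  private RuntimeFilterReductionSketch() {}

  /**
   * Estimates the fraction of rows that survive a runtime filter.
   *
   * @param joinInputNdv distinct values of the join key at the join input
   * @param originNdv distinct values of the originating table column
   */
  static double survivingFraction(double joinInputNdv, double originNdv) {
    // Assumption: without usable statistics, the filter is modeled as removing nothing.
    if (!(joinInputNdv > 0) || !(originNdv > 0)) {
      return 1.0;
    }
    // A join input cannot expose more distinct values than its source column,
    // so the ratio is clamped to at most 1.0.
    return Math.min(1.0, joinInputNdv / originNdv);
  }

  public static void main(String[] args) {
    // Example: 100 distinct keys reach the join out of 1000 in the base column.
    System.out.println(survivingFraction(100, 1000));
  }
}
```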

This initial algorithm is kept fairly simple and may be improved upon in
future commits.

Essentially, this code assumes that the joins are fairly simple, such as a
foreign key joined to a primary key, and that there is no skew. It also assumes
the runtime filter will be applied at physical creation time.

A couple of additional complexities in the algorithm:
- If the join condition contains an "and" clause with multiple equality
  operators, the fields are assumed to be independent, as if it were a
  composite key, and the reduction percentages are multiplied.
- If reduction percentages come from 2 different joins, it does not
  assume independence and instead takes the lower of the 2 percentages.
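
The two combination rules above can be sketched as follows. This is only an
illustration under stated assumptions: the class and method names are
hypothetical, and each input is taken to be a per-condition fraction between
0 and 1.

```java
/** Hypothetical sketch of how reduction fractions are combined; not the actual Impala code. */
final class ReductionCombinerSketch {
  private ReductionCombinerSketch() {}

  /**
   * Combines the fractions from several equality conditions of ONE join.
   * The fields are treated as independent, so the fractions are multiplied.
   */
  static double combineWithinJoin(double... perConditionFractions) {
    double product = 1.0;
    for (double fraction : perConditionFractions) {
      product *= fraction;
    }
    return product;
  }

  /**
   * Combines the fractions coming from TWO different joins.
   * Independence is not assumed, so only the lower fraction is kept.
   */
  static double combineAcrossJoins(double first, double second) {
    return Math.min(first, second);
  }

  public static void main(String[] args) {
    // One join on a two-column key, then a second join filtering the same scan.
    double compositeKey = combineWithinJoin(0.5, 0.2);
    System.out.println(combineAcrossJoins(compositeKey, 0.3));
  }
}
```

Taking the minimum across joins is the more conservative choice: multiplying
fractions from different joins could badly underestimate the row count when
the filters remove overlapping sets of rows.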

The percentage reduction calculation can be turned off by either disabling
runtime filters or by setting the query option use_calcite_runtime_filter_stats
to false.

Unfortunately, the RelMetadataColumnOrigins code had to be copied from the
Calcite repository rather than extended because one method needed changing,
but the default constructor for the class was declared as package protected.

In the current iteration, the cost model is not calculated quite properly.
The reduction percentage is applied at the table scan level, which is not
right because the number of rows read does not change. Instead, the reduction
should happen one level above the table scan. However, the produced results
seemed to be ok, so this is a TODO for a later iteration.
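
To make the difference concrete, here is a small Java sketch contrasting the
two placements. Everything in it is hypothetical (the class, its fields, and
the method names are invented for the example), and it models only row
counts, not the actual cost formula used by the planner.

```java
/** Hypothetical row-count model of a table scan; not the actual Impala code. */
final class ScanEstimateSketch {
  /** Rows the scan is modeled as reading from storage. */
  final double rowsRead;
  /** Rows the scan is modeled as passing up to its parent. */
  final double rowsEmitted;

  ScanEstimateSketch(double rowsRead, double rowsEmitted) {
    this.rowsRead = rowsRead;
    this.rowsEmitted = rowsEmitted;
  }

  /** Current behavior as described: the fraction shrinks the scan itself. */
  static ScanEstimateSketch appliedAtScan(double tableRows, double fraction) {
    double reduced = tableRows * fraction;
    return new ScanEstimateSketch(reduced, reduced);
  }

  /** Intended behavior: the scan still reads every row; only its output shrinks. */
  static ScanEstimateSketch appliedAboveScan(double tableRows, double fraction) {
    return new ScanEstimateSketch(tableRows, tableRows * fraction);
  }

  public static void main(String[] args) {
    ScanEstimateSketch current = appliedAtScan(1_000_000, 0.25);
    ScanEstimateSketch intended = appliedAboveScan(1_000_000, 0.25);
    // Both variants emit the same rows, but only the intended one keeps the read cost.
    System.out.println(current.rowsRead + " vs " + intended.rowsRead);
  }
}
```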

Testing: The TPC-DS planner test files have been updated. Additional testing
is TBD.

Change-Id: I32abe5b02cb9d63bf226a4f4dfe3b458cf6b947d
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M java/calcite-planner/src/main/java/org/apache/impala/calcite/rules/ImpalaLoptOptimizeJoinHooks.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/rules/ImpalaMQContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaRelColumnOrigin.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaRelMdColumnOrigins.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaRelMdNonCumulativeCost.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaRelMetadataProvider.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteSingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q24a.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q26.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q45.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q46.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q53.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q59.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q63.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q68.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q69.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q72.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q76.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q78.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q83.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q95.test
52 files changed, 24,182 insertions(+), 23,394 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/23868/3
--
To view, visit http://gerrit.cloudera.org:8080/23868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I32abe5b02cb9d63bf226a4f4dfe3b458cf6b947d
Gerrit-Change-Number: 23868
Gerrit-PatchSet: 3
Gerrit-Owner: Steve Carlin <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
