Steve Carlin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/23868 )

Change subject: IMPALA-14102: [part 4] Account for Runtime Filter application in cost model
......................................................................

IMPALA-14102: [part 4] Account for Runtime Filter application in cost model

This commit adds some basic code that attempts to model runtime filters
in the cost model. This is a first pass and may need additional work, but
it has been tested with a 3TB TPC-DS database and improved TPC-DS query
performance by roughly 5%.

The main code can be found in the
ImpalaLoptOptimizeJoinHooks.getReduction() method. The code attempts to
figure out the reduction percentage of rows that a runtime filter will
produce. It only attempts to calculate the percentage if there is an
equality join condition that compares an input on the probe side with an
input on the build side.

On the probe side, it uses Calcite's RelMetadataQuery getDistinctCount to
determine the number of distinct values at the join input level for the
given input. It divides this by the NDV found for the RelColumnOrigin,
which contains the table and column information. This reduction
percentage then gets applied on the build side.

This initial algorithm is kept fairly simple and may be improved upon in
future commits. Essentially, the code assumes that the joins are fairly
simple, like connecting a foreign key to a primary key, and assumes no
skew. It also assumes the runtime filter will be applied at physical
creation time.

Two additional complexities in the algorithm:
- If the join condition contains an "and" clause with multiple equality
  operators, the fields are assumed to be independent, as if they formed
  a composite key, and the reduction percentages are multiplied.
- If reduction percentages come from 2 different joins, it does not
  assume independence and instead takes the lower of the 2 percentages.
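
As a rough illustration of the arithmetic described above, the following Java sketch computes a per-column ratio and combines ratios using the two rules from the list. It is not the code in ImpalaLoptOptimizeJoinHooks.getReduction(); the class and method names are hypothetical, the statistics are passed in as plain numbers rather than read from RelMetadataQuery, and it assumes the resulting value is treated as a fraction between 0 and 1, with missing statistics meaning "no reduction".

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical, not Impala code. */
public final class RuntimeFilterReductionSketch {

  private RuntimeFilterReductionSketch() {}

  /**
   * Ratio for one equality condition: distinct values seen at the join input
   * divided by the NDV of the originating table column, capped at 1.0.
   * Assumption: unusable stats (non-positive, NaN, infinite NDV) yield 1.0.
   */
  public static double columnRatio(double distinctAtJoinInput, double originNdv) {
    if (!(distinctAtJoinInput > 0) || !(originNdv > 0)
        || Double.isInfinite(originNdv)) {
      return 1.0;
    }
    return Math.min(1.0, distinctAtJoinInput / originNdv);
  }

  /** Equality conditions AND-ed within one join: assumed independent, so multiply. */
  public static double combineWithinJoin(List<Double> ratios) {
    double result = 1.0;
    for (double r : ratios) {
      result *= r;
    }
    return result;
  }

  /** Ratios coming from different joins: no independence assumed, take the lowest. */
  public static double combineAcrossJoins(List<Double> perJoinRatios) {
    double result = 1.0;
    for (double r : perJoinRatios) {
      result = Math.min(result, r);
    }
    return result;
  }

  public static void main(String[] args) {
    // Join 1: two AND-ed equality conditions (made-up statistics).
    double join1 = combineWithinJoin(Arrays.asList(
        columnRatio(50, 100), columnRatio(20, 100)));
    // Join 2: a single equality condition.
    double join2 = columnRatio(25, 100);
    // Both joins filter the same scan: keep the lower ratio.
    System.out.println(combineAcrossJoins(Arrays.asList(join1, join2)));
  }
}
```

Multiplying within a join and taking the minimum across joins mirrors the two rules in the commit message; a real implementation would also need to decide where in the plan the ratio is applied.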

The percentage reduction calculation can be turned off either by
disabling runtime filters or by setting the query option
use_calcite_runtime_filter_stats to false.

Unfortunately, the RelMdColumnOrigins code had to be copied from the
Calcite repository rather than extended, because one method needed
changing but the default constructor for the class is declared
package-private.

In this current iteration, the cost model is not actually calculated
properly. The reduction percentage is applied at the table scan level,
which is not right: the number of rows read does not change. Instead, the
reduction should happen one level above the table scan. However, the
produced results seemed to be ok, so this is a TODO for a later
iteration.

Testing: TPC-DS planner test queries have been changed. Additional
testing is TBD.

Change-Id: I32abe5b02cb9d63bf226a4f4dfe3b458cf6b947d
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M java/calcite-planner/src/main/java/org/apache/impala/calcite/rules/ImpalaLoptOptimizeJoinHooks.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/rules/ImpalaMQContext.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaRelColumnOrigin.java
A java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaRelMdColumnOrigins.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaRelMdNonCumulativeCost.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaRelMetadataProvider.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteOptimizer.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteRelNodeConverter.java
M java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteSingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q24a.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q26.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q45.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q46.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q53.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q59.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q63.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q68.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q69.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q72.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q76.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q78.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q83.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/calcite_tpcds/tpcds-q95.test
52 files changed, 24,182 insertions(+), 23,394 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/23868/3
--
To view, visit http://gerrit.cloudera.org:8080/23868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I32abe5b02cb9d63bf226a4f4dfe3b458cf6b947d
Gerrit-Change-Number: 23868
Gerrit-PatchSet: 3
Gerrit-Owner: Steve Carlin <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
