Hello Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit

  http://gerrit.cloudera.org:8080/17387

to look at the new patch set (#7).

Change subject: IMPALA-10681: Improve inner join cardinality estimates
......................................................................

IMPALA-10681: Improve inner join cardinality estimates

During cardinality estimation for inner joins, if the join conjunct
involves a scan slot on the left side and a function (e.g. MAX) on the
right, we currently determine that the NDV stats of either side are not
useful and return the left side's cardinality, even though it may be
a significant over-estimate.

In this patch, we handle join conjuncts of such types by keeping them
in an 'other' eligible conjuncts list as long as the NDV for the
expressions on both sides of the join and the input row count are
available. For example, in the following cases the NDV is available
but was not being used for inner joins, since the previous logic only
looked for scan slots:
 (a) int_col = MAX(int_col) and the right input does not have a
     group-by, so right NDV = 1 can be used.
 (b) if it has a group-by and the group-by columns already have
     associated NDV, the combined NDV is also available.
Other such examples exist. An auxiliary struct is introduced to keep
track of the NDV and row count. Once these 'other' eligible conjuncts
are populated, we do the join cardinality estimation in a manner
similar to the normal join conjuncts, by fetching the stats from the
auxiliary struct.

Testing:
 - Added new planner tests for inner join cardinality
 - Modified expected plans for certain tests, including TPC-DS
   queries, and ran end-to-end TPC-DS queries
 - Since TPC-DS plans are complex, I checked the cardinality changes
   for some of the hash joins, but not the changes in the shape of a
   plan (e.g. whether the join order changed).
 - Preliminary tests with a TPC-DS 10 GB scale factor on a single node
   showed performance improvements of 5-15% for 4 of the 6 queries
   whose plans changed.

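
For reviewers who want a concrete picture of the estimation described above, here is a minimal sketch. It is not the code in JoinNode.java: the class name, the ConjunctStats holder, and the input numbers are all hypothetical, and the formula is the textbook equi-join estimate |L| * |R| / max(NDV(L), NDV(R)), assumed here for illustration. When no conjunct has usable stats, the sketch falls back to the left input's row count, which is the behaviour the commit message describes for the old logic.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinCardinalitySketch {
  /** Hypothetical stand-in for the auxiliary struct: NDV and row count per side. */
  static final class ConjunctStats {
    final double lhsNdv;
    final long lhsRows;
    final double rhsNdv;
    final long rhsRows;

    ConjunctStats(double lhsNdv, long lhsRows, double rhsNdv, long rhsRows) {
      this.lhsNdv = lhsNdv;
      this.lhsRows = lhsRows;
      this.rhsNdv = rhsNdv;
      this.rhsRows = rhsRows;
    }

    /** A conjunct is eligible only if NDV and row count are known on both sides. */
    boolean isUsable() {
      return lhsNdv > 0 && rhsNdv > 0 && lhsRows >= 0 && rhsRows >= 0;
    }
  }

  /**
   * Returns the smallest per-conjunct estimate, or lhsRows if no conjunct
   * has usable stats (assumed fallback, mirroring the old behaviour).
   */
  static long estimate(long lhsRows, long rhsRows, List<ConjunctStats> conjuncts) {
    long best = -1;
    for (ConjunctStats s : conjuncts) {
      if (!s.isUsable()) continue;
      // Textbook equi-join estimate: |L| * |R| / max(NDV(L), NDV(R)).
      double card = ((double) lhsRows / Math.max(s.lhsNdv, s.rhsNdv)) * rhsRows;
      long rounded = Math.round(card);
      if (best == -1 || rounded < best) best = rounded;
    }
    return best == -1 ? lhsRows : best;
  }

  public static void main(String[] args) {
    // Case (a): int_col = MAX(int_col) with no group-by on the right,
    // so the right input has one row and NDV = 1. Numbers are made up.
    List<ConjunctStats> conjuncts = new ArrayList<>();
    conjuncts.add(new ConjunctStats(10, 7300, 1, 1));
    System.out.println(estimate(7300, 1, conjuncts));
    // With no eligible conjuncts the estimate is the left row count.
    System.out.println(estimate(7300, 1, new ArrayList<>()));
  }
}
```

With these made-up inputs, using the right side's NDV of 1 brings the estimate down from the left row count (7300) to 730, which is the kind of over-estimate the patch is meant to avoid.
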
Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/views.test
15 files changed, 3,719 insertions(+), 3,349 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17387/7
--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>