Hello Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17387

to look at the new patch set (#4).

Change subject: IMPALA-10681: Improve inner join cardinality estimates
......................................................................

IMPALA-10681: Improve inner join cardinality estimates

During cardinality estimation for inner joins, if the join
conjunct involves a scan slot on the left side and a function
(e.g. MAX) on the right, we currently determine that the NDV
stats of both sides are not useful and return the left side's
cardinality, even though it may be a significant over-estimate.

In this patch, we handle such join conjuncts by keeping them
in a list of 'other' eligible conjuncts, as long as the NDV of
the expressions on both sides of the join and the input row
count are available. For example, in the following cases the
NDV is available but was not being used for inner joins
because the previous logic only looked for scan slots:
(a) int_col = MAX(int_col) where the right input does not
have a group-by, so a right NDV of 1 can be used; (b) the
right input has a group-by and the group-by columns already
have associated NDVs, so the combined NDV is also available.
Other such examples exist. An auxiliary struct is introduced
to keep track of the NDV and row count.

Once these 'other' eligible conjuncts are populated, we do the
join cardinality estimation in a manner similar to the normal
join conjuncts by fetching the stats from the auxiliary struct.
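
As a rough illustration of the idea above (not the code in this
patch), the sketch below pairs a small stats holder with the
textbook equi-join estimate |L| * |R| / max(NDV(L), NDV(R)). The
class, field, and method names are hypothetical, and the formula is
a simplification that may differ from what JoinNode.java actually
computes.

```java
// Hypothetical sketch; names are illustrative and NOT taken from JoinNode.java.
final class JoinCardSketch {

  /** Stats kept for one 'other' eligible conjunct: NDV of each side and input row counts. */
  static final class ConjunctStats {
    final double lhsNdv;
    final double rhsNdv;
    final long lhsRows;
    final long rhsRows;

    ConjunctStats(double lhsNdv, double rhsNdv, long lhsRows, long rhsRows) {
      this.lhsNdv = lhsNdv;
      this.rhsNdv = rhsNdv;
      this.lhsRows = lhsRows;
      this.rhsRows = rhsRows;
    }

    /** The conjunct is usable only if both NDVs and both row counts are known. */
    boolean isUsable() {
      return lhsNdv > 0 && rhsNdv > 0 && lhsRows >= 0 && rhsRows >= 0;
    }
  }

  /**
   * Textbook equi-join estimate (an assumption for illustration):
   * |L| * |R| / max(NDV(L), NDV(R)). Returns -1 when the stats are unusable.
   */
  static long estimate(ConjunctStats s) {
    if (!s.isUsable()) return -1;
    double card = (double) s.lhsRows * s.rhsRows / Math.max(s.lhsNdv, s.rhsNdv);
    return Math.round(card);
  }
}
```

For case (a), with made-up numbers: a left input of 7300 rows whose
int_col has NDV 10, joined to a MAX(int_col) without group-by (1 row,
NDV 1), yields an estimate of 730 under this formula, instead of the
left side's 7300.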

Testing:
 - Added new planner tests for inner join cardinality
 - Modified expected plans for certain tests including
   TPC-DS queries and ran end-to-end TPC-DS queries
 - Since TPC-DS plans are complex, I checked the cardinality
   changes for some of the hash joins but not the changes in the
   shape of a plan (e.g. whether the join order changed).
 - Preliminary tests with a TPC-DS 10 GB scale factor on a single
   node showed performance improvements of 8-15% for 5 of the
   6 queries whose plans changed.

Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/views.test
15 files changed, 3,691 insertions(+), 3,331 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17387/4
--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
