Hello Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17387

to look at the new patch set (#3).

Change subject: IMPALA-10681: Improve join cardinality estimates
......................................................................

IMPALA-10681: Improve join cardinality estimates

During cardinality estimation for inner joins, if the join
conjunct involves a scan slot on the left side and a function
(e.g. MAX) on the right, we currently determine that the NDV
stats of either side are not useful and return the left side's
cardinality, even though it may be a significant over-estimate.

In this patch, we handle join conjuncts of this type by
keeping them in an 'other' eligible conjuncts list as long as
the NDV for the expressions on both sides of the join can be
reasonably estimated and the input cardinality is also available.
For example, if the conjunct is int_col = MAX(int_col) and the
right input does not have a group-by, the right NDV = 1 and
can be safely used. If it has a group-by and the group-by
columns already have associated NDVs, we can still derive the
combined NDV. Other such examples exist. An auxiliary struct is
introduced to keep track of the NDV and row count.
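
To illustrate the NDV reasoning above, here is a minimal Java sketch. It
is not the code from this patch: the names NdvSketch, NdvAndRowCount and
aggOutputNdv are hypothetical, and capping the product of the group-by
NDVs at the input cardinality is an assumption about how a combined NDV
could be derived.

```java
class NdvSketch {
  /** Hypothetical holder pairing an NDV with the row count of its input. */
  static final class NdvAndRowCount {
    final long ndv;
    final long rowCount;

    NdvAndRowCount(long ndv, long rowCount) {
      this.ndv = ndv;
      this.rowCount = rowCount;
    }
  }

  /**
   * Estimates the NDV of an aggregate output such as MAX(int_col).
   * Without a group-by there is a single output row, so the NDV is 1.
   * With a group-by, the NDV is bounded by the product of the group-by
   * column NDVs, capped at the input cardinality (assumption).
   * Returns -1 if any required statistic is unknown (negative).
   */
  static long aggOutputNdv(long[] groupByNdvs, long inputCard) {
    if (groupByNdvs.length == 0) return 1;
    if (inputCard < 0) return -1;
    long ndv = 1;
    for (long g : groupByNdvs) {
      if (g < 0) return -1;
      if (g == 0) return 0;
      // Saturate at inputCard instead of overflowing the product.
      if (ndv > inputCard / g) {
        ndv = inputCard;
      } else {
        ndv *= g;
      }
    }
    return Math.min(ndv, inputCard);
  }
}
```

With this sketch, MAX(int_col) over an input without a group-by yields an
NDV of 1, while a group-by on two columns with NDVs 10 and 20 over 1000
input rows yields 200.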

Once these 'other' eligible conjuncts are populated, we do the
join cardinality estimation in a manner similar to the normal
join conjuncts by fetching the stats from the auxiliary struct.
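
As a rough illustration of how an NDV and row count pair per side can
feed a join estimate, the sketch below applies the common textbook
formula |L| * |R| / max(NDV(L), NDV(R)). This is an assumption for
illustration only; the class and method names are hypothetical, and the
exact computation in JoinNode.java may differ.

```java
class JoinCardSketch {
  /**
   * Estimates inner join cardinality for one equality conjunct from the
   * NDV and row count of each side. Returns -1 when a statistic is
   * missing (NDV <= 0 or a negative row count).
   */
  static long estimate(long lhsNdv, long lhsCard, long rhsNdv, long rhsCard) {
    if (lhsNdv <= 0 || rhsNdv <= 0 || lhsCard < 0 || rhsCard < 0) return -1;
    double maxNdv = Math.max(lhsNdv, rhsNdv);
    // Use double arithmetic to avoid overflow on large inputs.
    double card = (double) lhsCard * (double) rhsCard / maxNdv;
    return Math.round(card);
  }
}
```

For a left scan of 1,000,000 rows whose join column has NDV 1000, joined
to a single-row MAX() input with NDV 1, this sketch gives 1000 rows
rather than the left side's full 1,000,000.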

Testing:
 - Added new planner tests for inner join cardinality
 - Modified expected plans for certain tests including
   TPC-DS queries and ran end-to-end TPC-DS queries
 - Since TPC-DS plans are complex, I checked the cardinality
   changes for some of the hash joins but not the changes in the
   shape of a plan (e.g. whether the join order changed).

   TODO: We would want to run a performance test to validate
   the plan changes for TPC-DS at a sufficiently high scale factor.

Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/views.test
15 files changed, 3,681 insertions(+), 3,331 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17387/3
--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
