Alex Behm has uploaded a new patch set (#5).

Change subject: IMPALA-5547: Rework FK/PK join detection.
......................................................................

IMPALA-5547: Rework FK/PK join detection.

Reworks the FK/PK join detection logic to:
- more accurately recognize many-to-many joins
- avoid dim/dim joins for multi-column PKs

The new detection logic maintains our existing philosophy of generally
assuming a FK/PK join, unless there is strong evidence to the
contrary, as follows. For each set of simple equi-join conjuncts
between two tables, we compute the joint NDV of the right-hand
side columns by multiplication, and if the joint NDV is significantly
smaller than the right-hand side row count, then we are fairly
confident that the right-hand side is not a PK. Otherwise, we
assume the set of conjuncts could represent a FK/PK relationship.

Extends the explain plan to include the outcome of the FK/PK detection
at EXPLAIN_LEVEL > STANDARD.

Performance testing:
1. Full TPC-DS run on 10TB:
   - Q10 improved by >100x
   - Q72 improved by >25x
   - Q17, Q26, Q29 improved by 2x
   - Q64 regressed by 10x
   - Total runtime: Improved by 2x
   - Geomean: Minor improvement
   The regression of Q64 is understood and we will try to address it
   in follow-on changes. The previous plan was better by accident and
   not because of superior logic.
2. Nightly TPC-H and TPC-DS runs:
   - No perf differences

Testing:
- The existing planner tests cover the changes.
- Code/hdfs run passed.

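For readers following the review, the joint-NDV check described in the commit message can be sketched roughly as below. This is an illustration only, not the code in JoinNode.java: the class FkPkSketch, both method names, and the assumedTolerance parameter are hypothetical, and the commit message does not say how "significantly smaller" is quantified. Treating missing stats as "no evidence" is likewise an assumption that follows from the stated philosophy of defaulting to FK/PK.

```java
// Hypothetical sketch; class, method, and parameter names are illustrative
// and are not taken from the Impala sources.
final class FkPkSketch {
    private FkPkSketch() {}

    /** Joint NDV of the right-hand side join columns, computed by multiplication. */
    static double jointNdv(long... rhsColumnNdvs) {
        double joint = 1.0;
        for (long ndv : rhsColumnNdvs) {
            joint *= ndv;
        }
        return joint;
    }

    /**
     * Returns false only when there is strong evidence that the right-hand
     * side columns are not a PK, i.e. the joint NDV is well below the
     * right-hand side row count. assumedTolerance is an assumed knob that
     * stands in for "significantly smaller" (0.05 means 5% slack).
     */
    static boolean couldBeFkPk(long rhsRowCount, double assumedTolerance,
                               long... rhsColumnNdvs) {
        // Assumption: without usable stats there is no evidence either way,
        // so keep the default FK/PK assumption.
        if (rhsRowCount <= 0 || rhsColumnNdvs.length == 0) return true;
        for (long ndv : rhsColumnNdvs) {
            if (ndv <= 0) return true;
        }
        double threshold = rhsRowCount * (1.0 - assumedTolerance);
        return jointNdv(rhsColumnNdvs) >= threshold;
    }

    public static void main(String[] args) {
        // Single column with as many distinct values as rows: plausible PK.
        System.out.println(couldBeFkPk(1000L, 0.05, 1000L));     // true
        // Single column with few distinct values: not a PK.
        System.out.println(couldBeFkPk(1000L, 0.05, 10L));       // false
        // Two columns whose NDV product covers the row count: plausible PK.
        System.out.println(couldBeFkPk(1000L, 0.05, 100L, 20L)); // true
    }
}
```

Because the joint NDV is a product, it can exceed the row count for multi-column conjunct sets, so the check only ever rules a PK out and never proves one.
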
Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
---
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
18 files changed, 1,171 insertions(+), 588 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/7257/5
--
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>