Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/12535 to look at the new patch set (#5). Change subject: IMPALA-8014: Revise FK/PK cardinality estimation ...................................................................... IMPALA-8014: Revise FK/PK cardinality estimation Impala uses two techniques to estimate join cardinality: one for FK/PK, another for the "generic" case. (IMPALA-8018 suggests we combine the two. But, for the purposes of this fix, we leave them separate.) The code for the FK/PK estimation is mostly right for a key that consists of a singe column. But, if a key is compound (contains more than one column), the calculations can produce incorrect results. This patch modifies the code to use the standard join calculation: |join| = |left| * |right| / max(|left key|, |right key|). The math for compound keys had a number of errors. Since no test tables represent a proper compound key, all tests use unrealistic keys such as a unique Id along with a second, superfluous column. To make the math come out OK for these, the code included adjustments that misfired on real-world keys. This patch fixes those errors. A separate patch introduces new tables with the required schema. Tests: Many planner tests were modified to reflect the new join cardinality estimate. Some plans changed, including some TPC-H plans. Inspected each change to verify that the numbers are correct using the formula cited above. Turns out that the change caused a spill test to fail. The new plan is more efficient and didn't require as much memory. Shrank the memory given to the query to force the newer, more efficient plan to also spill. Testing was also done using a production query that misfired without this fix, but that produced correct results with this fix. Change-Id: Id0db82564596b0a9271b89b8e3f7a3cf92d306da --- M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test M testdata/workloads/functional-planner/queries/PlannerTest/card-multi-join.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/order.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test 17 files changed, 1,920 insertions(+), 1,829 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/12535/5 -- To view, visit http://gerrit.cloudera.org:8080/12535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0db82564596b0a9271b89b8e3f7a3cf92d306da Gerrit-Change-Number: 12535 Gerrit-PatchSet: 5 Gerrit-Owner: Paul Rogers <prog...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>