Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17387 )
Change subject: IMPALA-10681: Improve inner join cardinality estimates ...................................................................... Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@306 PS3, Line 306: qJoinConjunct: eqJoinCo > nit: could you mention this calculation in the doc comment of this method? Modified the method comments to add this 3rd estimation method. It is actually using the same underlying logic as the genericJoinCardinality estimator except for supplying the ndv and num rows stats explicitly. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@430 PS3, Line 430: lt; > getNdvAdjustedJoinCardinality() may sound more close in semanticds of this Actually, all the join cardinality methods in this file are using NDV so I am not sure that we can associate this method name with NDV as it is not a differentiator. I couldn't come up with a more specific name. If you have other suggestions let me know :) http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@441 PS3, Line 441: long lhsCard, long rhsCard) { > I wonder if we should trust lhsCard more here. On paper, NDV of an aggregat Note that I kept the calculation the same as getGenericJoinCardinality() . The calculation formula is agnostic to how the stats were created. I may have misunderstood your comment. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@448 PS3, Line 448: // Adjust the NDVs on both sides to account for predicates. Intuitively, the NDVs : // should onl > For equi- inner joins and the max. cardinality is to be returned, it seems The formula on line 448 is the same as the one in getGenericJoinCardinality() line 411 which is the same as the standard estimation used in several db engines. I don't see a good reason to change the core formula. Let me know if you think otherwise. -- To view, visit http://gerrit.cloudera.org:8080/17387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc Gerrit-Change-Number: 17387 Gerrit-PatchSet: 6 Gerrit-Owner: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 20 May 2021 06:51:36 +0000 Gerrit-HasComments: Yes