Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: Improve inner join cardinality estimates
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@306
PS3, Line 306: qJoinConjunct: eqJoinCo
> nit: could you mention this calculation in the doc comment of this method?
Modified the method comments to add this 3rd estimation method.  It is actually 
using the same underlying logic as the genericJoinCardinality estimator except 
for supplying the ndv and num rows stats explicitly.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@430
PS3, Line 430: lt;
> getNdvAdjustedJoinCardinality() may sound more close in semanticds of this
Actually, all the join cardinality methods in this file are using NDV so I am 
not sure that we can associate this method name with NDV as it is not a 
differentiator. I couldn't come up with a more specific name. If you have other 
suggestions let me know :)


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@441
PS3, Line 441:                                    long lhsCard, long rhsCard) {
> I wonder if we should trust lhsCard more here. On paper, NDV of an aggregat
Note that I kept the calculation the same as getGenericJoinCardinality() . The 
calculation formula is agnostic to how the stats were created. I may have 
misunderstood your comment.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@448
PS3, Line 448: // Adjust the NDVs on both sides to account for predicates. 
Intuitively, the NDVs
             :       // should onl
> For equi- inner joins and the max. cardinality is to be returned, it seems
The formula on line 448 is the same as the one in getGenericJoinCardinality()  
line 411 which is the same as the standard  estimation used in several db 
engines. I don't see a good reason to change the core formula. Let me know if 
you think otherwise.



--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 6
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 May 2021 06:51:36 +0000
Gerrit-HasComments: Yes

Reply via email to