Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19682 )

Change subject: IMPALA-12006: Improve cardinality estimation for joins 
involving multiple conjuncts
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/19682/6/fe/src/main/java/org/apache/impala/planner/JoinNode.java@496
PS6, Line 496:       if (corrfactor > 0) cumulativeSel *= (((double) joinCard/lhsCard)/rhsCard);
> This would be more readable as (double) joinCard/(lhsCard*rhsCard);
Not doing (lhsCard * rhsCard) was intentional here and in other places.  This 
was to avoid multiplication overflow for large values, since both are of type 
long.  In a prior version of the patch I was doing a checkedMultiply(), 
which checks for overflow, but I changed it to handle lhsCard and rhsCard 
separately based on Csaba's comment.
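
To illustrate the overflow concern, here is a minimal standalone sketch, not the 
code in the patch. The class name, method names and cardinality values are 
hypothetical, and Math.multiplyExact() from the JDK stands in for the 
checkedMultiply() mentioned above.

```java
// Hypothetical sketch: why dividing by each cardinality separately is safer
// than forming lhsCard * rhsCard in long arithmetic first.
final class SelectivityOverflowSketch {
  // Divides by each cardinality in turn, so no long product is ever formed.
  static double stepwise(long joinCard, long lhsCard, long rhsCard) {
    return ((double) joinCard / lhsCard) / rhsCard;
  }

  // Forms the long product first; it wraps around silently on overflow.
  static double naive(long joinCard, long lhsCard, long rhsCard) {
    return (double) joinCard / (lhsCard * rhsCard);
  }

  // Returns true if lhsCard * rhsCard does not fit in a long.
  static boolean productOverflows(long lhsCard, long rhsCard) {
    try {
      Math.multiplyExact(lhsCard, rhsCard);
      return false;
    } catch (ArithmeticException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Illustrative values only: 4e9 * 4e9 exceeds Long.MAX_VALUE.
    long lhsCard = 4_000_000_000L;
    long rhsCard = 4_000_000_000L;
    long joinCard = 8_000_000_000L;
    System.out.println("overflows: " + productOverflows(lhsCard, rhsCard));
    System.out.println("stepwise:  " + stepwise(joinCard, lhsCard, rhsCard));
    System.out.println("naive:     " + naive(joinCard, lhsCard, rhsCard));
  }
}
```

With these inputs the naive form yields a negative "selectivity" because the 
long product wraps around, while the stepwise form stays in double arithmetic.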

Regarding applying the correlation factor, I had a slightly different 
interpretation of this.
If you have 4 equijoin conjuncts c1, c2, c3, c4 with selectivities 
s1, s2, s3, s4, the correlation factor does not necessarily represent a 
pair-wise correlation.  If that were the case, we would get (s1 * s2)/CF for the 
first two; then, considering s3, we would get (((s1 * s2)/CF) * s3)/CF, and so 
on.  But c1 and c2 may not be correlated with each other at all; only one of them 
might be correlated with c3.  Having a single denominator allows us to set it to 
whatever factor we want, to represent whether 2 or 3 or all 4 are correlated.  It 
also makes it easier to articulate in the documentation.
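
The two interpretations can be compared with a small sketch. This is only an 
illustration using the s1..s4 and CF notation above: the class and method names 
are hypothetical, the values are made up, and the result is not clamped to 
[0, 1] as real planner code presumably would need to do.

```java
// Hypothetical sketch comparing a single correlation-factor denominator with
// a pair-wise application of the same factor.
final class CorrelationFactorSketch {
  // Single denominator: multiply all selectivities, then divide by cf once.
  static double singleDenominator(double[] sels, double cf) {
    double product = 1.0;
    for (double s : sels) {
      product *= s;
    }
    return product / cf;
  }

  // Pair-wise: divide by cf each time another conjunct is folded in,
  // i.e. (((s1 * s2)/cf) * s3)/cf and so on.
  static double pairwise(double[] sels, double cf) {
    double cumulative = sels[0];
    for (int i = 1; i < sels.length; i++) {
      cumulative = (cumulative * sels[i]) / cf;
    }
    return cumulative;
  }

  public static void main(String[] args) {
    // Illustrative values only.
    double[] sels = {0.1, 0.1, 0.1, 0.1};
    double cf = 0.5;
    System.out.println("single denominator: " + singleDenominator(sels, cf));
    System.out.println("pair-wise:          " + pairwise(sels, cf));
  }
}
```

For n conjuncts the pair-wise form divides by CF a total of n-1 times, so it 
assumes every added conjunct is correlated with the ones before it; the single 
denominator leaves that choice to whoever sets the factor.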



--
To view, visit http://gerrit.cloudera.org:8080/19682
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I845d778a58404af834f7501fc8157a5a4b4bcc35
Gerrit-Change-Number: 19682
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Mon, 10 Apr 2023 23:37:40 +0000
Gerrit-HasComments: Yes
