Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21377 )

Change subject: IMPALA-8042: Assign BETWEEN selectivity for discrete-unique 
column
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21377/2//COMMIT_MSG@18
PS2, Line 18: This patch adds a narrow optimization of BetweenPredicate 
selectivity
> If the user's query has  a predicate unique_col >=5 AND unique_col <= 10 in
Correct. This is mainly because the analysis happen at 
BetweenToCompoundRule.java.
A more general approach will require analyzing all range BinaryPredicate and 
quickly becomes complicated when they are overlaps with each other.
I leave TODO at PlanNode.java to think about a more general approach to this 
problem.


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2626
PS2, Line 2626:
> The 'prioritization' part was not clear.. why exactly is it needed ? .. cou
Added comment.


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1856
PS2, Line 1856:    */
> nit: in the comments above, could you add a note about the acceptDate param
Done


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@759
PS2, Line 759:    *    lower selectivity if analyzed as a pair.
> nit: this comment needs updating to reflect the between selectivity.
Added as issue number 3.


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/planner/PlanNode.java@774
PS2, Line 774:
> nit: have 'been' assigned ..
Done


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@104
PS2, Line 104: hasNumDist
> Looking at the hasStats() method:
It is enough to check for hasNumDistinctValues() only. Updated this code 
accordingly.


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@107
PS2, Line 107: numNotNulls
> Is there any test that covers the case where the null count made a differen
Added tests against functional_parquet.unique_with_nulls.


http://gerrit.cloudera.org:8080/#/c/21377/5/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
File fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java:

http://gerrit.cloudera.org:8080/#/c/21377/5/fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java@127
PS5, Line 127: double sel = Math.max(0.0, (double) diff / 
stats.getNumDistinctValues());
I think test against unique_with_nulls returns wrong cardinality estimate. 
Should be half of what it is now.
I'll check this line again.


http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/21377/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@544
PS2, Line 544:    * something like 0.33^2.
> nit: this last sentence should be updated for the new behavior.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21377
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10
Gerrit-Change-Number: 21377
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 May 2024 00:44:15 +0000
Gerrit-HasComments: Yes

Reply via email to