wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942: Add query hints for cardinalities and selectivities
......................................................................


Patch Set 4:

(30 comments)

Hi guys, thanks for your carefully review. I've already modify code as much as 
possible. If I missed any thing or you have any suggestions, please tell me. 
Hope for your reply again.

http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@9
PS3, Line 9: use
> nit uses
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@10
PS3, Line 10: orse query plan
> nit: this may lead
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@10
PS3, Line 10: , and this may lea
> nit. remove?
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@11
PS3, Line 11: be in the future,
            : we can use histograms to get more precise qu
> nit. to reduce such errors.
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@20
PS3, Line 20:
> nit remove
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@21
PS3, Line 21:
> nit is valid
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@22
PS3, Line 22: For 'SELECTIVITY' hint, we can use in these predicates:
> I think we should raise a warning when the hint is not used.
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@24
PS3, Line 24: * InPredicate
> nit. types of
Sorry, I don't understand, can you explain this?


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@24
PS3, Line 24:
> in non-compounding form.
Sorry, I don't understand, can you explain this?


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@34
PS3, Line 34:
> nit: formats
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@45
PS3, Line 45: ic
> isn't second 0.1?
I mean, these predicates without selectivity computing, default value is -1. 
E.g 'a>1 and b>2', these two predicates' selectivity value are both '-1'. When 
compute combined selectivity in PlanNode.computeCombinedSelectivity(), Impala 
will use only one 0.1 for all these predicates, which selectivity value are -1, 
which mean no selectiviy.


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@48
PS3, Line 48: Another thi
> nit: need to ensure
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@48
PS3, Line 48: mpala will
> nit: has been set / is set
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@49
PS3, Line 49: ' with two 'BinaryPredicate', if
            : set hint
> I wonder why we can not assign a selectivity to a complex predicate in this
Thanks for advice, Qifan. I also consider about this, but here are some 
problems.
1. When we set a selectivity for compound, such as (a>1 and b>2) /* 
+selectivity(0.5)*/. Impala will execute PlanNode.computeCombinedSelectivity(), 
the 'conjuncts_' is a list with [(a>1),(b>2)], not [(a>1 and b>2)], the two 
children selectivity are -1, then we will lost the original  selectivity 0.5. 
If we modifiy computeCombinedSelectivity() and related code, it will be complex.
2. Use selectivity hint for predicate like (xxx) will caused 'shift/reduce 
conflicts', becasue both predicate and hint contains brackets, maybe we need to 
modify the syntax, I'm not sure.
3. In my opinion, we usually compute selectivity for a single predicate like 
a=1, b IS NULL and so on. Set selectivity for compound predicate, espically for 
multiple nested predicates, it maybe confused.
But if you insist on this, we can create a sub-task after this feature, and 
discuss about this. How do you think?


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@49
PS3, Line 49: ' with two 'BinaryPredicate', if
            : set hint
> nit: might not take the expected effect
Done


http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@51
PS3, Line 51: value for two 'BinaryPredicate' children.
            :
            : Testing:
            : - Added new fe tests in 'PlannerTest'
> I wonder if this is correct.
I understand, this computing maybe not perfect. But if we set selectivity for 
compound predicate. In 'BetweenToCompoundRule', Impala will transform a 
'Between' to 'Compound'.
E.g a 'Between 1 and 2', to 'a>=1 and a<=2', if we set selectivity for whole 
'a>=1 and a<=2', when execute 'computeCombinedSelectivity', the conjuncts is 
'a>=1' and 'a<=2', both these two children selectivity are -1, and we lost the 
original selectivity as mentioned above.


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/cup/sql-parser.cup@3260
PS3, Line 3260: computed
> nit: computed
Done


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/cup/sql-parser.cup@3829
PS3, Line 3829:
> nit: including
Done


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/cup/sql-parser.cup@3831
PS3, Line 3831: // Not like predic
> Could you please add a test for not like predicate predicate-selectivity-hi
Done


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@248
PS3, Line 248: selectivity_ > 0
> nit. This prevents a 0 selectivity. Maybe use -2 to indicate non-computable
I disable 0 for this value, since 0 for predicate selectivity hint make no 
sense.


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/InPredicate.java
File fe/src/main/java/org/apache/impala/analysis/InPredicate.java:

http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/InPredicate.java@177
PS3, Line 177: elect
> same comment as for BinaryPredicate.java
Done


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
File fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java:

http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java@143
PS3, Line 143: electivity
> same comment as for BinaryPredicate.java
Done


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/Predicate.java
File fe/src/main/java/org/apache/impala/analysis/Predicate.java:

http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/Predicate.java@77
PS3, Line 77:         analyzer.addWarning("Selectivity hint is allowed for 
single column predicate: " +
> nit: maybe add more information about this predicate in the warning.
Done


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@175
PS3, Line 175: tableNumRowsHint
> I wonder we can use a more generic name numRowsHint_ as the hint can be app
Done


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@543
PS3, Line 543: st<String> args = hint.getArgs();
> nit. This comment probably should be moved to line 547.
Done


http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1556
PS1, Line 1556:     }
> I don't have a strong opinion here, just want to mention that table stats c
I see, you are right. Table stats can be stale, I modify the code. Now table 
rows hint will replace the original stats.


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@331
PS3, Line 331:   * 'partitions' being the partition
> I think we probably should move it to ScanNode with the name numRowsHints_.
Done


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5008
PS3, Line 5008: AnalyzesOk
> Why is it not an analysis error?
This is refer to other hints, in AnalyzeStmtsTest.TestTableHints(). I think 
this make sense, illegal hints should not caused exception, maybe a warning is 
enough. How do you think?


http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5050
PS3, Line 5050: (0)
> Don't we allow zero as selectivity hint? E.g. at L5060 we say 'allowed valu
Yes, maybe not. Zero value make no sense for a query. I will modify that 
comment.


http://gerrit.cloudera.org:8080/#/c/18023/3/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test:

http://gerrit.cloudera.org:8080/#/c/18023/3/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@643
PS3, Line 643: 4.28M
> Above we have 2.42M for 0.5 selectivity.
Yes, you are right, in above cases, we have 2.42M for 0.5 selectivity. But in 
'BetweenPredicate', we can not compute this card directly. In 
'BetweenToCompoundRule', we need to split 0.5 in two children, You can refer 
the comments in'BetweenToCompoundRule'. This split caused deviation.
For example, if 0.04 for 'Between', each child is 0.2.
I know this computing maybe not perfect, but we must set selectivity for each 
child. Since when Impala execute 'computeCombinedSelectivity', conjuncts are 
two children, we cannot set selectivity for whole compound predicate.
Do you have any suggestions about this? Hope for your reply.



--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng <sky...@163.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: wangsheng <sky...@163.com>
Gerrit-Comment-Date: Sat, 11 Dec 2021 09:03:57 +0000
Gerrit-HasComments: Yes

Reply via email to