wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18023 )
Change subject: IMPALA-7942: Add query hints for cardinalities and selectivities ...................................................................... Patch Set 4: (30 comments) Hi guys, thanks for your carefully review. I've already modify code as much as possible. If I missed any thing or you have any suggestions, please tell me. Hope for your reply again. http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@9 PS3, Line 9: use > nit uses Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@10 PS3, Line 10: orse query plan > nit: this may lead Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@10 PS3, Line 10: , and this may lea > nit. remove? Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@11 PS3, Line 11: be in the future, : we can use histograms to get more precise qu > nit. to reduce such errors. Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@20 PS3, Line 20: > nit remove Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@21 PS3, Line 21: > nit is valid Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@22 PS3, Line 22: For 'SELECTIVITY' hint, we can use in these predicates: > I think we should raise a warning when the hint is not used. Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@24 PS3, Line 24: * InPredicate > nit. types of Sorry, I don't understand, can you explain this? http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@24 PS3, Line 24: > in non-compounding form. Sorry, I don't understand, can you explain this? http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@34 PS3, Line 34: > nit: formats Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@45 PS3, Line 45: ic > isn't second 0.1? I mean, these predicates without selectivity computing, default value is -1. E.g 'a>1 and b>2', these two predicates' selectivity value are both '-1'. When compute combined selectivity in PlanNode.computeCombinedSelectivity(), Impala will use only one 0.1 for all these predicates, which selectivity value are -1, which mean no selectiviy. http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@48 PS3, Line 48: Another thi > nit: need to ensure Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@48 PS3, Line 48: mpala will > nit: has been set / is set Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@49 PS3, Line 49: ' with two 'BinaryPredicate', if : set hint > I wonder why we can not assign a selectivity to a complex predicate in this Thanks for advice, Qifan. I also consider about this, but here are some problems. 1. When we set a selectivity for compound, such as (a>1 and b>2) /* +selectivity(0.5)*/. Impala will execute PlanNode.computeCombinedSelectivity(), the 'conjuncts_' is a list with [(a>1),(b>2)], not [(a>1 and b>2)], the two children selectivity are -1, then we will lost the original selectivity 0.5. If we modifiy computeCombinedSelectivity() and related code, it will be complex. 2. Use selectivity hint for predicate like (xxx) will caused 'shift/reduce conflicts', becasue both predicate and hint contains brackets, maybe we need to modify the syntax, I'm not sure. 3. In my opinion, we usually compute selectivity for a single predicate like a=1, b IS NULL and so on. Set selectivity for compound predicate, espically for multiple nested predicates, it maybe confused. But if you insist on this, we can create a sub-task after this feature, and discuss about this. How do you think? http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@49 PS3, Line 49: ' with two 'BinaryPredicate', if : set hint > nit: might not take the expected effect Done http://gerrit.cloudera.org:8080/#/c/18023/3//COMMIT_MSG@51 PS3, Line 51: value for two 'BinaryPredicate' children. : : Testing: : - Added new fe tests in 'PlannerTest' > I wonder if this is correct. I understand, this computing maybe not perfect. But if we set selectivity for compound predicate. In 'BetweenToCompoundRule', Impala will transform a 'Between' to 'Compound'. E.g a 'Between 1 and 2', to 'a>=1 and a<=2', if we set selectivity for whole 'a>=1 and a<=2', when execute 'computeCombinedSelectivity', the conjuncts is 'a>=1' and 'a<=2', both these two children selectivity are -1, and we lost the original selectivity as mentioned above. http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/cup/sql-parser.cup@3260 PS3, Line 3260: computed > nit: computed Done http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/cup/sql-parser.cup@3829 PS3, Line 3829: > nit: including Done http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/cup/sql-parser.cup@3831 PS3, Line 3831: // Not like predic > Could you please add a test for not like predicate predicate-selectivity-hi Done http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@248 PS3, Line 248: selectivity_ > 0 > nit. This prevents a 0 selectivity. Maybe use -2 to indicate non-computable I disable 0 for this value, since 0 for predicate selectivity hint make no sense. http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/InPredicate.java File fe/src/main/java/org/apache/impala/analysis/InPredicate.java: http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/InPredicate.java@177 PS3, Line 177: elect > same comment as for BinaryPredicate.java Done http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java File fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java: http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java@143 PS3, Line 143: electivity > same comment as for BinaryPredicate.java Done http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/Predicate.java File fe/src/main/java/org/apache/impala/analysis/Predicate.java: http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/Predicate.java@77 PS3, Line 77: analyzer.addWarning("Selectivity hint is allowed for single column predicate: " + > nit: maybe add more information about this predicate in the warning. Done http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java File fe/src/main/java/org/apache/impala/analysis/TableRef.java: http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@175 PS3, Line 175: tableNumRowsHint > I wonder we can use a more generic name numRowsHint_ as the hint can be app Done http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@543 PS3, Line 543: st<String> args = hint.getArgs(); > nit. This comment probably should be moved to line 547. Done http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/18023/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1556 PS1, Line 1556: } > I don't have a strong opinion here, just want to mention that table stats c I see, you are right. Table stats can be stale, I modify the code. Now table rows hint will replace the original stats. http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@331 PS3, Line 331: * 'partitions' being the partition > I think we probably should move it to ScanNode with the name numRowsHints_. Done http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5008 PS3, Line 5008: AnalyzesOk > Why is it not an analysis error? This is refer to other hints, in AnalyzeStmtsTest.TestTableHints(). I think this make sense, illegal hints should not caused exception, maybe a warning is enough. How do you think? http://gerrit.cloudera.org:8080/#/c/18023/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5050 PS3, Line 5050: (0) > Don't we allow zero as selectivity hint? E.g. at L5060 we say 'allowed valu Yes, maybe not. Zero value make no sense for a query. I will modify that comment. http://gerrit.cloudera.org:8080/#/c/18023/3/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test File testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test: http://gerrit.cloudera.org:8080/#/c/18023/3/testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test@643 PS3, Line 643: 4.28M > Above we have 2.42M for 0.5 selectivity. Yes, you are right, in above cases, we have 2.42M for 0.5 selectivity. But in 'BetweenPredicate', we can not compute this card directly. In 'BetweenToCompoundRule', we need to split 0.5 in two children, You can refer the comments in'BetweenToCompoundRule'. This split caused deviation. For example, if 0.04 for 'Between', each child is 0.2. I know this computing maybe not perfect, but we must set selectivity for each child. Since when Impala execute 'computeCombinedSelectivity', conjuncts are two children, we cannot set selectivity for whole compound predicate. Do you have any suggestions about this? Hope for your reply. -- To view, visit http://gerrit.cloudera.org:8080/18023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b Gerrit-Change-Number: 18023 Gerrit-PatchSet: 4 Gerrit-Owner: wangsheng <sky...@163.com> Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com> Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: wangsheng <sky...@163.com> Gerrit-Comment-Date: Sat, 11 Dec 2021 09:03:57 +0000 Gerrit-HasComments: Yes