Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 )
Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint ...................................................................... Patch Set 2: (6 comments) Looks good to me! http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java File fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java@79 PS2, Line 79: if (Expr.IS_ALWAYS_TRUE_PREDICATE.apply(e1) && : Expr.IS_ALWAYS_TRUE_PREDICATE.apply(e2)) { : setHasAlwaysTrueHint(true); Maybe better case on the op here. For OR, when one branch is always true, we can set setHasAlwaysTrueHint(true). http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@520 PS2, Line 520: if (conjuncts.size() > 1) { Seems we should do so even when for a single predicate case. http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS2, Line 223: if (getTableRefs().size() == 1) It seems to me that we should check hasConvertLimitToSampleHint() on a single table case. SELECT * FROM T WHERE ... vs. SELECT * FROM T [convert_limit_to_sample] WHERE ... http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209 PS2, Line 209: estimatedTotalRows When the #rows are indeed estimated due to missing or corrupt stats, it may be useful to inflate the sampling rate, say by 3 times to reduce the chance of under-sampling, or to discard the table for consideration completely. https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#L1219 http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test File testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test: http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@259 PS2, Line 259: # Query against a view that has WHERE clause hint. : # Both the num partitions and files are pruned. : select * from functional.alltypes_dp_2_view_1 limit 10; : ---- PLAN Nice test queries! http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@275 PS2, Line 275: select * from functional.alltypes_dp_2_view_2 limit 10; May consider add one or two queries against tables directly (i.e., no views). -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 2 Gerrit-Owner: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Tue, 01 Dec 2020 16:54:46 +0000 Gerrit-HasComments: Yes