Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 )
Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java File fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java@79 PS2, Line 79: if ((op == Operator.AND && : (Expr.IS_ALWAYS_TRUE_PREDICATE.apply(e1) && : Expr.IS_ALWAYS_TRUE_P > Maybe better case on the op here. For OR, when one branch is always true, w Makes sense to handle OR for completeness. I have added it. One thing that I have mentioned elsewhere is the predicate hint is currently given at the top WHERE clause rather than per expression. This limitation could be fixed in the future and it would make the OR case more interesting. http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@520 PS2, Line 520: if (conjuncts.size() > 1) { > Seems we should do so even when for a single predicate case. For the single conjunct case it should have already been set on the line 515 above since that would be the top level predicate. http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223 PS2, Line 223: if (getTableRefs().size() == 1) > It seems to me that we should check hasConvertLimitToSampleHint() on a sing Since this is a qualifying check to decide whether this query block is eligible for simple limit optimization, it is not necessary for the single table case to have the convert_limit_to_sample hint. Even if that hint is not present, we can still push down limit to the scan (if other conditions defined in checkSimpleLimitStmt() are met). If that hint is present for the table, we would apply the sampling anyways in HdfsPartitionPruner.pruneForSimpleLimit(). My thought process here was that only in case of joins, we want to ensure that at least one table in this query block has the hint. Let me know if this makes sense. I should probably add a comment to this method. http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209 PS2, Line 209: estimatedTotalRows > When the #rows are indeed estimated due to missing or corrupt stats, it may Actually, the getTable().getNumRows() is the estimated row count. If stats are not present, or corrupted, it will return -1 and in the next line I check for that and skip the sampling completely. There is another option to either (a) explicitly specify the sampling percentage in the hint somehow or (b) through another query option such as simple_limit_sample_percent = N. I went with a simpler stats based approach here since I did not want to introduce yet another config option. The option (a) could be something like /* +convert_limit_to_sample=5 */ to specify the percent. I am open to discussion on this. http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test File testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test: http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@259 PS2, Line 259: # Query against a view that has WHERE clause hint. : # Both the num partitions and files are pruned. : select * from functional.alltypes_dp_2_view_1 limit 10; : ---- PLAN > Nice test queries! Ack http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@275 PS2, Line 275: select * from functional.alltypes_dp_2_view_2 limit 10; > May consider add one or two queries against tables directly (i.e., no views Yup..makes sense. I added a query at the end that is against the same base table from the view. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Wed, 02 Dec 2020 23:56:24 +0000 Gerrit-HasComments: Yes