Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16792 )

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
File fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java@79
PS2, Line 79:  if ((op == Operator.AND &&
            :           (Expr.IS_ALWAYS_TRUE_PREDICATE.apply(e1) &&
            :               Expr.IS_ALWAYS_TRUE_P
> Maybe better case on the op here. For OR, when one branch is always true, w
Makes sense to handle OR for completeness.  I have added it.  One thing that I 
have mentioned elsewhere is the predicate hint is currently given at the top 
WHERE clause rather than per expression. This limitation could be fixed in the 
future and it would make the OR case more interesting.


http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@520
PS2, Line 520: if (conjuncts.size() > 1) {
> Seems we should do so even when for a single predicate case.
For the single conjunct case it should have already been set on the line 515 
above since that would be the top level predicate.


http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223
PS2, Line 223: if (getTableRefs().size() == 1)
> It seems to me that we should check hasConvertLimitToSampleHint() on a sing
Since this is a qualifying check to decide whether this query block is eligible 
for simple limit optimization, it is not necessary for the single table case to 
have the convert_limit_to_sample hint.  Even if that hint is not present, we 
can still push down limit to the scan (if other conditions defined in 
checkSimpleLimitStmt() are met).  If that hint is present for the table, we 
would apply the sampling anyways in HdfsPartitionPruner.pruneForSimpleLimit().

My thought process here was that only in case of joins, we want to ensure that 
at least one table in this query block has the hint. Let me know if this makes 
sense.  I should probably add a comment to this method.


http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209
PS2, Line 209: estimatedTotalRows
> When the #rows are indeed estimated due to missing or corrupt stats, it may
Actually, the getTable().getNumRows() is the estimated row count.  If stats are 
not present, or corrupted, it will return -1 and in the next line I check for 
that and skip the sampling completely.

There is another option to either (a) explicitly specify the sampling 
percentage in the hint somehow  or (b) through another query option such as 
simple_limit_sample_percent = N.
I went with a simpler stats based approach here since I did not want to 
introduce yet another config option. The option (a) could be something like /* 
+convert_limit_to_sample=5  */ to specify the percent.  I am open to discussion 
on this.


http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test:

http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@259
PS2, Line 259: # Query against a view that has WHERE clause hint.
             : # Both the num partitions and files are pruned.
             : select * from functional.alltypes_dp_2_view_1 limit 10;
             : ---- PLAN
> Nice test queries!
Ack


http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@275
PS2, Line 275: select * from functional.alltypes_dp_2_view_2 limit 10;
> May consider add one or two queries against tables directly (i.e., no views
Yup..makes sense. I added a query at the end that is against the same base 
table from the view.



--
To view, visit http://gerrit.cloudera.org:8080/16792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 02 Dec 2020 23:56:24 +0000
Gerrit-HasComments: Yes

Reply via email to