Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
......................................................................


Patch Set 2:

(4 comments)

I have some serious concerns about the behaviour with predicates, not 
necessarily that it wouldn't be useful in some circumstances, but that it's 
changing the meaning of the LIMIT clause too much, even considering that it's 
under a query option.

Otherwise I had more minor comments on the rest of the code.

http://gerrit.cloudera.org:8080/#/c/16723/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16723/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@184
PS2, Line 184:   // Tracks the simple LIMIT status of this query block
Can you explain what the two values are?


http://gerrit.cloudera.org:8080/#/c/16723/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16723/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1560
PS2, Line 1560:       // check eligibility of simple limit optimization:
It seems like this would most naturally fit as part of HdfsPartition pruning, 
which is invoked above. Would be good to move this logic into a separate 
function regardless, but it seems like it could fit in that step.


http://gerrit.cloudera.org:8080/#/c/16723/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test:

http://gerrit.cloudera.org:8080/#/c/16723/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@24
PS2, Line 24: # limit with a WHERE clause
Add a case where the where clause has a subquery, too?


http://gerrit.cloudera.org:8080/#/c/16723/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@26
PS2, Line 26: where bigint_col < 100
Trying to apply this with non-partition predicates doesn't seem safe.

I get there's a use case for limiting the input data scanned even when it 
changes results but this seems very different and surprising behavior for the 
LIMIT clause compared to the other optimisation.



--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Nov 2020 17:26:15 +0000
Gerrit-HasComments: Yes

Reply via email to