Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
......................................................................


Patch Set 11:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@27
PS11, Line 27: /// Node that passes along the row pulled from its child 
unchanged.
First sentence should crisply state purpose of this node, e.g.:

Node that returns an error if its child produces more than a single row. If 
successful, this node returns a deep copy of its single input row.


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@29
PS11, Line 29: /// Note that this node must be a blocking node!
Please give an explanation why, e.g.:

Note that this node must be a blocking node. It would be incorrect to return 
rows before the single row constraint has been validated because downstream 
exec nodes might produce results and incorrectly return them to the client. If 
the child of this node produces more than one row it means the SQL query is 
semantically invalid, so no rows must be returned to the client.


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@29
PS9, Line 29: /// Note that this node must be a blocking node!
> No, I just wanted make it more generic. Done.
Thanks.

For reference, I think the "Speculative Generality" code smell applies here.

https://cwiki.apache.org/confluence/display/IMPALA/Effective+Coding+Practices


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38
PS9, Line 38:   virtual Status Reset(RuntimeState* state) override;
> Thanks for the tips. However, I still couldn't succeed. I used the followin
Nice experiments and tests! Seems really hard to hit this case with a real 
non-subquery plan. I think documenting the blocking nature of the node in code 
comments should be enough.


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@68
PS11, Line 68:   DCHECK_LE(rows_collected, 1);
DCHECK that rows_collected is either 0 or 1. The current check accepts negative 
values.


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@85
PS11, Line 85:     row_batch_->DeepCopyTo(output_row_batch);
Why do we deep copy the row twice? Once in Open() should be sufficient.

Copying twice could be expensive in subplans.


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:       StmtRewriter.rewriteNonScalarSubqueries(operand, 
analyzer);
We should still keep the analysis and rewrite phases distinctly separate as 
before. Calling ExprRewriter within analysis breaks the separation and is very 
confusing.

The StmtRewriter should treat all Exprs in the same way, so no special code 
should be required for individual Expr subclasses.

The rewrite should be performed within the rewrite phase when 
StmtRewriter.rewrite() is called.

See AnalaysisContext.analyze() for an overview of the phases analysis and 
rewrite phases.


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@177
PS11, Line 177:       for (Expr expr: children_) {
Move to the rewrite phase.


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@529
PS11, Line 529:       for (Expr child : getChildren()) {
Move to rewrite phase.


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@94
PS11, Line 94:   // This member contains the original statement provided by the 
user.
Not accurate. This contains the post-analysis toSql() before rewrites. It may 
still be different from the SQL originally provided by the user (table refs are 
resolved and fully qualified, etc.)


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@369
PS11, Line 369:   public String getOrigStmtStr() { return origSqlString_; }
getOrigSqlString()


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@371
PS11, Line 371:     if (origSqlString_ == null) origSqlString_ = toSql();
remove function and inline this into the single call site


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@245
PS11, Line 245:         StmtRewriter.rewriteNonScalarSubqueries(whereClause_, 
analyzer);
move to rewrite phase


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@99
PS11, Line 99:   public static void rewriteNonScalarSubqueries(Expr expr, 
Analyzer analyzer)
Should not be public and called from the outside during analysis. The subquery 
transformation should happen within the rewrite phase (after analysis).


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@104
PS11, Line 104:     for (Subquery subquery : subqueries) {
Why do we need to go through multiple subqueries? My understanding is that we 
currently only support a single subquery in an Expr tree. See 
StmtRewriter.rewriteWhereClauseSubqueries()


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@27
PS11, Line 27:  * Node that passes along the row pulled from its child 
unchanged.
Same comment as in the BE exec node


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@34
PS11, Line 34:   protected CardinalityCheckNode(PlanNodeId id, PlanNode child, 
String displayStatement) {
nit: we typically abbreviate Statement with Stmt


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@39
PS11, Line 39:     // Also, an optimized plan is generated with LIMIT.
remove since it doesn't really add much info


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1116
PS9, Line 1116:     // TODO: This check is also repeated in 
migrateConjunctsToInlineView() because we
> I moved it up.
Nice query. Returning incorrect results is definitely not acceptable.

Looks like a more severe bug. The "or id = 3962" was dropped in the plan. From 
my initial investigation it looks like an independent bug in 
Expr.optimizeConjuncts() which is called from 
SingleNodePlanner.createScanNode().

Can you investigate and file a new JIRA if you think the bug is independent of 
your change? Thanks!


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@1434
PS11, Line 1434:   public void testRewrittenStatements() {
Instead of adding this new test, perhaps you can address the TODO in 
subqueryTest()?


http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test:

http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1328
PS9, Line 1328:
> * I moved these tests to PlannerTest/subquery-rewrite.test
* Thanks
* Several of the tests added in this patch already worked before, so could be 
covered by existing tests. I'm just saying we should not add duplicate 
coverage. If all the tests you added provide new coverage, then consider this 
comment resolved :)


http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test:

http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test@709
PS11, Line 709: Subquery must return maximum 1 row(s):
fix


http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/subquery.test@994
PS11, Line 994: # Subquery that returns more than one row
comment what this test is for


http://gerrit.cloudera.org:8080/#/c/9005/11/tests/hs2/test_fetch.py
File tests/hs2/test_fetch.py:

http://gerrit.cloudera.org:8080/#/c/9005/11/tests/hs2/test_fetch.py@204
PS11, Line 204:   @needs_session(conf_overlay={"num_nodes": "1", "batch_size": 
"1"})
Thanks for experimenting and adding this test. I'm not sure there's much value 
in keeping this test since we were not able to hit the issue even with this 
pretty clever setup.

Let's remove this test and carefully comment the required blocking behavior in 
the code.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Feb 2018 22:54:22 +0000
Gerrit-HasComments: Yes

Reply via email to