[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 32:

Nice work, Zoltan! This was a pretty intense task. Thanks for pulling through.


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 32
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 30 Apr 2018 16:02:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-30 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 32:

Yay, it's finally in, it was a long journey! :)
Thanks to everyone who participated in the review.
Special thanks to Alex for the tremendous amount of insights about the frontend!


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 32
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 30 Apr 2018 09:13:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Reviewed-on: http://gerrit.cloudera.org:8080/9005
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 1,130 insertions(+), 79 deletions(-)

Approvals:
  Alex Behm: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 32
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 31: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 31
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Apr 2018 20:06:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 31:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2376/


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 31
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Apr 2018 15:56:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 29:

(1 comment)

Thanks!
I rebased the patch set to contain IMPALA-6934.

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484
PS29, Line 484: if (isRuntimeScalar) {
> Thanks! I think we should keep the tests added in this patch.
OK, I kept the tests.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 29
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Apr 2018 11:38:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-27 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#31).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 1,130 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/31
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 31
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 29:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484
PS29, Line 484: if (isRuntimeScalar) {
> OK, I pushed the bugfix for review and added you as a reviewer:
Thanks! I think we should keep the tests added in this patch.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 29
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 26 Apr 2018 18:07:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-26 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 29:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484
PS29, Line 484: if (isRuntimeScalar) {
> It's not safe to rely on our tests to find all issues. We should convince o
OK, I pushed the bugfix for review and added you as a reviewer:
https://gerrit.cloudera.org/#/c/10218/

I expect that the bugfix will be merged soon, so this change can wait a little 
more :)

I added some tests to IMPALA-6934, should I keep the current offset tests in 
this commit? They contain runtime scalar subqueries, but other than that they 
are quite similar to the tests in IMPALA-6934.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 29
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 26 Apr 2018 12:36:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-25 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 30:

(1 comment)

Change looks good to me. I'll +2 once we have decided how to proceed with my 
last comment :)

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484
PS29, Line 484: if (isRuntimeScalar) subqueryStmt.setLimit(2);
> I think it's surprising that QueryStmt.setLimit() wipes the offset (at leas
It's not safe to rely on our tests to find all issues. We should convince 
ourselves that the modified setLimit() has the desired semantics for that other 
caller in L566.

I investigated this and convinced myself that the new setLimit() has the 
correct semantics. In fact, your change fixes an existing, previously unknown 
bug.

Let's separate the bugfix from this huge change. I filed IMPALA-6934 and 
assigned it to you. Please let me know if you cannot work on it

As for this change, you could either wait for the bugfix to be merged, or you 
could undo your offset fix and comment out the offset tests noting that they 
will be enabled when IMPALA-6934 is fixed. Either approach works for me.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 30
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 25 Apr 2018 21:51:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 30:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484
PS29, Line 484: if (isRuntimeScalar) subqueryStmt.setLimit(2);
> Sorry I pasted the wrong query here. This is the correct one:
I think it's surprising that QueryStmt.setLimit() wipes the offset (at least it 
surprised me :) ).
I modified it to keep the offset expr. QueryStmt.setLimit() is only used at two 
places (including this one) and based on the test results it seems I haven't 
introduced a regression by changing its behavior.

Note that after calling setLimit() subqueryStmt will have an un-analyzed limit 
element. Maybe it's not a big issue, because we already had a setLimit() call 
in this method.

Added new tests to 'PlannerTest/subquery-rewrite.test' and 
'QueryTest/subquery.test'.


http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@809
PS29, Line 809: }
> Unsupported correlated subquery with runtime scalar check:
Done


http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java:

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@45
PS29, Line 45: Preconditions.checkState(child.getLimit() == 2);
> Is this needed? Should we make this a Preconditions check?
Based on the tests it seems we always invoke setLimit(2) in 
StmtRewriter.mergeExpr(), so I made it a precondition check as suggested.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 30
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 25 Apr 2018 13:32:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-25 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#30).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 1,134 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/30
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 30
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 29:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484
PS29, Line 484: if (isRuntimeScalar) {
> I don't think this is right if the subquery already had a order by + limit
Sorry I pasted the wrong query here. This is the correct one:

select count(*) from functional.alltypes
where 7 = (select id from functional.alltypestiny
   order by id limit 8 offset 7);

The subquery returns a single "7" so the query should succeed, but currently it 
fails.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 29
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 24 Apr 2018 22:30:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 29:

(3 comments)

Getting really close :)

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484
PS29, Line 484: if (isRuntimeScalar) {
I don't think this is right if the subquery already had a order by + limit + 
offset clause. For example, the following query should work but it currently 
fails because the plan is wrong (the offset is wiped):

select count(*) from functional.alltypes
where 7 = (select int_col from functional.alltypestiny order by id limit 2 
offset 20);


http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@809
PS29, Line 809:   throw new AnalysisException("Subquery must return a 
single row: " + stmt.toSql());
Unsupported correlated subquery with runtime scalar check:


http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java:

http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@45
PS29, Line 45: child.setLimit(2);
Is this needed? Should we make this a Preconditions check?



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 29
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 24 Apr 2018 22:27:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-24 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 29:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395
PS23, Line 1395: "functional.alltypes)", "Comparison between subqueries 
is not supported " +
> Sorry, I messed up that one critical sentence. Let me try again :)
Thanks, it seems to work! :)

I added new tests to 'PlannerTest/subquery-rewrite.test' for this case.



-- 
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 29
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 24 Apr 2018 13:13:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-24 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#28).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 1,060 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/28
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 28
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 23:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395
PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " +
> Thanks for digging in, Zoltan. I investigated the predicate assignment issu
Sorry, I messed up that one critical sentence. Let me try again :)

That way, we will *not* migrate or propagate predicates into the inline view 
later during planning.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 23
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 23 Apr 2018 22:11:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-23 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 23:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174
PS23, Line 1174: AnalyzesOk("select id from functional.alltypestiny t where 
int_col < " +
> I see, we'd need a SubPlan for the correct behavior, right?
Correct, we'd need a SubPlan to evaluate such plans. We currently don't support 
such subplans.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395
PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " +
> It sounds like trying to modify the predicate assignment code is a pretty o
Thanks for digging in, Zoltan. I investigated the predicate assignment issue 
and I think we can resolve it as follows:

In StmtRewriter.mergeExpr() we need to set a "limit 2" on the subquery 
statement if it is runtime scalar. That way, we will push migrate or propagate 
predicates into the inline view later at subquery time.

That change along will not work because the predicates will simply disappear 
from the plan due to a bug in CardinalityCheckNode. We need to be sure to not 
call assignConjuncts() in CardinalityCheckNode.init(). That way, we should get 
a SelectNode after the CardinalityCheckNode that contains our predicates at the 
correct place in the plan.

Let me know if you discover any issues!



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 23
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 23 Apr 2018 22:08:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-19 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 23:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395
PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " +
> Maybe we should think again the case we discussed in PS9 SingleNodePlanner.
It sounds like trying to modify the predicate assignment code is a pretty 
open-ended thing. We should probably just reject problematic queries for now 
and move towards getting the current code in.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 23
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Apr 2018 23:22:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#26).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 936 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/26
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 26
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-18 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 23:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@43
PS23, Line 43:   protected String origSqlString_ = null;
> Thinking about this again, you right that your previous code set the origSq
Done


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174
PS23, Line 1174: AnalyzesOk("select id from functional.alltypestiny t where 
int_col < " +
> My understanding is that we're not tackling correlated subqueries in this p
I see, we'd need a SubPlan for the correct behavior, right?

I extended StmtRewriter.validateCorrelatedSubquery() so this query is not 
allowed anymore.

I also added a test with the query from my previous comment.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395
PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " +
> Regardless of what optimizations we apply to make the query fast, we must r
Maybe we should think again the case we discussed in PS9 
SingleNodePlanner.java, because PostgreSQL also raises error for that query. 
That could be fixed by adding an extra check in InlineViewRef to break the 
value transfer graph. I don't have a strong opinion about it, I just felt I 
should mention it.

Fixing this case is not as easy, at least I coudn't figure out how to do it yet.
In SingleNodePlanner.createScanNode() we assign all the predicates to the scan 
node that are bound by the tuple id of the TableRef. I played with 
Analyzer.evalAfterJoin(Expr) and the boolean variable 'evalAfterJoin' in 
Analyzer.getBoundPredicates(), but couldn't prevent the predicate push down.

I also experimented with SingleNodePlanner.migrateConjunctsToInlineView(), but 
at that point I can't prevent predicate propagation as far as I can tell. There 
is also a comment about it in createInlineViewPlan().

The problematic predicate is created in StmtRewriter.createJoinConjunct(). 
Tried to save its tuple ids into Analyzer.globalState_, but later in the 
planner I see different ids, so I couldn't really use the stored ids.

Do you have any tips? I'd prefer to fix it here rather than opening a jira for 
it.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 23
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 18 Apr 2018 17:17:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 23:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@43
PS23, Line 43:   protected String origSqlString_ = null;
> OK, moved it to QueryStmt.
Thinking about this again, you right that your previous code set the 
origSqlString correctly for all statements. Still, I think it's clearer to have 
it only where it's actually needed. For most statements it's just equivalent to 
toSql() and the assignment in analysis context is confusing (and not tested).


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174
PS23, Line 1174: AnalyzesOk("select id from functional.alltypestiny t where 
int_col < " +
> No, we don't have such tests, and currently the following query is allowed:
My understanding is that we're not tackling correlated subqueries in this 
patch. The plan generated for that query is currently wrong, so we should fail 
it during analysis/rewriting.

Here's an example from Postgres that shows why the Impala plan generated by 
this patch is wrong:

Query succeeds:
create table a (c1 int, c2 int);
create table b (c1 int, c2 int);
insert into a values(1,1),(2,2);
insert intl b values(1,1),(2,2);
select * from a x where x.c1 = (select y.c1 from b y where x.c2 = y.c2);

Query fails at runtime:
create table a (c1 int, c2 int);
create table b (c1 int, c2 int);
insert into a values(1,1),(2,2);
insert intl b values(1,1),(2,2)(2,2);
select * from a x where x.c1 = (select y.c1 from b y where x.c2 = y.c2);
ERROR: more than one row returned by a subquery used as an expression


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395
PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " +
> Wasn't sure where to put IS NULL, so I put it two different places.
Regardless of what optimizations we apply to make the query fast, we must 
return correct results.

I think the IS NULL variant is should not succeed. I confirmed the expected 
behavior with Postgres. Both IS NULL and IS NOT NULL variants return an error 
"Subquery returns more than one row".

Let's either fix this issue, or forbid these cases and file a JIRA to 
separately fix them.

This makes me wonder whether queries like the following return correct results:
select straight_join count(1) from functional.alltypes a
where (select int_col from functional.alltypes b) > 10;

Basically, when the non-subquery child is a constant we push the predicate down 
which I think is wrong. Same deal. We either need to forbid these cases and fix 
in a separate JIRA or fix them now. Likewise, Postgres returns the same 
"Subquery returns more than one row" for that query.

I think the main problem is that we push the predicate "through a limit" which 
is generally not correct. We might be able to address that in the Planner when 
migrating conjuncts into inline views. Perhaps it's enough to not migrate 
conjuncts if the inline view's stmt is runtime scalar.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 23
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 13 Apr 2018 16:32:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-12 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#25).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 927 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/25
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 25
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-12 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#24).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 922 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/24
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 24
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 23:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@425
PS23, Line 425:   analysisResult_.stmt_.origSqlString_ = 
analysisResult_.stmt_.toSql();
> This is pretty weird. See my comment in StatementBase.
Removed it from here.
Now I'm setting origSqlString_ in SelectStmt.analyze() and UnionStmt.analyze().


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@43
PS23, Line 43:   protected String origSqlString_ = null;
> Why is this here and not in QueryStmt? I think it's cleaner to have it in Q
OK, moved it to QueryStmt.

So when I set it in AnalysisContext.analyze() after the first analysis phase 
(before any rewrites), wasn't it correctly set for all the statements?


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025
PS21, Line 1025: if (isCorrelated && 
!item.getExpr().contains(Expr.IS_BUILTIN_AGG_FN)) {
> I looked into this and concluded that we should convert the type check to a
Turned it into a precondition.

To fix the error message I changed the condition in L1032 and introduced a new 
predicate called IS_UDA_FN.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@747
PS23, Line 747: cardinalityCheckNode.computeStats(ctx_.getRootAnalyzer());
> Why do we need to computeStats() again?
I introduced it in PS2 as some kind of workaround, but it seems we no longer 
need it.
Removed this line.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1163
PS23, Line 1163: // Scalar subquery check is done at runtime, not at 
parse-time
> not at parse-time -> not during analysis
Done


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1164
PS23, Line 1164: AnalyzesOk("select id from functional.alltypestiny t where 
int_col = " +
> Loop over cmpOperators and have a few different variants inside the loop
Created a loop.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174
PS23, Line 1174: AnalyzesOk("select id from functional.alltypestiny t where 
int_col < " +
> Do we have a negative test that shows disallowing correlated scalar subquer
No, we don't have such tests, and currently the following query is allowed:

select * from functional.alltypes t1 where id < (select id from 
functional.alltypes t2 where t1.int_col = t2.int_col);

However, it raises the cardinality error, but I don't think we handle 
correlated subqueries correctly all the time.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395
PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " +
> Also add a variant with (subquery) IS NULL. That variant is currently forbi
Wasn't sure where to put IS NULL, so I put it two different places.
In the meantime I had to modify IsNullPredicate.java, and noticed a weird 
behavior:

 select count(1)
 from functional.alltypes
 where (select int_col from functional.alltypes) is null

The above query executes succesfully even though the subquery clearly returns 
multiple rows.

But when I use IS NOT NULL:

 select count(1)
 from functional.alltypes
 where (select int_col from functional.alltypes) is not null

Then the query fails with the expected error "Subquery must not return more 
than one row".

And when I refer to a column that is completely NULL, the results are the exact 
opposite, ie. IS NULL yields error and IS NOT NULL executes successfully.

I guess the explanation is that the IS NULL and IS 

[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025
PS21, Line 1025: if (!subquery.getType().isScalarType()) {
> Right. That could be an indication that this check is useless or that we ha
I looked into this and concluded that we should convert the type check to a 
Precondition.

However, I also found some weirdness in our error checking, maybe you can fix 
that also. The error message for the following query doesn't quite make sense:

select * from functional.alltypes t1 where id < (select id from 
functional.alltypes t2 where t1.int_col = t2.int_col)

UDAs are not supported in the select list of correlated subqueries: (SELECT id 
FROM functional.alltypes t2 WHERE t1.int_col = t2.int_col)



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 21
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 11 Apr 2018 22:15:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 23:

(3 comments)

Code looks pretty good now. I'll do another pass over everything focusing on 
the tests.

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025
PS21, Line 1025: if (isCorrelated && 
!item.getExpr().contains(Expr.IS_BUILTIN_AGG_FN)) {
> This allows runtime scalars here, because their type is scalar now and not
Right. That could be an indication that this check is useless or that we have a 
gap in test coverage. I'm worried about the latter. One of two outcomes is 
acceptable:
* add a test that hits this branch
* convince ourselves that the type is always scalar here, and then add a 
Preconditions check


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@82
PS23, Line 82:   if (node instanceof StatementBase) {
No need for this. We're always going to have a StatementBase


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@917
PS23, Line 917: //TODO: make it work again or delete
  : //testToSql("select * from functional.alltypes where not 
(id < 10 and " +
  : //"(int_col in (select int_col from 
functional.alltypestiny)) and " +
  : //"(string_col = (select max(string_col) from 
functional.alltypestiny)))",
  : //"SELECT * FROM functional.alltypes WHERE NOT (id < 10 
AND " +
  : //"(int_col IN (SELECT int_col FROM 
functional.alltypestiny)) AND " +
  : //"(string_col = (SELECT max(string_col) FROM 
functional.alltypestiny)))");
> So, is it OK if I just leave this TODO here?
Please file an improvement JIRA for this case, and remove the test.

The issue here is that we push the NOT into the condition, so the ANDs turn 
into ORs and then we no longer support the subquery.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 23
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 11 Apr 2018 21:44:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-11 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 23:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/Expr.java@1193
PS21, Line 1193: Subquery subq = (Subquery) this;
> I think we can remove this when taking care of the other comments.
Done


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1092
PS21, Line 1092: Preconditions.checkState(isAnalyzed());
> Preconditions.checkState(isAnalyzed());
Done


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@427
PS21, Line 427: ExprSubstitutionMap smap = new ExprSubstitutionMap();
> I don't think we should have to do this. At the very least we should have a
Removed these lines and updated ExistsPredicate as you recommended.


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@464
PS21, Line 464: // to the stmt's table refs later. Explicitly set the 
inline view's column labels
> Let's have:
Done


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025
PS21, Line 1025: if (isCorrelated && 
!item.getExpr().contains(Expr.IS_BUILTIN_AGG_FN)) {
> This has a slightly different meaning than the original code.
This allows runtime scalars here, because their type is scalar now and not 
array type.

Maybe it is worth to check that the subquery doesn't return multiple columns.

Anyway, I removed this if statement and it seems that the FE tests don't miss 
it.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@917
PS23, Line 917: //TODO: make it work again or delete
  : //testToSql("select * from functional.alltypes where not 
(id < 10 and " +
  : //"(int_col in (select int_col from 
functional.alltypestiny)) and " +
  : //"(string_col = (select max(string_col) from 
functional.alltypestiny)))",
  : //"SELECT * FROM functional.alltypes WHERE NOT (id < 10 
AND " +
  : //"(int_col IN (SELECT int_col FROM 
functional.alltypestiny)) AND " +
  : //"(string_col = (SELECT max(string_col) FROM 
functional.alltypestiny)))");
So, is it OK if I just leave this TODO here?
Or, should I remove this part?



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 23
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 11 Apr 2018 15:32:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-11 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#22).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 911 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/22
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 22
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-10 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 21:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:   if (!operand.type_.isNumericType() && 
!operand.type_.isNull()) {
> Yeah, now I vaguely remember from those formal languages courses that the t
That's true. I see the isRuntimeScalar flag more like a hint on how we should 
evaluate this expression, rather than part of type analysis.


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/Expr.java@1193
PS21, Line 1193: // Callers should be sure to analyze the limit clause of 
stmt
I think we can remove this when taking care of the other comments.


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1092
PS21, Line 1092: // limit 1 clause
Preconditions.checkState(isAnalyzed());


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@427
PS21, Line 427: newSubquery.getStatement().setIsRuntimeScalar(
I don't think we should have to do this. At the very least we should have a 
comment explaining why this is necessary.

I looked into this issue and concluded that you probably added this code to 
work around a bug with expr substitution in ExistsPredicate.

The bug is that in ExistsPredicate we keep a member "subquery_", but that 
member is not properly updated during expr substitution. The easiest fix is to 
remove the "subquery_" member in ExistsPredicate and consistently use 
"children_.get(0)" wherever we used "subquery_" before.


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@464
PS21, Line 464: SelectStmt subqueryStmt = 
(SelectStmt)expr.getSubquery().getStatement();
Let's have:

boolean isScalarSubquery = expr.getSubquery().isScalarSubquery();

and use that variable consistently in this function.

It's weird to call isScalarSubquery() once we start modifying the subquery 
statement. It's easy to avoid that.


Might also be convenient/consistent to have a isRuntimeScalar boolean variable 
here.


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025
PS21, Line 1025: if (!subquery.getType().isScalarType()) {
This has a slightly different meaning than the original code.

I could not find a test that covers this branch. Does this check even make 
sense? Why would we not be able to allow runtime scalar here?



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 21
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 10 Apr 2018 22:22:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-05 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 21:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:   if (!operand.type_.isNumericType() && 
!operand.type_.isNull()) {
> Sorry confusing sentence. I mean:
Yeah, now I vaguely remember from those formal languages courses that the type 
deduction should only go bottom-up. :)
(However, we still set isRuntimeScalar based on the context)

There are some places where I have to use the old semantics of 
isScalarSubquery(), ie. to only test that the type of a subquery is scalar and 
not test whether it returns multiple rows.


http://gerrit.cloudera.org:8080/#/c/9005/19/fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
File fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java:

http://gerrit.cloudera.org:8080/#/c/9005/19/fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java@66
PS19, Line 66:   @Override
> We really should't be setting the type of an expression from the outside. T
Removed this line.


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@a614
PS21, Line 614:
This wasn't needed for this JIRA, but looked weird, so I simplified it.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 21
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 05 Apr 2018 12:45:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-04-05 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#21).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 914 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/21
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 21
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-29 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:   StmtRewriter.rewriteNonScalarSubqueries(operand, 
analyzer);
> I don't think the type of an expression should not depend on its enclosing
Sorry confusing sentence. I mean:

The type of an expression should *not* depend on its enclosing context.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 29 Mar 2018 22:03:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-29 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 19:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:   if (!operand.type_.isNumericType() && 
!operand.type_.isNull()) {
> Thanks!
I don't think the type of an expression should not depend on its enclosing 
context. That breaks a fundamental invariant and may cause all sorts of 
problems.

If the alternative is to always analyze the limit clause in 
Expr.isScalarSubquery(), then I strongly prefer that variant, even though it's 
not perfectly clean either.


http://gerrit.cloudera.org:8080/#/c/9005/19/fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
File fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java:

http://gerrit.cloudera.org:8080/#/c/9005/19/fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java@66
PS19, Line 66: if(!queryStmt.returnsSingleRow()) subquery_.type_ = new 
ArrayType(subquery_.type_);
We really should't be setting the type of an expression from the outside. 
That's the job of expression analysis itself.

One alternative could be to keep a the ArrayType or a flag "isScalarSubquery" 
internally in Subquery and then use that flag or type to properly respond to 
Expr.isScalarSubquery().



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 19
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 29 Mar 2018 21:44:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-29 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#19).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
27 files changed, 910 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/19
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 19
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-26 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#17).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
27 files changed, 912 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/17
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 17
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-26 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:   StmtRewriter.rewriteNonScalarSubqueries(operand, 
analyzer);
> Thanks for making these changes. I did not look at all failing tests, but I
Thanks!

At some point I was suspecting that some code parts might rely on the exact 
type of a subquery. So I made the type of the subquery ArrayType in InPredicate 
and ExistsPredicate. But, it didn't seem to solve the problem, so I removed the 
ArrayTypes.

Now that you mentioned it, I figured out that I forgot to "transfer" the exact 
type of the subquery to the new subquery in StmtRewriter.rewriteExpr().

I also tried your version of Expr.isScalarSubquery(), but I got a bunch of 
IllegalStateExceptions because the limit element wasn't analyzed (you already 
wrote a comment about it, but still I couldn't always make it analyzed, and my 
other approach already worked).



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 27 Mar 2018 00:02:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-22 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:   StmtRewriter.rewriteNonScalarSubqueries(operand, 
analyzer);
> I like this suggestion, it does make things much easier and simple.
Thanks for making these changes. I did not look at all failing tests, but I did 
into the following one:

// Correlated predicates in the subquery's ON clause. Vary join column type.
AnalyzesOk("select * from functional.alltypes t where t.id IN " +
"(select a.id from functional.alltypesagg a left outer join " +
"functional.alltypessmall s on s.int_col = t.int_col)");

The problem with this one is that the StmtRewriter relies on 
Subquery.isScalarSubquery(). The existing isScalarSubquery() relies on the type 
of the Subquery to determine whether a Subquery can be proven to be scalar at 
compile time. Since we have changed how the type is determined, we broke the 
isScalarSubquery() function. I hacked up the following to fix it:

  public boolean isScalarSubquery() {
Preconditions.checkState(isAnalyzed_);
if (!(this instanceof Subquery)) return false;
Subquery subq = (Subquery) this;
// Callers should be sure to analyze the limit clause of stmt
SelectStmt stmt = (SelectStmt) subq.getStatement();
return stmt.returnsSingleRow() && getType().isScalarType();
  }

Of course, there are other ways to fix this, just wanted to point you in the 
right direction. Let me know if this needs further discussion.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 22 Mar 2018 23:08:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-21 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#16).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
26 files changed, 897 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/16
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 16
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 15:

(7 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/9005/13/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/13/be/src/exec/cardinality-check-node.cc@90
PS13, Line 90:
> We should technically also set num_rows_returned_ (it's used in a couple of
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:   if (!operand.type_.isNumericType() && 
!operand.type_.isNull()) {
> Good question, thanks for the detailed explanation. The following alternati
I like this suggestion, it does make things much easier and simple.

However, I'm getting a few strange errors that I couldn't really fix.

Some say "select list expression not produced by aggregation output", others 
complain about the memory estimation, but I don't really see the connection to 
my changes.

I've ran the frontend tests and the backend tests that I touched.
Frontend tests produce 8 failures and 3 errors,
tests/query_test/test_nested_types.py::TestNestedTypes::test_subplan also fails.


http://gerrit.cloudera.org:8080/#/c/9005/13/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java:

http://gerrit.cloudera.org:8080/#/c/9005/13/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@32
PS13, Line 32: m
> extra space
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1116
PS9, Line 1116: // place.
> Looking at the example query again, I think it's ok that it succeeds. We're
I removed the quick fix from InlineViewRef.analyze()


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@1434
PS11, Line 1434:
> Thanks! Works for me.
Done


http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test:

http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test@709
PS13, Line 709: Subquery must not return more than one row: SELECT o_orderkey 
FROM c.c_orders WHERE o_orderkey = 600 OR o_orderkey = 4285920
> Is it possible to include the query test in the error message so we can be
Done


http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/subquery.test@841
PS13, Line 841: functional
> Let's drop the functional. prefix, it's not required - the idea is that we
I get "Could not resolve table reference: ..." errors if I do that.

The other queries in this file are also prefixed with the database.

Should I migrate the queries to a different file where the test matrix is used?



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 21 Mar 2018 23:08:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-21 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#15).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
24 files changed, 726 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/15
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-21 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#14).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
24 files changed, 727 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/14
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 13:

(4 comments)

Backend changes look good to me still.

http://gerrit.cloudera.org:8080/#/c/9005/13/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/13/be/src/exec/cardinality-check-node.cc@90
PS13, Line 90: COUNTER_SET(rows_returned_counter_, 1);
We should technically also set num_rows_returned_ (it's used in a couple of 
places, like be/src/exec/partitioned-aggregation-node.cc).


http://gerrit.cloudera.org:8080/#/c/9005/13/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java:

http://gerrit.cloudera.org:8080/#/c/9005/13/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@32
PS13, Line 32:
extra space


http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test:

http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test@709
PS13, Line 709: Subquery must not return more than one row:
Is it possible to include the query test in the error message so we can be sure 
the error message is correct.


http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/subquery.test@841
PS13, Line 841: functional
Let's drop the functional. prefix, it's not required - the idea is that we can 
run against different file formats in the test matrix automatically.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 13 Mar 2018 22:19:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-09 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 13:

PS13 was only a rebase.


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 09 Mar 2018 17:38:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-03-09 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#13).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 958 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/13
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-02-22 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1116
PS9, Line 1116: if (inlineViewRef.getViewStmt().isRuntimeScalar()) {
> The optimization is definitely intentional and it's not clear to me why thi
Looking at the example query again, I think it's ok that it succeeds. We're 
just deducing information from the query.

I'm in favor of removing your quick fix to InlineViewRef.analyze() and accept 
that we can infer that the query is semantically correct as a whole.

We should definitely add this case query as a test though and explain the 
behavior.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 22 Feb 2018 08:05:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-02-14 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#12).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
28 files changed, 958 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/12
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-02-14 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 11:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@27
PS11, Line 27: /// Node that passes along the row pulled from its child 
unchanged.
> First sentence should crisply state purpose of this node, e.g.:
Thanks, I am using your sentences :)


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@29
PS11, Line 29: /// Note that this node must be a blocking node!
> Please give an explanation why, e.g.:
Thanks again, I don't think I could formulate it so nicely! :)


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@29
PS9, Line 29: /// Note that this node must be a blocking node!
> Thanks.
To my defense in PS1 it was called ScalarCheckNode and it only allowed one row 
:)
Later when I renamed it I felt that some extra flexibility suits the new name.
But I see that it is unlikely that we'll ever need this flexibility.


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38
PS9, Line 38:   virtual Status Reset(RuntimeState* state) override;
> Nice experiments and tests! Seems really hard to hit this case with a real
Thanks! Done.


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@68
PS11, Line 68:   DCHECK_LE(rows_collected, 1);
> DCHECK that rows_collected is either 0 or 1. The current check accepts nega
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@85
PS11, Line 85: row_batch_->DeepCopyTo(output_row_batch);
> Why do we deep copy the row twice? Once in Open() should be sufficient.
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:   StmtRewriter.rewriteNonScalarSubqueries(operand, 
analyzer);
> We should still keep the analysis and rewrite phases distinctly separate as
I'm a bit confused about how to organise this.
In AnalysisContext.analyze() I see the following phases:
* analyze
* rewrite exprs
* rewrite stmts
* re-analyze if needed

If I don't do anything here, this code will throw an exception that needs to be 
handled somewhere, but the real problem is that the first analyze phase will 
remain unfinished. Then, we would start applying rewrites on the partially 
analyzed tree.

So, some modifications are needed to analyzeImpl() I think. I tried to simply 
return from it when we encounter a non-scalar subquery. Then, do the rewrite in 
StmtRewriter, but analyze() is also called during statement rewriting. This is 
problematic because the AnalysisExceptions are wrapped in an 
IllegalStateException and that causes some tests to fail. This happens when the 
subquery is still invalid after making it scalar, e.g.:

 select id
 from functional.alltypestiny
 where int_col = (select max(timestamp_col) from
  functional.alltypessmall)

 ERROR: AnalysisException: null
 CAUSED BY: IllegalStateException: Failed analysis after expr substitution.
 CAUSED BY: AnalysisException: operands of type INT and TIMESTAMP are not 
comparable: int_col = (SELECT max(timestamp_col) FROM functional.alltypessmall)


My subquery "rewrite" only changes the type of the subquery. Aren't these kind 
of type deductions belong to the analysis phase? Maybe I should move 
rewriteNonScalarSubqueries() to Expr and rename it to something like 
makeContainedSubquerySalar() ?

Am I missing something? Do you have something in mind how should I deal with it?


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@177
PS11, Line 177:   for (Expr expr: children_) {
> Move to the rewrite phase.
see comment in ArithmeticExpr.java


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-02-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 11:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@27
PS11, Line 27: /// Node that passes along the row pulled from its child 
unchanged.
First sentence should crisply state purpose of this node, e.g.:

Node that returns an error if its child produces more than a single row. If 
successful, this node returns a deep copy of its single input row.


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@29
PS11, Line 29: /// Note that this node must be a blocking node!
Please give an explanation why, e.g.:

Note that this node must be a blocking node. It would be incorrect to return 
rows before the single row constraint has been validated because downstream 
exec nodes might produce results and incorrectly return them to the client. If 
the child of this node produces more than one row it means the SQL query is 
semantically invalid, so no rows must be returned to the client.


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@29
PS9, Line 29: /// Note that this node must be a blocking node!
> No, I just wanted make it more generic. Done.
Thanks.

For reference, I think the "Speculative Generality" code smell applies here.

https://cwiki.apache.org/confluence/display/IMPALA/Effective+Coding+Practices


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38
PS9, Line 38:   virtual Status Reset(RuntimeState* state) override;
> Thanks for the tips. However, I still couldn't succeed. I used the followin
Nice experiments and tests! Seems really hard to hit this case with a real 
non-subquery plan. I think documenting the blocking nature of the node in code 
comments should be enough.


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@68
PS11, Line 68:   DCHECK_LE(rows_collected, 1);
DCHECK that rows_collected is either 0 or 1. The current check accepts negative 
values.


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@85
PS11, Line 85: row_batch_->DeepCopyTo(output_row_batch);
Why do we deep copy the row twice? Once in Open() should be sufficient.

Copying twice could be expensive in subplans.


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:   StmtRewriter.rewriteNonScalarSubqueries(operand, 
analyzer);
We should still keep the analysis and rewrite phases distinctly separate as 
before. Calling ExprRewriter within analysis breaks the separation and is very 
confusing.

The StmtRewriter should treat all Exprs in the same way, so no special code 
should be required for individual Expr subclasses.

The rewrite should be performed within the rewrite phase when 
StmtRewriter.rewrite() is called.

See AnalaysisContext.analyze() for an overview of the phases analysis and 
rewrite phases.


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@177
PS11, Line 177:   for (Expr expr: children_) {
Move to the rewrite phase.


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@529
PS11, Line 529:   for (Expr child : getChildren()) {
Move to rewrite phase.


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@94
PS11, Line 94:   // This member contains the original statement provided by the 
user.
Not accurate. This contains the post-analysis toSql() before rewrites. It may 
still be different from the SQL originally provided by the user (table 

[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-31 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 9:

(22 comments)

Thanks for the comments! I could resolve most of them, but still have some 
questions.

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@25
PS9, Line 25: namespace impala {
> Side note: I could hot a DCHECK with this query:
This query raises the cardinality error.
I also added this query to QueryTest/subquery.test


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@29
PS9, Line 29: /// Currently it is only used to check <= 1 cardinality of the 
child.
> I cannot think of a scenario where we would need to check for a cardinality
No, I just wanted make it more generic. Done.


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38
PS9, Line 38:   virtual Status Open(RuntimeState* state) override;
> Thanks for your attempt and details. The following ideas might help:
Thanks for the tips. However, I still couldn't succeed. I used the following 
query:
 select 1 from (values(1 a)) v
 where a = (select id from functional.alltypestiny)
 limit 1
Plan is:
 PLAN-ROOT SINK
 03:HASH JOIN [RIGHT SEMI JOIN]
 |--00:UNION
 | constant-operands=1
 02:CARDINALITY CHECK
 01:SCAN HDFS [functional.alltypestiny]

But, PartitionedHashJoinNode calls GetNext() twice on its child. First, it 
calls in Open() -> BlockingJoinNode::GetFirstProbeRow(state) -> 
CardinalityCheckNode::GetNext().
Then, in PartitionedHashJoinNode::GetNext() it starts to process the rows in 
the current rowbatch (fetched by Open()). After that it calls 
NextProbeRowBatch(), which calls GetNext() on CardinalityCheckNode, and this 
time it raises the error.

So as far as I can tell the followings have to go wrong to cause the bug:
- change the behavior of PartitionedHashJoinNode
- make CardinalityCheckNode a streaming node despite of the comments that warn 
you not to do that
- batch size happens to be zero

Anyway, I added an e2e to hs2/test_fetch.py, although I couldn't make it fail.


However, if CardinalityCheckNode is in a subplan, the following query succeeds 
(if batch_size=1, with default batch size it is flaky):
 select c_custkey
 from customer c
 where c_custkey < (select o_orderkey
from c.c_orders
where o_orderkey in (4285920, 600, 500)))
 limit 1;

o_orderkey 4285920 and 600 belong to the same customer, while o_orderkey 
500 belongs to a different customer.
Without LIMIT 1 it fails. But, since the subquery refers to an identifier from 
the outer query I guess it counts as a correlated subquery, while this JIRA is 
only about uncorrelated subqueries. Anyway, it might be a good test for the 
uncorrelated subquery JIRA.


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@58
PS9, Line 58:   // The associated SQL statement
> for error reporting
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.cc@37
PS9, Line 37:   DCHECK_EQ(limit_, max_cardinality_);
> Add comment explaining why we are not calling ExecNode::Prepare()
we are calling it in L40.


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@189
PS9, Line 189: subquery.getStatement().setIsRuntimeScalar(true);
> I don't think this approach is general enough to handle cases like the foll
Extended StmtRewriter with a new static method called 
rewriteNonScalarSubqueries(). It takes an expr, and tries to rewrite the 
contained subqueries' type to scalar.

Now these queries fail at runtime. They succeed if we add a WHERE clause with 
'id = 1'.

I added tests with these kind of queries.

Do you have any other corner case in mind?


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@94
PS9, Line 94:   public boolean isRuntimeScalar() { return isRuntimeScalar_; }
> please move these getters and setters to L365 to not clutter the members li
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@99
PS9, Line 99:   protected String origStmtStr_ = null;
> * 

[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-31 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#10).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
M tests/hs2/test_fetch.py
23 files changed, 1,076 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/10
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-29 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@25
PS9, Line 25: namespace impala {
Side note: I could hot a DCHECK with this query:

select a from (values(1 a),(2),(3)) v where a = (select x from (values(1 
x),(2),(3)) v);


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38
PS9, Line 38:   virtual Status Open(RuntimeState* state) override;
> I tried to craft a test scenario, but couldn't really succeed with it.
Thanks for your attempt and details. The following ideas might help:

1. Use num_nodes=1 to keep things simple and avoid exchanges
2. Use batch_size=1 to return batches with one row at a time
3. Use limit 1 at the top of the query so it succeeds once the first row is seen
4. You need to get the subquery on the probe side of the join. I got the join 
to be inverted with the following query:

select 1 from (values(1 a)) v where a = (select bigint_col from 
functional.alltypes limit 1);



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2018 21:17:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 9:

Saw this, will take a look.


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 19:02:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 9: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 17:40:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#9).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
17 files changed, 787 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/9
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 8:

Hi Alex, Dimitris,

Could you review this change, especially the FE part?


--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 14:16:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex 
Behm, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#8).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
17 files changed, 787 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/8
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@89
PS6, Line 89: CopyRows
> Yeah, I think we can just assume that the frontend will only generate plans
I added comments about it to this class, and to the corresponding plan node 
class as well.


http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc@116
PS7, Line 116: Status CardinalityCheckNode::Reset(RuntimeState* state) {
> I don't think we have test coverage for this code yet, since it's only exec
I added some tests to nested-types-subplan.test.
They use tpch_nested_parquet.customer because functional.allcomplextypes is 
empty.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 17 Jan 2018 14:13:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-16 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 7:

(2 comments)

The backend code is looking good to me, comments are mainly about test 
coverage. I'm not confident about the frontend code, I'd like input from Alex 
or Dimitris about whether the approach is right.

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@89
PS6, Line 89: CopyRows
> No, in this case RowBatch::DeepCopyTo fails.
Yeah, I think we can just assume that the frontend will only generate plans 
with max_cardinality <= 1 for now and deal with additional complications later 
if we need the additional generality. We can't test the other cases now so 
seems best to be clear that it doesn't work.


http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc@116
PS7, Line 116: Status CardinalityCheckNode::Reset(RuntimeState* state) {
I don't think we have test coverage for this code yet, since it's only executed 
if the node is inside a subplan. We do support this, e.g. the following query 
puts a cardinality check inside a subplan:

  select a.id from functional.allcomplextypes a
where id < (select item from a.int_array_col);

It looks like these planner tests and query tests have a number of subqueries 
that get put inside subplans:

  
./testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
  
./testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test

Your code does look like it handles subplans correctly, just want to make sure 
we have the test coverage.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 16 Jan 2018 21:54:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-16 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Tim Armstrong, Csaba Ringhofer, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#7).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
16 files changed, 739 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/7
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-16 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h@44
PS6, Line 44: single row from child
> generalize the "single"
Done


http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h@45
PS6, Line 45: scalar
> perhaps "child" is a better name?
renamed it to simply 'row_batch_' because during copying the name 
'child_row_batch_' was confusing.


http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@51
PS6, Line 51: more than one row.
> should not be specific to "one"
woops, thanks


http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@89
PS6, Line 89: *eos = true;
> if max_cardinality_ > number of tuples in a batch, does this work?
No, in this case RowBatch::DeepCopyTo fails.

This problem is also present in Open, however in Open we create the RowBatch 
and we can set the capacity to max_cardinality. Maybe we can assume that 
max_cardinality is not too huge. If we cannot assume that, then I think we 
cannot implement it as a blocking node.

In GetNext, we get an already created output_row_batch and we can't change its 
capacity.
So I changed the behavior to something similar to SelectNode::GetNext().



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 16 Jan 2018 18:09:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-16 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h@44
PS6, Line 44: single row from child
generalize the "single"


http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h@45
PS6, Line 45: scalar
perhaps "child" is a better name?


http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@51
PS6, Line 51: more than one row.
should not be specific to "one"


http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@89
PS6, Line 89: *eos = true;
if max_cardinality_ > number of tuples in a batch, does this work?



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 16 Jan 2018 16:58:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-16 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Tim Armstrong, Csaba Ringhofer, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#6).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
16 files changed, 710 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/6
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-16 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Tim Armstrong, Csaba Ringhofer, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#5).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
16 files changed, 709 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/5
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-16 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.h@27
PS4, Line 27: Node that passes along the row pulled from its child unchanged.
: /// If there is more than one row, it aborts the query.
> sync this with the changes:
updated the comment


http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@61
PS4, Line 61:  > 1
> shouldn't this be max_cardinality_?
oops, yes it should, thanks


http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@65
PS4, Line 65: h.num_rows() == 1
> this is another check that seems to have been carried from the stricter <=
It is a bug, carried from the earlier implementation.
I changed this check, now this BE exec node is aligned with the corresponding 
FE plan node.


http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@69
PS4, Line 69: ted, 1);
> same here
Done


http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@85
PS4, Line 85: == 1
> and here
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 16 Jan 2018 13:35:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-15 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.h@27
PS4, Line 27: Node that passes along the row pulled from its child unchanged.
: /// If there is more than one row, it aborts the query.
sync this with the changes:
- <= max_cardinality_ is now permitted (not necessarily 1)
- more than one row may be returned, so generalize to ".. passes along rows ..."


http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@61
PS4, Line 61:  > 1
shouldn't this be max_cardinality_?


http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@65
PS4, Line 65: h.num_rows() == 1
this is another check that seems to have been carried from the stricter <= 1 
implementation. if there are no tests for > 1 limit, then perhaps this should 
be paired back down to <= 1 ? Speaking of which, if its not possible to create 
a plan for the node from the FE, is the general approach to pair back the 
functionality? otherwise, I'd expect tests independent of the FE.


http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@69
PS4, Line 69: ted, 1);
same here


http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@85
PS4, Line 85: == 1
and here



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 16 Jan 2018 00:37:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-15 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Tim Armstrong, Csaba Ringhofer, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#4).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called CardinalityCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
A be/src/exec/cardinality-check-node.cc
A be/src/exec/cardinality-check-node.h
M be/src/exec/exec-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
15 files changed, 689 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/4
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-15 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 3:

(10 comments)

Thanks all!

http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.h
File be/src/exec/scalar-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.h@29
PS3, Line 29: ScalarCheckNode
> small naming nit, but it seems that this class checks that the cardinality
I agree that it's a better name, so renamed it to CardinalityCheckNode.
Also, to make it more general, it can be parameterized by the max allowed 
cardinality.


http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc
File be/src/exec/scalar-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@33
PS3, Line 33: Status ScalarCheckNode::Prepare(RuntimeState* state) {
> Can you DCHECK that this doesn't have any conjuncts assigned to it. Those a
I added a DCHECK for conjuncts.
The limit of this node is set by the constructor of the plan node and it equals 
to the allowed cardinality. I added a DCHECK_EQ here to check if limit and 
max_cardinality is the same.


http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@56
PS3, Line 56: if (rows_collected > 1) {
> Does conditional fit on one line?
Now that we want to display the associated SQL statement as well, it doesn't.


http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@57
PS3, Line 57: scalar
> what guarantees a scalar here?
At runtime nothing, it is checked at parse-time.
However, as you suggested I renamed it to CardinalityCheckNode.


http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@57
PS3, Line 57: scalar
> It would be a lot more user friendly if we could report which scalar subque
The subquery gets rewritten at one point of the analysis.
E.g. it rewrites "id = 1 or id = 2" to "id IN (1, 2)"

Therefore I needed to hack a little to plumb the original SQL statement. I 
wonder if there's a better or usual way to do that.


http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@59
PS3, Line 59: if (child_batch.num_rows() == 1) {
> Does conditional fit on one line?
yes


http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@177
PS3, Line 177: if (operand instanceof Subquery && 
!operand.type_.isScalarType()) {
> Handling this case here doesn't feel quite right, since it's kind-of an err
I restructured the branches


http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@740
PS3, Line 740: // Scalar checking requires to merge the output of the child 
fragment
> Maybe this would be better explained as "The scalar check must execute on a
yeah, it does sound better.


http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java
File fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java:

http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java@26
PS3, Line 26: conjuncts and a limit clause.
> what conjuncts and what limit? perhaps phrase this to be more specific.
Oops, it was a leftover from copy pasting SelectNode.


http://gerrit.cloudera.org:8080/#/c/9005/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test@839
PS3, Line 839:  QUERY
> We should also add planner tests to test the explain plan output. E.g. see
Added some.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 15 Jan 2018 17:20:12 +
Gerrit-HasComments: 

[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc
File be/src/exec/scalar-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@33
PS3, Line 33: Status ScalarCheckNode::Prepare(RuntimeState* state) {
Can you DCHECK that this doesn't have any conjuncts assigned to it. Those 
aren't implemented by this node so we'd get incorrect results in that case. We 
should actually convince ourselves that it's not possible to assign conjuncts 
to this plan node (I honestly don't know if it is - might be a question for 
Alex).

I think the same thing applies for a limit of 0 - I'm not sure if we can get a 
limit applied.


http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@56
PS3, Line 56: if (rows_collected > 1) {
Does conditional fit on one line?


http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@57
PS3, Line 57: scalar
> what guarantees a scalar here?
It would be a lot more user friendly if we could report which scalar subquery 
failed. It might be feasible to plumb information (the original SQL text? a 
line number?) through from analysis so it can be reported here.

Actually, if we just plumbed the string through from analysis, we could make 
this a generic node like Vuk suggested that doesn't know anything about the SQl 
semantics it's implementing.


http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@59
PS3, Line 59: if (child_batch.num_rows() == 1) {
Does conditional fit on one line?


http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@177
PS3, Line 177: if (operand instanceof Subquery && 
!operand.type_.isScalarType()) {
Handling this case here doesn't feel quite right, since it's kind-of an 
error-handling path. It seems like this part should be a separate branch from 
the error handling that starts in l175.


http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@740
PS3, Line 740: // Scalar checking requires to merge the output of the child 
fragment
Maybe this would be better explained as "The scalar check must execute on a 
single node".


http://gerrit.cloudera.org:8080/#/c/9005/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test@839
PS3, Line 839:  QUERY
We should also add planner tests to test the explain plan output. E.g. see 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 12 Jan 2018 23:38:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-12 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 3:

(3 comments)

couple of small clarifying comments.

http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.h
File be/src/exec/scalar-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.h@29
PS3, Line 29: ScalarCheckNode
small naming nit, but it seems that this class checks that the cardinality of 
the child is <= 1. It does not guarantee that the child produces a scalar 
(could be a multi-column row). perhaps CardinalityCheckNode?
true that the frontend should guarantee arity here, but perhaps it should be 
checked/asserted or the name should match the more general functionality.


http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc
File be/src/exec/scalar-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@57
PS3, Line 57: scalar
what guarantees a scalar here?


http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java
File fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java:

http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java@26
PS3, Line 26: conjuncts and a limit clause.
what conjuncts and what limit? perhaps phrase this to be more specific.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 12 Jan 2018 18:49:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 1:

(15 comments)

Thanks for all the comments!

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h
File be/src/exec/scalar-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h@44
PS1, Line 44:   boost::scoped_ptr child_row_batch_;
> If I understand correctly, having child_row_batch_ as member is usually nee
You are right, but I still keep it because I've changed the implementation and 
now we need to have a member. I renamed it to scalar_row_batch_ btw.

ScalarCheckNode is now a blocking node and deep-copies its child's row in 
Open(), then passes it to its parent in GetNext().


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h@47
PS1, Line 47: child_row_idx_
> This member is not used anywhere. Probably it should be removed.
Done


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc
File be/src/exec/scalar-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@31
PS1, Line 31: NULL
> Prefer nullptr in new code.
Removed it from here since it is a boost::scoped_ptr with a default 
constructor. But maybe I can add "= nullptr" in the class definition to make 
the initialization explicit.


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@32
PS1, Line 32:   child_row_idx_(0),
> We've mostly been switching to initialising member variables to constants i
Removed those members.


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@56
PS1, Line 56:   if (child_row_batch_->num_rows() == 0) {
> I think this needs to be a while loop that resets the batch after each iter
I followed this logic and moved this part to Open() since now it is a blocking 
node and we want to call Close() on the child asap.


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@63
PS1, Line 63: one
> "zero or one" or "maximum one" would be more exact
Done


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@78
PS1, Line 78:   child_row_batch_->TransferResourceOwnership(row_batch);
> I think it's the start of a good idea. I originally suggested modelling thi
Thanks for the idea, I followed this logic. So I deep copy the child's row in 
Open(), or raise an error if there are more rows. Also, I close the child right 
after reading its row. Then, deep copy the single scalar row in GetNext() into 
the parent's row batch.


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@178
PS1, Line 178:   // Runtime check needed to check if subquery returns 
scalar
> This is not just a check, but can be a transformation, as array types are r
Replaced the comment to make it more informative.


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@179
PS1, Line 179:   // Runtime check needed to check if subquery returns 
scalar
> This is not just a check, but can be a transformation, as array types are r
Replaced the comment.


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@90
PS1, Line 90:   // If true, a ScalarCheckNode is needed to check this 
statement's result at runtime
> nit: missing .
Done


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@735
PS1, Line 735:* @throws ImpalaException
> We don't generally have @throws declarations in Java comments
Oops, eclipse added it automagically.


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@746
PS1, Line 746: selectNode
> scalarCheckNode?
Yes!


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java
File fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java:


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-12 Thread Zoltan Borok-Nagy (Code Review)
Hello Attila Jeges, Tim Armstrong, Csaba Ringhofer,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9005

to look at the new patch set (#2).

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called ScalarCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
A be/src/exec/scalar-check-node.cc
A be/src/exec/scalar-check-node.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
A fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
13 files changed, 448 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/2
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-11 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h
File be/src/exec/scalar-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h@47
PS1, Line 47: child_row_idx_
This member is not used anywhere. Probably it should be removed.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 17:40:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test@932
PS1, Line 932: 
The case when the subquery returns 0 row could be tested too.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 16:45:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-11 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@746
PS1, Line 746: selectNode
scalarCheckNode?



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 11 Jan 2018 16:44:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

2018-01-11 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9005


Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
..

IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries

If a scalar subquery is used with a binary predicate,
or, used in an arithmetic expression, it must return
only one row/column to be valid. If this cannot be
guaranteed at parse time through a single row aggregate
or limit clause, Impala fails the query like such.

E.g., currently the following query is not allowed:
SELECT bigint_col
FROM alltypesagg
WHERE id = (SELECT id FROM alltypesagg WHERE id = 1)

However, it would be allowed if the query contained
a LIMIT 1 clause, or instead of id it was max(id).

This commit makes the example valid by introducing a
runtime check to test if the subquery returns a single
row. If the subquery returns more than one row, it
aborts the query with an error.

I added a new node type, called ScalarCheckNode. It
is created during planning on top of the subquery when
needed, then during execution it checks if its child
only returns a single row.

I extended the frontend tests and e2e tests as well.

Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
A be/src/exec/scalar-check-node.cc
A be/src/exec/scalar-check-node.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
A fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
13 files changed, 387 insertions(+), 29 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/1
--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy