[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 32: Nice work, Zoltan! This was a pretty intense task. Thanks for pulling through. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 32 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 30 Apr 2018 16:02:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 32: Yay, it's finally in, it was a long journey! :) Thanks to everyone who participated in the review. Special thanks to Alex for the tremendous amount of insights about the frontend! -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 32 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 30 Apr 2018 09:13:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Reviewed-on: http://gerrit.cloudera.org:8080/9005 Reviewed-by: Alex BehmTested-by: Impala Public Jenkins --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 1,130 insertions(+), 79 deletions(-) Approvals: Alex Behm: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 32 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 31: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 31 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Apr 2018 20:06:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 31: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2376/ -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 31 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Apr 2018 15:56:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 29: (1 comment) Thanks! I rebased the patch set to contain IMPALA-6934. http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484 PS29, Line 484: if (isRuntimeScalar) { > Thanks! I think we should keep the tests added in this patch. OK, I kept the tests. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 29 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 27 Apr 2018 11:38:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#31). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 1,130 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/31 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 31 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 29: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484 PS29, Line 484: if (isRuntimeScalar) { > OK, I pushed the bugfix for review and added you as a reviewer: Thanks! I think we should keep the tests added in this patch. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 29 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 26 Apr 2018 18:07:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 29: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484 PS29, Line 484: if (isRuntimeScalar) { > It's not safe to rely on our tests to find all issues. We should convince o OK, I pushed the bugfix for review and added you as a reviewer: https://gerrit.cloudera.org/#/c/10218/ I expect that the bugfix will be merged soon, so this change can wait a little more :) I added some tests to IMPALA-6934, should I keep the current offset tests in this commit? They contain runtime scalar subqueries, but other than that they are quite similar to the tests in IMPALA-6934. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 29 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 26 Apr 2018 12:36:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 30: (1 comment) Change looks good to me. I'll +2 once we have decided how to proceed with my last comment :) http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484 PS29, Line 484: if (isRuntimeScalar) subqueryStmt.setLimit(2); > I think it's surprising that QueryStmt.setLimit() wipes the offset (at leas It's not safe to rely on our tests to find all issues. We should convince ourselves that the modified setLimit() has the desired semantics for that other caller in L566. I investigated this and convinced myself that the new setLimit() has the correct semantics. In fact, your change fixes an existing, previously unknown bug. Let's separate the bugfix from this huge change. I filed IMPALA-6934 and assigned it to you. Please let me know if you cannot work on it As for this change, you could either wait for the bugfix to be merged, or you could undo your offset fix and comment out the offset tests noting that they will be enabled when IMPALA-6934 is fixed. Either approach works for me. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 30 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 25 Apr 2018 21:51:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 30: (3 comments) http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484 PS29, Line 484: if (isRuntimeScalar) subqueryStmt.setLimit(2); > Sorry I pasted the wrong query here. This is the correct one: I think it's surprising that QueryStmt.setLimit() wipes the offset (at least it surprised me :) ). I modified it to keep the offset expr. QueryStmt.setLimit() is only used at two places (including this one) and based on the test results it seems I haven't introduced a regression by changing its behavior. Note that after calling setLimit() subqueryStmt will have an un-analyzed limit element. Maybe it's not a big issue, because we already had a setLimit() call in this method. Added new tests to 'PlannerTest/subquery-rewrite.test' and 'QueryTest/subquery.test'. http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@809 PS29, Line 809: } > Unsupported correlated subquery with runtime scalar check: Done http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java: http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@45 PS29, Line 45: Preconditions.checkState(child.getLimit() == 2); > Is this needed? Should we make this a Preconditions check? Based on the tests it seems we always invoke setLimit(2) in StmtRewriter.mergeExpr(), so I made it a precondition check as suggested. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 30 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 25 Apr 2018 13:32:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#30). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 1,134 insertions(+), 81 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/30 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 30 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 29: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484 PS29, Line 484: if (isRuntimeScalar) { > I don't think this is right if the subquery already had a order by + limit Sorry I pasted the wrong query here. This is the correct one: select count(*) from functional.alltypes where 7 = (select id from functional.alltypestiny order by id limit 8 offset 7); The subquery returns a single "7" so the query should succeed, but currently it fails. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 29 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 24 Apr 2018 22:30:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 29: (3 comments) Getting really close :) http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@484 PS29, Line 484: if (isRuntimeScalar) { I don't think this is right if the subquery already had a order by + limit + offset clause. For example, the following query should work but it currently fails because the plan is wrong (the offset is wiped): select count(*) from functional.alltypes where 7 = (select int_col from functional.alltypestiny order by id limit 2 offset 20); http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@809 PS29, Line 809: throw new AnalysisException("Subquery must return a single row: " + stmt.toSql()); Unsupported correlated subquery with runtime scalar check: http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java: http://gerrit.cloudera.org:8080/#/c/9005/29/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@45 PS29, Line 45: child.setLimit(2); Is this needed? Should we make this a Preconditions check? -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 29 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 24 Apr 2018 22:27:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 29: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395 PS23, Line 1395: "functional.alltypes)", "Comparison between subqueries is not supported " + > Sorry, I messed up that one critical sentence. Let me try again :) Thanks, it seems to work! :) I added new tests to 'PlannerTest/subquery-rewrite.test' for this case. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 29 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 24 Apr 2018 13:13:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#28). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 1,060 insertions(+), 80 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/28 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 28 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 23: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395 PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " + > Thanks for digging in, Zoltan. I investigated the predicate assignment issu Sorry, I messed up that one critical sentence. Let me try again :) That way, we will *not* migrate or propagate predicates into the inline view later during planning. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 23 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 23 Apr 2018 22:11:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 23: (2 comments) http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174 PS23, Line 1174: AnalyzesOk("select id from functional.alltypestiny t where int_col < " + > I see, we'd need a SubPlan for the correct behavior, right? Correct, we'd need a SubPlan to evaluate such plans. We currently don't support such subplans. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395 PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " + > It sounds like trying to modify the predicate assignment code is a pretty o Thanks for digging in, Zoltan. I investigated the predicate assignment issue and I think we can resolve it as follows: In StmtRewriter.mergeExpr() we need to set a "limit 2" on the subquery statement if it is runtime scalar. That way, we will push migrate or propagate predicates into the inline view later at subquery time. That change along will not work because the predicates will simply disappear from the plan due to a bug in CardinalityCheckNode. We need to be sure to not call assignConjuncts() in CardinalityCheckNode.init(). That way, we should get a SelectNode after the CardinalityCheckNode that contains our predicates at the correct place in the plan. Let me know if you discover any issues! -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 23 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 23 Apr 2018 22:08:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 23: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395 PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " + > Maybe we should think again the case we discussed in PS9 SingleNodePlanner. It sounds like trying to modify the predicate assignment code is a pretty open-ended thing. We should probably just reject problematic queries for now and move towards getting the current code in. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 23 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 19 Apr 2018 23:22:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#26). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 936 insertions(+), 80 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/26 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 26 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 23: (3 comments) http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@43 PS23, Line 43: protected String origSqlString_ = null; > Thinking about this again, you right that your previous code set the origSq Done http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174 PS23, Line 1174: AnalyzesOk("select id from functional.alltypestiny t where int_col < " + > My understanding is that we're not tackling correlated subqueries in this p I see, we'd need a SubPlan for the correct behavior, right? I extended StmtRewriter.validateCorrelatedSubquery() so this query is not allowed anymore. I also added a test with the query from my previous comment. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395 PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " + > Regardless of what optimizations we apply to make the query fast, we must r Maybe we should think again the case we discussed in PS9 SingleNodePlanner.java, because PostgreSQL also raises error for that query. That could be fixed by adding an extra check in InlineViewRef to break the value transfer graph. I don't have a strong opinion about it, I just felt I should mention it. Fixing this case is not as easy, at least I coudn't figure out how to do it yet. In SingleNodePlanner.createScanNode() we assign all the predicates to the scan node that are bound by the tuple id of the TableRef. I played with Analyzer.evalAfterJoin(Expr) and the boolean variable 'evalAfterJoin' in Analyzer.getBoundPredicates(), but couldn't prevent the predicate push down. I also experimented with SingleNodePlanner.migrateConjunctsToInlineView(), but at that point I can't prevent predicate propagation as far as I can tell. There is also a comment about it in createInlineViewPlan(). The problematic predicate is created in StmtRewriter.createJoinConjunct(). Tried to save its tuple ids into Analyzer.globalState_, but later in the planner I see different ids, so I couldn't really use the stored ids. Do you have any tips? I'd prefer to fix it here rather than opening a jira for it. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 23 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 18 Apr 2018 17:17:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 23: (3 comments) http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@43 PS23, Line 43: protected String origSqlString_ = null; > OK, moved it to QueryStmt. Thinking about this again, you right that your previous code set the origSqlString correctly for all statements. Still, I think it's clearer to have it only where it's actually needed. For most statements it's just equivalent to toSql() and the assignment in analysis context is confusing (and not tested). http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174 PS23, Line 1174: AnalyzesOk("select id from functional.alltypestiny t where int_col < " + > No, we don't have such tests, and currently the following query is allowed: My understanding is that we're not tackling correlated subqueries in this patch. The plan generated for that query is currently wrong, so we should fail it during analysis/rewriting. Here's an example from Postgres that shows why the Impala plan generated by this patch is wrong: Query succeeds: create table a (c1 int, c2 int); create table b (c1 int, c2 int); insert into a values(1,1),(2,2); insert intl b values(1,1),(2,2); select * from a x where x.c1 = (select y.c1 from b y where x.c2 = y.c2); Query fails at runtime: create table a (c1 int, c2 int); create table b (c1 int, c2 int); insert into a values(1,1),(2,2); insert intl b values(1,1),(2,2)(2,2); select * from a x where x.c1 = (select y.c1 from b y where x.c2 = y.c2); ERROR: more than one row returned by a subquery used as an expression http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395 PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " + > Wasn't sure where to put IS NULL, so I put it two different places. Regardless of what optimizations we apply to make the query fast, we must return correct results. I think the IS NULL variant is should not succeed. I confirmed the expected behavior with Postgres. Both IS NULL and IS NOT NULL variants return an error "Subquery returns more than one row". Let's either fix this issue, or forbid these cases and file a JIRA to separately fix them. This makes me wonder whether queries like the following return correct results: select straight_join count(1) from functional.alltypes a where (select int_col from functional.alltypes b) > 10; Basically, when the non-subquery child is a constant we push the predicate down which I think is wrong. Same deal. We either need to forbid these cases and fix in a separate JIRA or fix them now. Likewise, Postgres returns the same "Subquery returns more than one row" for that query. I think the main problem is that we push the predicate "through a limit" which is generally not correct. We might be able to address that in the Planner when migrating conjuncts into inline views. Perhaps it's enough to not migrate conjuncts if the inline view's stmt is runtime scalar. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 23 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 13 Apr 2018 16:32:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#25). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 927 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/25 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 25 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#24). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 922 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/24 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 24 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 23: (10 comments) http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@425 PS23, Line 425: analysisResult_.stmt_.origSqlString_ = analysisResult_.stmt_.toSql(); > This is pretty weird. See my comment in StatementBase. Removed it from here. Now I'm setting origSqlString_ in SelectStmt.analyze() and UnionStmt.analyze(). http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@43 PS23, Line 43: protected String origSqlString_ = null; > Why is this here and not in QueryStmt? I think it's cleaner to have it in Q OK, moved it to QueryStmt. So when I set it in AnalysisContext.analyze() after the first analysis phase (before any rewrites), wasn't it correctly set for all the statements? http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025 PS21, Line 1025: if (isCorrelated && !item.getExpr().contains(Expr.IS_BUILTIN_AGG_FN)) { > I looked into this and concluded that we should convert the type check to a Turned it into a precondition. To fix the error message I changed the condition in L1032 and introduced a new predicate called IS_UDA_FN. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@747 PS23, Line 747: cardinalityCheckNode.computeStats(ctx_.getRootAnalyzer()); > Why do we need to computeStats() again? I introduced it in PS2 as some kind of workaround, but it seems we no longer need it. Removed this line. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1163 PS23, Line 1163: // Scalar subquery check is done at runtime, not at parse-time > not at parse-time -> not during analysis Done http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1164 PS23, Line 1164: AnalyzesOk("select id from functional.alltypestiny t where int_col = " + > Loop over cmpOperators and have a few different variants inside the loop Created a loop. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174 PS23, Line 1174: AnalyzesOk("select id from functional.alltypestiny t where int_col < " + > Do we have a negative test that shows disallowing correlated scalar subquer No, we don't have such tests, and currently the following query is allowed: select * from functional.alltypes t1 where id < (select id from functional.alltypes t2 where t1.int_col = t2.int_col); However, it raises the cardinality error, but I don't think we handle correlated subqueries correctly all the time. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395 PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " + > Also add a variant with (subquery) IS NULL. That variant is currently forbi Wasn't sure where to put IS NULL, so I put it two different places. In the meantime I had to modify IsNullPredicate.java, and noticed a weird behavior: select count(1) from functional.alltypes where (select int_col from functional.alltypes) is null The above query executes succesfully even though the subquery clearly returns multiple rows. But when I use IS NOT NULL: select count(1) from functional.alltypes where (select int_col from functional.alltypes) is not null Then the query fails with the expected error "Subquery must not return more than one row". And when I refer to a column that is completely NULL, the results are the exact opposite, ie. IS NULL yields error and IS NOT NULL executes successfully. I guess the explanation is that the IS NULL and IS
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025 PS21, Line 1025: if (!subquery.getType().isScalarType()) { > Right. That could be an indication that this check is useless or that we ha I looked into this and concluded that we should convert the type check to a Precondition. However, I also found some weirdness in our error checking, maybe you can fix that also. The error message for the following query doesn't quite make sense: select * from functional.alltypes t1 where id < (select id from functional.alltypes t2 where t1.int_col = t2.int_col) UDAs are not supported in the select list of correlated subqueries: (SELECT id FROM functional.alltypes t2 WHERE t1.int_col = t2.int_col) -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 21 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 11 Apr 2018 22:15:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 23: (3 comments) Code looks pretty good now. I'll do another pass over everything focusing on the tests. http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025 PS21, Line 1025: if (isCorrelated && !item.getExpr().contains(Expr.IS_BUILTIN_AGG_FN)) { > This allows runtime scalars here, because their type is scalar now and not Right. That could be an indication that this check is useless or that we have a gap in test coverage. I'm worried about the latter. One of two outcomes is acceptable: * add a test that hits this branch * convince ourselves that the type is always scalar here, and then add a Preconditions check http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@82 PS23, Line 82: if (node instanceof StatementBase) { No need for this. We're always going to have a StatementBase http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@917 PS23, Line 917: //TODO: make it work again or delete : //testToSql("select * from functional.alltypes where not (id < 10 and " + : //"(int_col in (select int_col from functional.alltypestiny)) and " + : //"(string_col = (select max(string_col) from functional.alltypestiny)))", : //"SELECT * FROM functional.alltypes WHERE NOT (id < 10 AND " + : //"(int_col IN (SELECT int_col FROM functional.alltypestiny)) AND " + : //"(string_col = (SELECT max(string_col) FROM functional.alltypestiny)))"); > So, is it OK if I just leave this TODO here? Please file an improvement JIRA for this case, and remove the test. The issue here is that we push the NOT into the condition, so the ANDs turn into ORs and then we no longer support the subquery. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 23 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 11 Apr 2018 21:44:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 23: (6 comments) http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/Expr.java@1193 PS21, Line 1193: Subquery subq = (Subquery) this; > I think we can remove this when taking care of the other comments. Done http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1092 PS21, Line 1092: Preconditions.checkState(isAnalyzed()); > Preconditions.checkState(isAnalyzed()); Done http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@427 PS21, Line 427: ExprSubstitutionMap smap = new ExprSubstitutionMap(); > I don't think we should have to do this. At the very least we should have a Removed these lines and updated ExistsPredicate as you recommended. http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@464 PS21, Line 464: // to the stmt's table refs later. Explicitly set the inline view's column labels > Let's have: Done http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025 PS21, Line 1025: if (isCorrelated && !item.getExpr().contains(Expr.IS_BUILTIN_AGG_FN)) { > This has a slightly different meaning than the original code. This allows runtime scalars here, because their type is scalar now and not array type. Maybe it is worth to check that the subquery doesn't return multiple columns. Anyway, I removed this if statement and it seems that the FE tests don't miss it. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@917 PS23, Line 917: //TODO: make it work again or delete : //testToSql("select * from functional.alltypes where not (id < 10 and " + : //"(int_col in (select int_col from functional.alltypestiny)) and " + : //"(string_col = (select max(string_col) from functional.alltypestiny)))", : //"SELECT * FROM functional.alltypes WHERE NOT (id < 10 AND " + : //"(int_col IN (SELECT int_col FROM functional.alltypestiny)) AND " + : //"(string_col = (SELECT max(string_col) FROM functional.alltypestiny)))"); So, is it OK if I just leave this TODO here? Or, should I remove this part? -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 23 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 11 Apr 2018 15:32:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#22). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 911 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/22 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 22 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 21: (6 comments) http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175 PS11, Line 175: if (!operand.type_.isNumericType() && !operand.type_.isNull()) { > Yeah, now I vaguely remember from those formal languages courses that the t That's true. I see the isRuntimeScalar flag more like a hint on how we should evaluate this expression, rather than part of type analysis. http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/Expr.java@1193 PS21, Line 1193: // Callers should be sure to analyze the limit clause of stmt I think we can remove this when taking care of the other comments. http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1092 PS21, Line 1092: // limit 1 clause Preconditions.checkState(isAnalyzed()); http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@427 PS21, Line 427: newSubquery.getStatement().setIsRuntimeScalar( I don't think we should have to do this. At the very least we should have a comment explaining why this is necessary. I looked into this issue and concluded that you probably added this code to work around a bug with expr substitution in ExistsPredicate. The bug is that in ExistsPredicate we keep a member "subquery_", but that member is not properly updated during expr substitution. The easiest fix is to remove the "subquery_" member in ExistsPredicate and consistently use "children_.get(0)" wherever we used "subquery_" before. http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@464 PS21, Line 464: SelectStmt subqueryStmt = (SelectStmt)expr.getSubquery().getStatement(); Let's have: boolean isScalarSubquery = expr.getSubquery().isScalarSubquery(); and use that variable consistently in this function. It's weird to call isScalarSubquery() once we start modifying the subquery statement. It's easy to avoid that. Might also be convenient/consistent to have a isRuntimeScalar boolean variable here. http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025 PS21, Line 1025: if (!subquery.getType().isScalarType()) { This has a slightly different meaning than the original code. I could not find a test that covers this branch. Does this check even make sense? Why would we not be able to allow runtime scalar here? -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 21 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 10 Apr 2018 22:22:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 21: (3 comments) http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175 PS11, Line 175: if (!operand.type_.isNumericType() && !operand.type_.isNull()) { > Sorry confusing sentence. I mean: Yeah, now I vaguely remember from those formal languages courses that the type deduction should only go bottom-up. :) (However, we still set isRuntimeScalar based on the context) There are some places where I have to use the old semantics of isScalarSubquery(), ie. to only test that the type of a subquery is scalar and not test whether it returns multiple rows. http://gerrit.cloudera.org:8080/#/c/9005/19/fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java File fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java: http://gerrit.cloudera.org:8080/#/c/9005/19/fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java@66 PS19, Line 66: @Override > We really should't be setting the type of an expression from the outside. T Removed this line. http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@a614 PS21, Line 614: This wasn't needed for this JIRA, but looked weird, so I simplified it. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 21 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 05 Apr 2018 12:45:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#21). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 914 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/21 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 21 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175 PS11, Line 175: StmtRewriter.rewriteNonScalarSubqueries(operand, analyzer); > I don't think the type of an expression should not depend on its enclosing Sorry confusing sentence. I mean: The type of an expression should *not* depend on its enclosing context. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 29 Mar 2018 22:03:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 19: (2 comments) http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175 PS11, Line 175: if (!operand.type_.isNumericType() && !operand.type_.isNull()) { > Thanks! I don't think the type of an expression should not depend on its enclosing context. That breaks a fundamental invariant and may cause all sorts of problems. If the alternative is to always analyze the limit clause in Expr.isScalarSubquery(), then I strongly prefer that variant, even though it's not perfectly clean either. http://gerrit.cloudera.org:8080/#/c/9005/19/fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java File fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java: http://gerrit.cloudera.org:8080/#/c/9005/19/fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java@66 PS19, Line 66: if(!queryStmt.returnsSingleRow()) subquery_.type_ = new ArrayType(subquery_.type_); We really should't be setting the type of an expression from the outside. That's the job of expression analysis itself. One alternative could be to keep a the ArrayType or a flag "isScalarSubquery" internally in Subquery and then use that flag or type to properly respond to Expr.isScalarSubquery(). -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 19 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 29 Mar 2018 21:44:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#19). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 27 files changed, 910 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/19 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 19 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#17). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 27 files changed, 912 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/17 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 17 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175 PS11, Line 175: StmtRewriter.rewriteNonScalarSubqueries(operand, analyzer); > Thanks for making these changes. I did not look at all failing tests, but I Thanks! At some point I was suspecting that some code parts might rely on the exact type of a subquery. So I made the type of the subquery ArrayType in InPredicate and ExistsPredicate. But, it didn't seem to solve the problem, so I removed the ArrayTypes. Now that you mentioned it, I figured out that I forgot to "transfer" the exact type of the subquery to the new subquery in StmtRewriter.rewriteExpr(). I also tried your version of Expr.isScalarSubquery(), but I got a bunch of IllegalStateExceptions because the limit element wasn't analyzed (you already wrote a comment about it, but still I couldn't always make it analyzed, and my other approach already worked). -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 27 Mar 2018 00:02:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175 PS11, Line 175: StmtRewriter.rewriteNonScalarSubqueries(operand, analyzer); > I like this suggestion, it does make things much easier and simple. Thanks for making these changes. I did not look at all failing tests, but I did into the following one: // Correlated predicates in the subquery's ON clause. Vary join column type. AnalyzesOk("select * from functional.alltypes t where t.id IN " + "(select a.id from functional.alltypesagg a left outer join " + "functional.alltypessmall s on s.int_col = t.int_col)"); The problem with this one is that the StmtRewriter relies on Subquery.isScalarSubquery(). The existing isScalarSubquery() relies on the type of the Subquery to determine whether a Subquery can be proven to be scalar at compile time. Since we have changed how the type is determined, we broke the isScalarSubquery() function. I hacked up the following to fix it: public boolean isScalarSubquery() { Preconditions.checkState(isAnalyzed_); if (!(this instanceof Subquery)) return false; Subquery subq = (Subquery) this; // Callers should be sure to analyze the limit clause of stmt SelectStmt stmt = (SelectStmt) subq.getStatement(); return stmt.returnsSingleRow() && getType().isScalarType(); } Of course, there are other ways to fix this, just wanted to point you in the right direction. Let me know if this needs further discussion. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 22 Mar 2018 23:08:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#16). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 26 files changed, 897 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/16 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 16 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 15: (7 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/9005/13/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/13/be/src/exec/cardinality-check-node.cc@90 PS13, Line 90: > We should technically also set num_rows_returned_ (it's used in a couple of Done http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175 PS11, Line 175: if (!operand.type_.isNumericType() && !operand.type_.isNull()) { > Good question, thanks for the detailed explanation. The following alternati I like this suggestion, it does make things much easier and simple. However, I'm getting a few strange errors that I couldn't really fix. Some say "select list expression not produced by aggregation output", others complain about the memory estimation, but I don't really see the connection to my changes. I've ran the frontend tests and the backend tests that I touched. Frontend tests produce 8 failures and 3 errors, tests/query_test/test_nested_types.py::TestNestedTypes::test_subplan also fails. http://gerrit.cloudera.org:8080/#/c/9005/13/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java: http://gerrit.cloudera.org:8080/#/c/9005/13/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@32 PS13, Line 32: m > extra space Done http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1116 PS9, Line 1116: // place. > Looking at the example query again, I think it's ok that it succeeds. We're I removed the quick fix from InlineViewRef.analyze() http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@1434 PS11, Line 1434: > Thanks! Works for me. Done http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test File testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test: http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test@709 PS13, Line 709: Subquery must not return more than one row: SELECT o_orderkey FROM c.c_orders WHERE o_orderkey = 600 OR o_orderkey = 4285920 > Is it possible to include the query test in the error message so we can be Done http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/subquery.test@841 PS13, Line 841: functional > Let's drop the functional. prefix, it's not required - the idea is that we I get "Could not resolve table reference: ..." errors if I do that. The other queries in this file are also prefixed with the database. Should I migrate the queries to a different file where the test matrix is used? -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 21 Mar 2018 23:08:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#15). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 24 files changed, 726 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/15 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#14). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 24 files changed, 727 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/14 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 14 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 13: (4 comments) Backend changes look good to me still. http://gerrit.cloudera.org:8080/#/c/9005/13/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/13/be/src/exec/cardinality-check-node.cc@90 PS13, Line 90: COUNTER_SET(rows_returned_counter_, 1); We should technically also set num_rows_returned_ (it's used in a couple of places, like be/src/exec/partitioned-aggregation-node.cc). http://gerrit.cloudera.org:8080/#/c/9005/13/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java: http://gerrit.cloudera.org:8080/#/c/9005/13/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@32 PS13, Line 32: extra space http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test File testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test: http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test@709 PS13, Line 709: Subquery must not return more than one row: Is it possible to include the query test in the error message so we can be sure the error message is correct. http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/9005/13/testdata/workloads/functional-query/queries/QueryTest/subquery.test@841 PS13, Line 841: functional Let's drop the functional. prefix, it's not required - the idea is that we can run against different file formats in the test matrix automatically. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 13 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 13 Mar 2018 22:19:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 13: PS13 was only a rebase. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 13 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 09 Mar 2018 17:38:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#13). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 958 insertions(+), 78 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/13 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 13 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1116 PS9, Line 1116: if (inlineViewRef.getViewStmt().isRuntimeScalar()) { > The optimization is definitely intentional and it's not clear to me why thi Looking at the example query again, I think it's ok that it succeeds. We're just deducing information from the query. I'm in favor of removing your quick fix to InlineViewRef.analyze() and accept that we can infer that the query is semantically correct as a whole. We should definitely add this case query as a test though and explain the behavior. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 22 Feb 2018 08:05:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#12). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/HdfsCachingOp.java M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StatementBase.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 28 files changed, 958 insertions(+), 78 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/12 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 11: (24 comments) http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@27 PS11, Line 27: /// Node that passes along the row pulled from its child unchanged. > First sentence should crisply state purpose of this node, e.g.: Thanks, I am using your sentences :) http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@29 PS11, Line 29: /// Note that this node must be a blocking node! > Please give an explanation why, e.g.: Thanks again, I don't think I could formulate it so nicely! :) http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@29 PS9, Line 29: /// Note that this node must be a blocking node! > Thanks. To my defense in PS1 it was called ScalarCheckNode and it only allowed one row :) Later when I renamed it I felt that some extra flexibility suits the new name. But I see that it is unlikely that we'll ever need this flexibility. http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38 PS9, Line 38: virtual Status Reset(RuntimeState* state) override; > Nice experiments and tests! Seems really hard to hit this case with a real Thanks! Done. http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@68 PS11, Line 68: DCHECK_LE(rows_collected, 1); > DCHECK that rows_collected is either 0 or 1. The current check accepts nega Done http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@85 PS11, Line 85: row_batch_->DeepCopyTo(output_row_batch); > Why do we deep copy the row twice? Once in Open() should be sufficient. Done http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175 PS11, Line 175: StmtRewriter.rewriteNonScalarSubqueries(operand, analyzer); > We should still keep the analysis and rewrite phases distinctly separate as I'm a bit confused about how to organise this. In AnalysisContext.analyze() I see the following phases: * analyze * rewrite exprs * rewrite stmts * re-analyze if needed If I don't do anything here, this code will throw an exception that needs to be handled somewhere, but the real problem is that the first analyze phase will remain unfinished. Then, we would start applying rewrites on the partially analyzed tree. So, some modifications are needed to analyzeImpl() I think. I tried to simply return from it when we encounter a non-scalar subquery. Then, do the rewrite in StmtRewriter, but analyze() is also called during statement rewriting. This is problematic because the AnalysisExceptions are wrapped in an IllegalStateException and that causes some tests to fail. This happens when the subquery is still invalid after making it scalar, e.g.: select id from functional.alltypestiny where int_col = (select max(timestamp_col) from functional.alltypessmall) ERROR: AnalysisException: null CAUSED BY: IllegalStateException: Failed analysis after expr substitution. CAUSED BY: AnalysisException: operands of type INT and TIMESTAMP are not comparable: int_col = (SELECT max(timestamp_col) FROM functional.alltypessmall) My subquery "rewrite" only changes the type of the subquery. Aren't these kind of type deductions belong to the analysis phase? Maybe I should move rewriteNonScalarSubqueries() to Expr and rename it to something like makeContainedSubquerySalar() ? Am I missing something? Do you have something in mind how should I deal with it? http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@177 PS11, Line 177: for (Expr expr: children_) { > Move to the rewrite phase. see comment in ArithmeticExpr.java http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 11: (24 comments) http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@27 PS11, Line 27: /// Node that passes along the row pulled from its child unchanged. First sentence should crisply state purpose of this node, e.g.: Node that returns an error if its child produces more than a single row. If successful, this node returns a deep copy of its single input row. http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@29 PS11, Line 29: /// Note that this node must be a blocking node! Please give an explanation why, e.g.: Note that this node must be a blocking node. It would be incorrect to return rows before the single row constraint has been validated because downstream exec nodes might produce results and incorrectly return them to the client. If the child of this node produces more than one row it means the SQL query is semantically invalid, so no rows must be returned to the client. http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@29 PS9, Line 29: /// Note that this node must be a blocking node! > No, I just wanted make it more generic. Done. Thanks. For reference, I think the "Speculative Generality" code smell applies here. https://cwiki.apache.org/confluence/display/IMPALA/Effective+Coding+Practices http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38 PS9, Line 38: virtual Status Reset(RuntimeState* state) override; > Thanks for the tips. However, I still couldn't succeed. I used the followin Nice experiments and tests! Seems really hard to hit this case with a real non-subquery plan. I think documenting the blocking nature of the node in code comments should be enough. http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@68 PS11, Line 68: DCHECK_LE(rows_collected, 1); DCHECK that rows_collected is either 0 or 1. The current check accepts negative values. http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@85 PS11, Line 85: row_batch_->DeepCopyTo(output_row_batch); Why do we deep copy the row twice? Once in Open() should be sufficient. Copying twice could be expensive in subplans. http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175 PS11, Line 175: StmtRewriter.rewriteNonScalarSubqueries(operand, analyzer); We should still keep the analysis and rewrite phases distinctly separate as before. Calling ExprRewriter within analysis breaks the separation and is very confusing. The StmtRewriter should treat all Exprs in the same way, so no special code should be required for individual Expr subclasses. The rewrite should be performed within the rewrite phase when StmtRewriter.rewrite() is called. See AnalaysisContext.analyze() for an overview of the phases analysis and rewrite phases. http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@177 PS11, Line 177: for (Expr expr: children_) { Move to the rewrite phase. http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@529 PS11, Line 529: for (Expr child : getChildren()) { Move to rewrite phase. http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@94 PS11, Line 94: // This member contains the original statement provided by the user. Not accurate. This contains the post-analysis toSql() before rewrites. It may still be different from the SQL originally provided by the user (table
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 9: (22 comments) Thanks for the comments! I could resolve most of them, but still have some questions. http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@25 PS9, Line 25: namespace impala { > Side note: I could hot a DCHECK with this query: This query raises the cardinality error. I also added this query to QueryTest/subquery.test http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@29 PS9, Line 29: /// Currently it is only used to check <= 1 cardinality of the child. > I cannot think of a scenario where we would need to check for a cardinality No, I just wanted make it more generic. Done. http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38 PS9, Line 38: virtual Status Open(RuntimeState* state) override; > Thanks for your attempt and details. The following ideas might help: Thanks for the tips. However, I still couldn't succeed. I used the following query: select 1 from (values(1 a)) v where a = (select id from functional.alltypestiny) limit 1 Plan is: PLAN-ROOT SINK 03:HASH JOIN [RIGHT SEMI JOIN] |--00:UNION | constant-operands=1 02:CARDINALITY CHECK 01:SCAN HDFS [functional.alltypestiny] But, PartitionedHashJoinNode calls GetNext() twice on its child. First, it calls in Open() -> BlockingJoinNode::GetFirstProbeRow(state) -> CardinalityCheckNode::GetNext(). Then, in PartitionedHashJoinNode::GetNext() it starts to process the rows in the current rowbatch (fetched by Open()). After that it calls NextProbeRowBatch(), which calls GetNext() on CardinalityCheckNode, and this time it raises the error. So as far as I can tell the followings have to go wrong to cause the bug: - change the behavior of PartitionedHashJoinNode - make CardinalityCheckNode a streaming node despite of the comments that warn you not to do that - batch size happens to be zero Anyway, I added an e2e to hs2/test_fetch.py, although I couldn't make it fail. However, if CardinalityCheckNode is in a subplan, the following query succeeds (if batch_size=1, with default batch size it is flaky): select c_custkey from customer c where c_custkey < (select o_orderkey from c.c_orders where o_orderkey in (4285920, 600, 500))) limit 1; o_orderkey 4285920 and 600 belong to the same customer, while o_orderkey 500 belongs to a different customer. Without LIMIT 1 it fails. But, since the subquery refers to an identifier from the outer query I guess it counts as a correlated subquery, while this JIRA is only about uncorrelated subqueries. Anyway, it might be a good test for the uncorrelated subquery JIRA. http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@58 PS9, Line 58: // The associated SQL statement > for error reporting Done http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.cc@37 PS9, Line 37: DCHECK_EQ(limit_, max_cardinality_); > Add comment explaining why we are not calling ExecNode::Prepare() we are calling it in L40. http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@189 PS9, Line 189: subquery.getStatement().setIsRuntimeScalar(true); > I don't think this approach is general enough to handle cases like the foll Extended StmtRewriter with a new static method called rewriteNonScalarSubqueries(). It takes an expr, and tries to rewrite the contained subqueries' type to scalar. Now these queries fail at runtime. They succeed if we add a WHERE clause with 'id = 1'. I added tests with these kind of queries. Do you have any other corner case in mind? http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@94 PS9, Line 94: public boolean isRuntimeScalar() { return isRuntimeScalar_; } > please move these getters and setters to L365 to not clutter the members li Done http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@99 PS9, Line 99: protected String origStmtStr_ = null; > *
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#10). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test M tests/hs2/test_fetch.py 23 files changed, 1,076 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/10 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@25 PS9, Line 25: namespace impala { Side note: I could hot a DCHECK with this query: select a from (values(1 a),(2),(3)) v where a = (select x from (values(1 x),(2),(3)) v); http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38 PS9, Line 38: virtual Status Open(RuntimeState* state) override; > I tried to craft a test scenario, but couldn't really succeed with it. Thanks for your attempt and details. The following ideas might help: 1. Use num_nodes=1 to keep things simple and avoid exchanges 2. Use batch_size=1 to return batches with one row at a time 3. Use limit 1 at the top of the query so it succeeds once the first row is seen 4. You need to get the subquery on the probe side of the join. I got the join to be inverted with the following query: select 1 from (values(1 a)) v where a = (select bigint_col from functional.alltypes limit 1); -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 29 Jan 2018 21:17:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 9: Saw this, will take a look. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 19:02:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 17:40:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#9). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 17 files changed, 787 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/9 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 8: Hi Alex, Dimitris, Could you review this change, especially the FE part? -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 14:16:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#8). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 17 files changed, 787 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/8 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@89 PS6, Line 89: CopyRows > Yeah, I think we can just assume that the frontend will only generate plans I added comments about it to this class, and to the corresponding plan node class as well. http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc@116 PS7, Line 116: Status CardinalityCheckNode::Reset(RuntimeState* state) { > I don't think we have test coverage for this code yet, since it's only exec I added some tests to nested-types-subplan.test. They use tpch_nested_parquet.customer because functional.allcomplextypes is empty. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 17 Jan 2018 14:13:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 7: (2 comments) The backend code is looking good to me, comments are mainly about test coverage. I'm not confident about the frontend code, I'd like input from Alex or Dimitris about whether the approach is right. http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@89 PS6, Line 89: CopyRows > No, in this case RowBatch::DeepCopyTo fails. Yeah, I think we can just assume that the frontend will only generate plans with max_cardinality <= 1 for now and deal with additional complications later if we need the additional generality. We can't test the other cases now so seems best to be clear that it doesn't work. http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/7/be/src/exec/cardinality-check-node.cc@116 PS7, Line 116: Status CardinalityCheckNode::Reset(RuntimeState* state) { I don't think we have test coverage for this code yet, since it's only executed if the node is inside a subplan. We do support this, e.g. the following query puts a cardinality check inside a subplan: select a.id from functional.allcomplextypes a where id < (select item from a.int_array_col); It looks like these planner tests and query tests have a number of subqueries that get put inside subplans: ./testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test ./testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test Your code does look like it handles subplans correctly, just want to make sure we have the test coverage. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 16 Jan 2018 21:54:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Tim Armstrong, Csaba Ringhofer, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#7). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 16 files changed, 739 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/7 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h@44 PS6, Line 44: single row from child > generalize the "single" Done http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h@45 PS6, Line 45: scalar > perhaps "child" is a better name? renamed it to simply 'row_batch_' because during copying the name 'child_row_batch_' was confusing. http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@51 PS6, Line 51: more than one row. > should not be specific to "one" woops, thanks http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@89 PS6, Line 89: *eos = true; > if max_cardinality_ > number of tuples in a batch, does this work? No, in this case RowBatch::DeepCopyTo fails. This problem is also present in Open, however in Open we create the RowBatch and we can set the capacity to max_cardinality. Maybe we can assume that max_cardinality is not too huge. If we cannot assume that, then I think we cannot implement it as a blocking node. In GetNext, we get an already created output_row_batch and we can't change its capacity. So I changed the behavior to something similar to SelectNode::GetNext(). -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 16 Jan 2018 18:09:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h@44 PS6, Line 44: single row from child generalize the "single" http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.h@45 PS6, Line 45: scalar perhaps "child" is a better name? http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@51 PS6, Line 51: more than one row. should not be specific to "one" http://gerrit.cloudera.org:8080/#/c/9005/6/be/src/exec/cardinality-check-node.cc@89 PS6, Line 89: *eos = true; if max_cardinality_ > number of tuples in a batch, does this work? -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 16 Jan 2018 16:58:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Tim Armstrong, Csaba Ringhofer, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#6). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 16 files changed, 710 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/6 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Tim Armstrong, Csaba Ringhofer, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#5). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 16 files changed, 709 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/5 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.h@27 PS4, Line 27: Node that passes along the row pulled from its child unchanged. : /// If there is more than one row, it aborts the query. > sync this with the changes: updated the comment http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@61 PS4, Line 61: > 1 > shouldn't this be max_cardinality_? oops, yes it should, thanks http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@65 PS4, Line 65: h.num_rows() == 1 > this is another check that seems to have been carried from the stricter <= It is a bug, carried from the earlier implementation. I changed this check, now this BE exec node is aligned with the corresponding FE plan node. http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@69 PS4, Line 69: ted, 1); > same here Done http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@85 PS4, Line 85: == 1 > and here Done -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 16 Jan 2018 13:35:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.h File be/src/exec/cardinality-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.h@27 PS4, Line 27: Node that passes along the row pulled from its child unchanged. : /// If there is more than one row, it aborts the query. sync this with the changes: - <= max_cardinality_ is now permitted (not necessarily 1) - more than one row may be returned, so generalize to ".. passes along rows ..." http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc File be/src/exec/cardinality-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@61 PS4, Line 61: > 1 shouldn't this be max_cardinality_? http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@65 PS4, Line 65: h.num_rows() == 1 this is another check that seems to have been carried from the stricter <= 1 implementation. if there are no tests for > 1 limit, then perhaps this should be paired back down to <= 1 ? Speaking of which, if its not possible to create a plan for the node from the FE, is the general approach to pair back the functionality? otherwise, I'd expect tests independent of the FE. http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@69 PS4, Line 69: ted, 1); same here http://gerrit.cloudera.org:8080/#/c/9005/4/be/src/exec/cardinality-check-node.cc@85 PS4, Line 85: == 1 and here -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 16 Jan 2018 00:37:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Tim Armstrong, Csaba Ringhofer, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#4). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called CardinalityCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt A be/src/exec/cardinality-check-node.cc A be/src/exec/cardinality-check-node.h M be/src/exec/exec-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java A fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-query/queries/QueryTest/subquery.test 15 files changed, 689 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/4 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 3: (10 comments) Thanks all! http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.h File be/src/exec/scalar-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.h@29 PS3, Line 29: ScalarCheckNode > small naming nit, but it seems that this class checks that the cardinality I agree that it's a better name, so renamed it to CardinalityCheckNode. Also, to make it more general, it can be parameterized by the max allowed cardinality. http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc File be/src/exec/scalar-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@33 PS3, Line 33: Status ScalarCheckNode::Prepare(RuntimeState* state) { > Can you DCHECK that this doesn't have any conjuncts assigned to it. Those a I added a DCHECK for conjuncts. The limit of this node is set by the constructor of the plan node and it equals to the allowed cardinality. I added a DCHECK_EQ here to check if limit and max_cardinality is the same. http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@56 PS3, Line 56: if (rows_collected > 1) { > Does conditional fit on one line? Now that we want to display the associated SQL statement as well, it doesn't. http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@57 PS3, Line 57: scalar > what guarantees a scalar here? At runtime nothing, it is checked at parse-time. However, as you suggested I renamed it to CardinalityCheckNode. http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@57 PS3, Line 57: scalar > It would be a lot more user friendly if we could report which scalar subque The subquery gets rewritten at one point of the analysis. E.g. it rewrites "id = 1 or id = 2" to "id IN (1, 2)" Therefore I needed to hack a little to plumb the original SQL statement. I wonder if there's a better or usual way to do that. http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@59 PS3, Line 59: if (child_batch.num_rows() == 1) { > Does conditional fit on one line? yes http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@177 PS3, Line 177: if (operand instanceof Subquery && !operand.type_.isScalarType()) { > Handling this case here doesn't feel quite right, since it's kind-of an err I restructured the branches http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@740 PS3, Line 740: // Scalar checking requires to merge the output of the child fragment > Maybe this would be better explained as "The scalar check must execute on a yeah, it does sound better. http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java File fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java: http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java@26 PS3, Line 26: conjuncts and a limit clause. > what conjuncts and what limit? perhaps phrase this to be more specific. Oops, it was a leftover from copy pasting SelectNode. http://gerrit.cloudera.org:8080/#/c/9005/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/9005/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test@839 PS3, Line 839: QUERY > We should also add planner tests to test the explain plan output. E.g. see Added some. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 15 Jan 2018 17:20:12 + Gerrit-HasComments:
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc File be/src/exec/scalar-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@33 PS3, Line 33: Status ScalarCheckNode::Prepare(RuntimeState* state) { Can you DCHECK that this doesn't have any conjuncts assigned to it. Those aren't implemented by this node so we'd get incorrect results in that case. We should actually convince ourselves that it's not possible to assign conjuncts to this plan node (I honestly don't know if it is - might be a question for Alex). I think the same thing applies for a limit of 0 - I'm not sure if we can get a limit applied. http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@56 PS3, Line 56: if (rows_collected > 1) { Does conditional fit on one line? http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@57 PS3, Line 57: scalar > what guarantees a scalar here? It would be a lot more user friendly if we could report which scalar subquery failed. It might be feasible to plumb information (the original SQL text? a line number?) through from analysis so it can be reported here. Actually, if we just plumbed the string through from analysis, we could make this a generic node like Vuk suggested that doesn't know anything about the SQl semantics it's implementing. http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@59 PS3, Line 59: if (child_batch.num_rows() == 1) { Does conditional fit on one line? http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@177 PS3, Line 177: if (operand instanceof Subquery && !operand.type_.isScalarType()) { Handling this case here doesn't feel quite right, since it's kind-of an error-handling path. It seems like this part should be a separate branch from the error handling that starts in l175. http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@740 PS3, Line 740: // Scalar checking requires to merge the output of the child fragment Maybe this would be better explained as "The scalar check must execute on a single node". http://gerrit.cloudera.org:8080/#/c/9005/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/9005/3/testdata/workloads/functional-query/queries/QueryTest/subquery.test@839 PS3, Line 839: QUERY We should also add planner tests to test the explain plan output. E.g. see testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 12 Jan 2018 23:38:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 3: (3 comments) couple of small clarifying comments. http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.h File be/src/exec/scalar-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.h@29 PS3, Line 29: ScalarCheckNode small naming nit, but it seems that this class checks that the cardinality of the child is <= 1. It does not guarantee that the child produces a scalar (could be a multi-column row). perhaps CardinalityCheckNode? true that the frontend should guarantee arity here, but perhaps it should be checked/asserted or the name should match the more general functionality. http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc File be/src/exec/scalar-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/3/be/src/exec/scalar-check-node.cc@57 PS3, Line 57: scalar what guarantees a scalar here? http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java File fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java: http://gerrit.cloudera.org:8080/#/c/9005/3/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java@26 PS3, Line 26: conjuncts and a limit clause. what conjuncts and what limit? perhaps phrase this to be more specific. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 12 Jan 2018 18:49:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 1: (15 comments) Thanks for all the comments! http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h File be/src/exec/scalar-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h@44 PS1, Line 44: boost::scoped_ptr child_row_batch_; > If I understand correctly, having child_row_batch_ as member is usually nee You are right, but I still keep it because I've changed the implementation and now we need to have a member. I renamed it to scalar_row_batch_ btw. ScalarCheckNode is now a blocking node and deep-copies its child's row in Open(), then passes it to its parent in GetNext(). http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h@47 PS1, Line 47: child_row_idx_ > This member is not used anywhere. Probably it should be removed. Done http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc File be/src/exec/scalar-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@31 PS1, Line 31: NULL > Prefer nullptr in new code. Removed it from here since it is a boost::scoped_ptr with a default constructor. But maybe I can add "= nullptr" in the class definition to make the initialization explicit. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@32 PS1, Line 32: child_row_idx_(0), > We've mostly been switching to initialising member variables to constants i Removed those members. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@56 PS1, Line 56: if (child_row_batch_->num_rows() == 0) { > I think this needs to be a while loop that resets the batch after each iter I followed this logic and moved this part to Open() since now it is a blocking node and we want to call Close() on the child asap. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@63 PS1, Line 63: one > "zero or one" or "maximum one" would be more exact Done http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@78 PS1, Line 78: child_row_batch_->TransferResourceOwnership(row_batch); > I think it's the start of a good idea. I originally suggested modelling thi Thanks for the idea, I followed this logic. So I deep copy the child's row in Open(), or raise an error if there are more rows. Also, I close the child right after reading its row. Then, deep copy the single scalar row in GetNext() into the parent's row batch. http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@178 PS1, Line 178: // Runtime check needed to check if subquery returns scalar > This is not just a check, but can be a transformation, as array types are r Replaced the comment to make it more informative. http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@179 PS1, Line 179: // Runtime check needed to check if subquery returns scalar > This is not just a check, but can be a transformation, as array types are r Replaced the comment. http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@90 PS1, Line 90: // If true, a ScalarCheckNode is needed to check this statement's result at runtime > nit: missing . Done http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@735 PS1, Line 735:* @throws ImpalaException > We don't generally have @throws declarations in Java comments Oops, eclipse added it automagically. http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@746 PS1, Line 746: selectNode > scalarCheckNode? Yes! http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java File fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java:
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Hello Attila Jeges, Tim Armstrong, Csaba Ringhofer, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9005 to look at the new patch set (#2). Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called ScalarCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt M be/src/exec/exec-node.cc A be/src/exec/scalar-check-node.cc A be/src/exec/scalar-check-node.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java A fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-query/queries/QueryTest/subquery.test 13 files changed, 448 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/2 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h File be/src/exec/scalar-check-node.h: http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.h@47 PS1, Line 47: child_row_idx_ This member is not used anywhere. Probably it should be removed. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 17:40:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test@932 PS1, Line 932: The case when the subquery returns 0 row could be tested too. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 16:45:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 ) Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@746 PS1, Line 746: selectNode scalarCheckNode? -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 11 Jan 2018 16:44:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9005 Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries .. IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries If a scalar subquery is used with a binary predicate, or, used in an arithmetic expression, it must return only one row/column to be valid. If this cannot be guaranteed at parse time through a single row aggregate or limit clause, Impala fails the query like such. E.g., currently the following query is not allowed: SELECT bigint_col FROM alltypesagg WHERE id = (SELECT id FROM alltypesagg WHERE id = 1) However, it would be allowed if the query contained a LIMIT 1 clause, or instead of id it was max(id). This commit makes the example valid by introducing a runtime check to test if the subquery returns a single row. If the subquery returns more than one row, it aborts the query with an error. I added a new node type, called ScalarCheckNode. It is created during planning on top of the subquery when needed, then during execution it checks if its child only returns a single row. I extended the frontend tests and e2e tests as well. Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 --- M be/src/exec/CMakeLists.txt M be/src/exec/exec-node.cc A be/src/exec/scalar-check-node.cc A be/src/exec/scalar-check-node.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java A fe/src/main/java/org/apache/impala/planner/ScalarCheckNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M testdata/workloads/functional-query/queries/QueryTest/subquery.test 13 files changed, 387 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/9005/1 -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy