[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13418 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65609442

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -123,6 +123,31 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
     )
   }
+  test("SPARK-15677: Scalar sub-query in Select list against a DataFrame generated query") {
+    Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
--- End diff --

Please use `withTempTable` (OK, the name is wrong for historical reasons; it should be `withTempView`), which will drop the view for you after the test.
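The helper cloud-fan refers to follows the loan pattern: it runs the test body and drops the named views in a `finally` block, so a failing assertion cannot leak `t1` into later tests. A minimal Scala sketch, assuming a hypothetical in-memory set of view names (`tempViews`) in place of Spark's session catalog; this is not the actual Spark test-utility code:

```scala
import scala.collection.mutable

// Hypothetical stand-in for the session catalog: the names of live temp views.
val tempViews = mutable.Set.empty[String]

def createOrReplaceTempView(name: String): Unit = tempViews += name
def dropTempView(name: String): Unit = tempViews -= name

// Runs body, then drops every named view even if body throws.
def withTempView(viewNames: String*)(body: => Unit): Unit =
  try body
  finally viewNames.foreach(dropTempView)

withTempView("t1") {
  createOrReplaceTempView("t1")
  assert(tempViews.contains("t1")) // the view exists inside the block
}
```

Because cleanup sits in `finally`, the view is dropped whether the body completes normally or throws.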
[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65609083

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -123,6 +123,31 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
     )
   }
+  test("SPARK-15677: Scalar sub-query in Select list against a DataFrame generated query") {
--- End diff --

Maybe we should mention that this bug only exists for local relations?
[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65594703

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -121,6 +123,16 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
         " where key = (select max(key) from subqueryData) - 1)"),
       Array(Row("two"))
     )
+
+    checkAnswer(
--- End diff --

I think it's better to create a new test case for it.
[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65484867

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1468,10 +1468,15 @@ object DecimalAggregates extends Rule[LogicalPlan] {
  */
 object ConvertToLocalRelation extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-    case Project(projectList, LocalRelation(output, data)) =>
+    case p @ Project(projectList, LocalRelation(output, data))
+        if !p.expressions.exists(hasUnevaluableExpr) =>
--- End diff --

`p.expressions` is just the `projectList`.
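For context, the rule being patched evaluates a `Project` over a `LocalRelation` eagerly, at optimization time, which cannot work when the project list contains an expression that is only evaluable at execution time, such as a scalar subquery. The sketch below models that on a hypothetical miniature expression and plan tree (none of these classes are Spark's): the guard checks `projectList` directly, per the comment above, and for brevity the rule is applied only at the plan root rather than via `transform`.

```scala
// Hypothetical miniature of Catalyst's trees; none of these are Spark's classes.
sealed trait Expression {
  def children: Seq[Expression]
  // Pre-order search returning the first node that satisfies p.
  def find(p: Expression => Boolean): Option[Expression] =
    if (p(this)) Some(this)
    else children.iterator.map(_.find(p)).collectFirst { case Some(e) => e }
}
// Marker for expressions that cannot be evaluated as-is.
sealed trait Unevaluable extends Expression

case class Literal(value: Int) extends Expression {
  def children: Seq[Expression] = Nil
}
case class AttributeReference(name: String) extends Unevaluable {
  def children: Seq[Expression] = Nil
}
case class ScalarSubquery(sql: String) extends Unevaluable {
  def children: Seq[Expression] = Nil
}
case class Add(left: Expression, right: Expression) extends Expression {
  def children: Seq[Expression] = Seq(left, right)
}

sealed trait LogicalPlan
case class LocalRelation(output: Seq[String], data: Seq[Seq[Int]]) extends LogicalPlan
case class Project(projectList: Seq[Expression], child: LogicalPlan) extends LogicalPlan

// Evaluates e against one input row keyed by column name.
def eval(e: Expression, row: Map[String, Int]): Int = e match {
  case Literal(v) => v
  case AttributeReference(n) => row(n) // Catalyst would bind this to a BoundReference
  case Add(l, r) => eval(l, row) + eval(r, row)
  case s: ScalarSubquery => throw new UnsupportedOperationException(s"Cannot evaluate $s")
}

// Flags any Unevaluable node except AttributeReference.
def hasUnevaluableExpr(e: Expression): Boolean =
  e.find {
    case _: AttributeReference => false
    case _: Unevaluable => true
    case _ => false
  }.isDefined

// Folds Project(LocalRelation) into a new LocalRelation only when every
// projected expression can be evaluated now; otherwise leaves the plan alone.
def convertToLocalRelation(plan: LogicalPlan): LogicalPlan = plan match {
  case Project(projectList, LocalRelation(output, data))
      if !projectList.exists(hasUnevaluableExpr) =>
    val rows = data.map(values => output.zip(values).toMap)
    LocalRelation(
      projectList.indices.map(i => s"col$i"),
      rows.map(row => projectList.map(eval(_, row))))
  case other => other
}
```

With this guard, a projection over plain columns and literals is still folded into a new `LocalRelation`, while one that contains a `ScalarSubquery` is left in place for execution time.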
[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65455758

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1468,7 +1468,8 @@ object DecimalAggregates extends Rule[LogicalPlan] {
  */
 object ConvertToLocalRelation extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-    case Project(projectList, LocalRelation(output, data)) =>
+    case p @ Project(projectList, LocalRelation(output, data))
+        if !p.expressions.exists(ScalarSubquery.hasScalarSubquery) =>
--- End diff --

+1 to catching `Unevaluable` and special-casing `AttributeReference`.
[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65441184

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1468,7 +1468,8 @@ object DecimalAggregates extends Rule[LogicalPlan] {
  */
 object ConvertToLocalRelation extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-    case Project(projectList, LocalRelation(output, data)) =>
+    case p @ Project(projectList, LocalRelation(output, data))
+        if !p.expressions.exists(ScalarSubquery.hasScalarSubquery) =>
--- End diff --

I think `AttributeReference` is the only exception: it is replaced with a `BoundReference` when a `Projection` is created, so we could special-case it.
[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...
Github user ioana-delaney commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65438111

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala ---
@@ -84,6 +84,13 @@ object ScalarSubquery {
       case _ => false
     }.isDefined
   }
+
+  def hasScalarSubquery(e: Expression): Boolean = {
+    e.find {
--- End diff --

@rxin Thank you for the review. I aligned my code with the existing implementation, but I can replace the method call with your suggestion.
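A helper like `hasScalarSubquery` only needs a tree search that reports whether any node is a `ScalarSubquery`. The sketch below reuses the `find { ... }.isDefined` shape visible in the diff context, on a hypothetical miniature expression tree (not Catalyst's classes). In the PR the method lives in `object ScalarSubquery`; here it is a standalone function for simplicity.

```scala
// Hypothetical miniature expression tree; not Catalyst's classes.
sealed trait Expression {
  def children: Seq[Expression]
  // Pre-order search returning the first node that satisfies p.
  def find(p: Expression => Boolean): Option[Expression] =
    if (p(this)) Some(this)
    else children.iterator.map(_.find(p)).collectFirst { case Some(e) => e }
}
case class Literal(value: Int) extends Expression {
  def children: Seq[Expression] = Nil
}
case class AttributeReference(name: String) extends Expression {
  def children: Seq[Expression] = Nil
}
case class ScalarSubquery(sql: String) extends Expression {
  def children: Seq[Expression] = Nil
}
case class Add(left: Expression, right: Expression) extends Expression {
  def children: Seq[Expression] = Seq(left, right)
}

// True if a ScalarSubquery occurs anywhere in e, not only at its root.
def hasScalarSubquery(e: Expression): Boolean =
  e.find {
    case _: ScalarSubquery => true
    case _ => false
  }.isDefined
```

Searching the whole tree matters: a subquery nested inside an arithmetic expression in the select list must still be detected.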
[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...
Github user ioana-delaney commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65437967

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1468,7 +1468,8 @@ object DecimalAggregates extends Rule[LogicalPlan] {
  */
 object ConvertToLocalRelation extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-    case Project(projectList, LocalRelation(output, data)) =>
+    case p @ Project(projectList, LocalRelation(output, data))
+        if !p.expressions.exists(ScalarSubquery.hasScalarSubquery) =>
--- End diff --

@davies Sorry for the delay in replying; I am new to the Spark code. I've looked at `Unevaluable` expressions, and I found that checking for them would be too general, since many expressions mix in this trait. `AttributeReference`, for example, is one of them. If we explicitly check for `Unevaluable` expressions, a simple query such as "select c1 from t1" would regress. Let me know if I misunderstood your requirement. Thanks.
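The regression described above can be reproduced on a hypothetical miniature model in which, as the thread states, both `AttributeReference` and `ScalarSubquery` mix in an `Unevaluable` marker trait (none of these classes are Spark's). A naive check flags the plain column reference in "select c1 from t1", while the variant davies proposed, which exempts `AttributeReference` because it is bound to an input position before evaluation, only flags the subquery.

```scala
// Hypothetical miniature expression tree; not Catalyst's classes.
sealed trait Expression {
  def children: Seq[Expression]
  // Pre-order search returning the first node that satisfies p.
  def find(p: Expression => Boolean): Option[Expression] =
    if (p(this)) Some(this)
    else children.iterator.map(_.find(p)).collectFirst { case Some(e) => e }
}
// Marker for expressions that cannot be evaluated as-is.
sealed trait Unevaluable extends Expression
case class Literal(value: Int) extends Expression {
  def children: Seq[Expression] = Nil
}
case class AttributeReference(name: String) extends Unevaluable {
  def children: Seq[Expression] = Nil
}
case class ScalarSubquery(sql: String) extends Unevaluable {
  def children: Seq[Expression] = Nil
}

// Naive check: any Unevaluable node blocks the rule, including plain columns.
def hasUnevaluableNaive(e: Expression): Boolean =
  e.find(_.isInstanceOf[Unevaluable]).isDefined

// Special-cased check: AttributeReference is exempt because it is bound to an
// input position (a BoundReference in Catalyst) before the projection runs.
def hasUnevaluableExpr(e: Expression): Boolean =
  e.find {
    case _: AttributeReference => false
    case _: Unevaluable => true
    case _ => false
  }.isDefined

// Project list for "select c1 from t1".
val selectC1: Seq[Expression] = Seq(AttributeReference("c1"))
// Project list for "select c1, (select max(c2) from t1) from t1".
val selectWithSubquery: Seq[Expression] =
  Seq(AttributeReference("c1"), ScalarSubquery("select max(c2) from t1"))
```

Both checks treat the subquery the same way; they differ only on plain column references, which is exactly the case the `AttributeReference` exemption exists for.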